Thu, Nov 25, 2021 12:01 PM

Downloadable datasets miss some data

I'm trying to use the datasets downloadable at https://www.imdb.com/interfaces/ . The problem is that some person codes in title.principals.tsv.gz have no match in name.basics.tsv.gz, which is supposed to associate each code with a name. For example, code nm6592173 does not appear in name.basics.tsv.gz, but it is listed as a principal of a title:

❯ zgrep nm6592173 title.principals.tsv.gz
tt3826724 6 nm6592173 actor \N ["Soldado 3"]

Does anybody have a hint as to why this happens, or know whom I should contact to report the problem?

Champion

5 d ago

Looks like nm6592173 has been merged into nm0739580, since https://www.imdb.com/name/nm0739580/ is what comes up when I put nm6592173 into the IMDb search bar.

Did you pull both files at roughly the same time?

5 d ago

I downloaded them within seconds of each other a few hours ago, and they're supposed to be refreshed every day.

I downloaded both files again. There are 3199 identifiers in title.principals.tsv.gz with no match in name.basics.tsv.gz. I guess one has to live with the fact that these files are not 100% consistent and implement workarounds.
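For anyone who wants to reproduce that count, here is a minimal sketch of the check. It assumes both files are gzipped, tab-separated, and start with a header row containing an `nconst` column; the function names `read_column` and `unmatched_person_ids` are made up for this example. It keeps every identifier in memory, so it may need a fair amount of RAM on the full dumps.

```python
import csv
import gzip


def read_column(path, column):
    """Return the set of distinct values found in one column of a gzipped TSV file."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # Assumption: the file is tab-separated with a header row and no quoting.
        # QUOTE_NONE keeps literal double quotes (as in the characters column)
        # from being interpreted as field delimiters.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return {row[column] for row in reader}


def unmatched_person_ids(principals_path, names_path):
    """Return person codes used in the principals file but absent from the names file."""
    known = read_column(names_path, "nconst")
    referenced = read_column(principals_path, "nconst")
    return referenced - known
```

Calling `unmatched_person_ids("title.principals.tsv.gz", "name.basics.tsv.gz")` returns the set of orphaned codes, which a workaround could then skip or look up by hand. It does not tell you which codes were merged into which, only that they have no entry in the names file.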