Tuesday, July 26th, 2022 4:43 AM


IMDB Datasets no longer including some movies?

Sometime in the last two to three weeks (between the files I downloaded on 2022-07-10 and 2022-07-24), the IMDB datasets available from https://datasets.imdbws.com/ seem to have stopped including some movies.

For instance, download https://datasets.imdbws.com/title.basics.tsv.gz and try to find the following IMDB IDs, which were there on July 10 but not on July 24:

tt0044502  Clash by Night (1952)
tt0047573  Them! (1954)
tt0048977  The Bad Seed (1956)
tt0050539  The Incredible Shrinking Man (1957)
tt0053290  Solomon and Sheba (1959)
tt0056700  The Wonderful World of the Brothers Grimm (1962)
tt0057449  The Raven (1963)
tt0060980  The Silencers (1966)
tt0065421  The AristoCats (1970)

The same IMDB-ids seem to have disappeared from https://datasets.imdbws.com/title.ratings.tsv.gz as well.

I re-downloaded the files on July 25 and the same entries were still missing.
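
For anyone who wants to reproduce the check, here is a minimal Python sketch. It assumes the file has already been downloaded locally and that the ID (tconst) is the first tab-separated column. The function names and the local file name are illustrative, not anything IMDb provides.

```python
import gzip

# A few of the IDs listed above; extend as needed.
WANTED = {"tt0044502", "tt0047573", "tt0065421"}

def missing_ids(tsv_lines, wanted):
    """Return the sorted IDs from `wanted` that never appear in the first column."""
    remaining = set(wanted)
    for line in tsv_lines:
        # The first tab-separated field is assumed to be the tconst.
        tconst = line.split("\t", 1)[0].strip()
        remaining.discard(tconst)
        if not remaining:
            break  # every ID was found, no need to read further
    return sorted(remaining)

def missing_ids_in_file(path, wanted):
    """Same check, reading a locally downloaded .tsv.gz file (path is an assumption)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return missing_ids(fh, wanted)

# Example call, assuming the file sits in the current directory:
# print(missing_ids_in_file("title.basics.tsv.gz", WANTED))
```

Running the same call against title.ratings.tsv.gz should show whether the IDs are missing there too.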

What could explain this?

This conversation has been merged. Please refer to the main conversation:

title.basics.tsv.gz is broken - https://datasets.imdbws.com/

1 year ago

The dataset is broken. It now only includes 3,477,496 titles. It should have almost three times that number.

The data is corrupted after the title "Kneeling for Justice: A San Francisco Memorial for George Floyd". The value in tconst for that entry is "ial for George Floyd".
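
A rough way to locate this kind of corruption is to scan the file for rows whose tconst does not look like an ID, or whose column count differs from the header. The Python sketch below assumes a valid tconst is "tt" followed by digits and takes the expected column count from the header line. The function name is hypothetical.

```python
import re

# Assumption: a well-formed tconst is "tt" followed only by digits.
TCONST_RE = re.compile(r"tt\d+")

def scan_rows(tsv_lines):
    """Return (data_row_count, suspicious_rows) for an iterable of TSV lines.

    suspicious_rows is a list of (line_number, first_field) for rows whose
    first field is not a valid-looking tconst or whose number of columns
    differs from the header's.
    """
    total = 0
    bad = []
    expected_columns = None
    for line_no, line in enumerate(tsv_lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if expected_columns is None:
            expected_columns = len(fields)  # first line is the header
            continue
        total += 1
        if not TCONST_RE.fullmatch(fields[0]) or len(fields) != expected_columns:
            bad.append((line_no, fields[0]))
    return total, bad

# Example call on a locally downloaded file (the path is an assumption):
# import gzip
# with gzip.open("title.basics.tsv.gz", "rt", encoding="utf-8") as fh:
#     total, bad = scan_rows(fh)
#     print(total, bad[:5])
```

The first suspicious line number shows where the file goes wrong, and the row count can be compared with an earlier download.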

Could someone at IMDb please correct this?

Thank you!