
Employee
•
15 Messages
•
920 Points
IMDb Data – Now easily available to contributors
Today (20 Dec 2018) we are pleased to announce that the IMDb datasets are easier to access and are now available directly from imdb.com. Using the new interface, contributors can bulk-access subsets of IMDb title and name data for personal and non-commercial use. Each dataset file is in a gzipped, tab-separated-values (TSV) format.
To access the datasets and for more information you can go here: https://contribute.imdb.com/czone
Stewart
jeorj_euler
8.2K Messages
•
186.1K Points
4 y ago
1
0
phil_g
224 Messages
•
11.4K Points
4 y ago
Are there still plans to keep Col's promise to make more data available, preferably taking into account all the concerns raised in that thread?
8
phil_g
224 Messages
•
11.4K Points
3 y ago
I'd like answers to the questions I asked over 2 weeks ago, please.
What are the requirements for accessing the mysterious 'extended datasets'? Exactly how many contributions are there in 'a large number'? Exactly how many days/months/years ago counts as 'recently'?
But I suspect that whatever your answer, I'll be asking you to reconsider your policy on this. For my own case, I have contributed over 45,000 items to IMDb over the last 5 or 6 years (according to Col's end-of-year reports). I realise that's nowhere near as much as some, but I still consider it to be a very large number, and it was enough to get me into the top 250 contributors list three times in recent years. So I'm disappointed that you seem to be demanding more, before I can access data that wouldn't even exist without contributors like me. I'm sure I'm not the only person in this situation. For all the flaws in the old ftp-based system, at least the data was there and freely available. Why are you now so reluctant to share?
Also, why are you being so secretive about this data? I still have no idea what might be included in these 'extended datasets' and in fact I wouldn't even know that such a thing existed at all if I hadn't asked here. Neither the announcement in this thread nor the datasets page itself makes any mention whatsoever of what might be available or what mystical incantation is required to unlock it. Why is that?
6
phil_g
224 Messages
•
11.4K Points
3 y ago
If anyone's interested, in the meantime my recent contributions seem to have triggered the magic formula and unlocked the extended datasets for me. Looks very promising at first glance, so I should probably be thanking you for that. But, sadly, the discussion in this thread now leaves me wondering if I'll be able to find enough time to look at the data in detail before my recent contributions are no longer recent enough and I lose access again.
10
stewt
Employee
•
15 Messages
•
920 Points
3 y ago
The Extended Datasets are available to those who have 1000+ approved contributions in the last 360 days, otherwise the Basic Datasets are available with just one approved contribution in the last 60 days.
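For anyone wondering which tier they fall into, the two thresholds above can be sketched roughly as follows. This is only an illustration: the function name `dataset_tier`, the input (a list of approval dates) and the inclusive handling of the window boundaries are assumptions, not how IMDb actually evaluates eligibility.

```python
from datetime import date, timedelta


def dataset_tier(approved_dates, today):
    """Return 'extended', 'basic' or 'none' for a list of approval dates.

    Hypothetical helper: thresholds are taken from the post above,
    boundary handling (inclusive windows) is an assumption.
    """
    def count_within(days):
        cutoff = today - timedelta(days=days)
        return sum(1 for d in approved_dates if cutoff <= d <= today)

    # Extended: 1000+ approved contributions in the last 360 days.
    if count_within(360) >= 1000:
        return "extended"
    # Basic: at least one approved contribution in the last 60 days.
    if count_within(60) >= 1:
        return "basic"
    return "none"
```

Calling `dataset_tier(dates, date.today())` with your own approval dates would show which rule applies under these assumptions.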
I hope this helps.
Stewart
3
ju_5706153
7 Messages
•
362 Points
3 y ago
I see no way to qualify, as I have a wide range of interests and I'm not from the industry, so getting 1000 updates in 360 days will never be possible (I have made maybe 5 updates in the last 15 years, and for ~19 years I have pointed out several issues with broken/incomplete exports of the LIST files via the normal support pages).
The Java Movie Database (JMDB) application...
What should qualify me to access the data is the fact that I'm the author of the Java Movie Database (JMDB), which has been available for 19 years (the first versions only to a limited set of beta testers). --> http://www.jmdb.de/
Contact details can be found on the above website.
I know of no other free application that has processed the (public) IMDb data for that long. The only thing coming close is the IMDbPY project.
While the application was originally created by two people, I'm the only one left (and have been for 15 years).
The JMDB application can import and process the old LIST files (which are still the basis for search inside the application) as well as the TSV files (not yet used for search inside the application).
The reason I only added import support for the TSV content, rather than updating the code to use it inside the application, is that the available content is so limited that it is actually useless.
Sorry, but there is no other (nicer) way to sum it up when it comes to the TSV file format.
This is what the people already complained about in the original thread (https://getsatisfaction.com/imdb/topics/imdb-data-now-available-in-amazon-s3).
So I would really like to see what the "complete" stuff actually looks like and if it's actually usable, for the following reasons:
As this has been under the radar here: other uses of the IMDb data in the past...
I also want to share that the IMDb data has been and still is (at least the frozen LIST files) used together with JMDB:
- by students/postgraduates (postdocs) from universities for data analysis, etc. (including in master's theses), as can be seen e.g. here (there are more, like TU Berlin, Germany): UiO University Oslo Norway - https://www.uio.no/studier/emner/matnat/ifi/INF3100/v16/undervisningsmateriale/filmdatabasen-og-post... and
- it is used in computer science lessons at schools, as can be seen here: https://www.swisseduc.ch/informatik/120-lektionen/principles/recollection/sql-imdb.html
The imported IMDb data has been used to create covers or other internal notes on titles that have been recorded from TV.
Finally some technical issues with the format...
I also have some technical issues, starting with the fact that inside each of the *.tsv.gz archives the name of the file is "data.tsv".
Normally the name of the file found inside the archive should equal the name of the compressed file minus the ".gz" compressor extension (Example: "title.crew.tsv.gz" --> "title.crew.tsv").
It has been broken since the beginning. If you extract all the files you have downloaded into the current/same directory, you end up with only one file: the one extracted from the "last" archive processed.
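As a workaround for the naming problem described above, here is a minimal sketch that ignores the name stored inside the archive and writes each file under the archive's own name minus ".gz". The function name `extract_with_archive_name` is just an illustration, and the sketch assumes every archive ends in ".gz".

```python
import gzip
import shutil
from pathlib import Path


def extract_with_archive_name(archive, dest_dir):
    """Decompress e.g. title.crew.tsv.gz to dest_dir/title.crew.tsv.

    Hypothetical helper: the output name is derived from the archive
    name, so the embedded "data.tsv" name is never used.
    """
    archive = Path(archive)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Path.stem drops only the last suffix (".gz").
    target = dest_dir / archive.stem
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Running this over every downloaded archive into the same directory should leave one distinctly named .tsv file per archive instead of a single overwritten "data.tsv".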
When the data is imported with JMDB from the compressed gzip files and you try to add foreign key constraints to the filled tables, it fails in some cases: there are references in e.g. title.crew.tsv (person references) that have no matching entry in the name.basics.tsv file, so the relational database tells you that your data is basically incomplete. There is more of this.
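To illustrate the dangling references outside of a database, a rough check could look like the sketch below. This is not what JMDB does internally. It assumes that title.crew.tsv has `directors` and `writers` columns holding comma-separated person ids, that name.basics.tsv has an `nconst` column, and that `\N` marks a missing value; the function names are made up for the example.

```python
import csv


def read_tsv(path):
    """Yield each row of a tab-separated file as a dict keyed by header."""
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)


def find_orphan_people(crew_path, names_path):
    """Return person ids referenced in the crew file but absent from names.

    Hypothetical helper; column names are assumptions (see lead-in).
    """
    known = {row["nconst"] for row in read_tsv(names_path)}
    orphans = set()
    for row in read_tsv(crew_path):
        for column in ("directors", "writers"):
            value = row[column]
            if value == r"\N":  # assumed marker for "no value"
                continue
            orphans.update(n for n in value.split(",") if n not in known)
    return orphans
```

Any id returned here would be a row that makes a foreign key constraint from the crew table to the name table fail.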
Kind regards,
Juergen Ulbts
1
alex_ivanovich
3 Messages
•
104 Points
3 y ago
What are these secret data?
I remind you that the IMDB database was created over the years with the support of millions of people around the world, without them (and even a small contribution of mine) IMDB would never have been born.
It also seems absurd and insane that to get access to the extra data you have to enter 1000 contributions a month; even if I enter one item a day I will never succeed.
Go back, and release the database to everyone; it is a public asset.
Juergen is waiting for your reply, which may never come.
1
a_braunsdorf
17 Messages
•
852 Points
3 y ago
2
alex_ivanovich
3 Messages
•
104 Points
3 y ago
Now IMDB has become premium (soon to be paid), built on the contributions of millions of users.
Stop contributing and we'll see if they come back
0
0
a_braunsdorf
17 Messages
•
852 Points
3 y ago
0
alex_ivanovich
3 Messages
•
104 Points
3 y ago
now no longer, unfortunately
0
0