
Thu, May 29, 2014 9:28 PM

API/Bulk Data Access

Hi!

We’re in the process of reviewing how we make our data available to the outside world with the goal of making it easier for anyone to innovate and answer interesting questions with the data. If you use our current ftp solution to get data [http://www.imdb.com/interfaces] or are thinking about it, we’d love to get your feedback on the current process for accessing data and what we could do to make it easier for you to use in the future. We have some specific questions below, but would be just as happy hearing about how you access and use IMDb data to make a better overall experience.

1. What works/doesn’t work for you with the current model?
2. Do you access entertainment data from other sources in addition to IMDb?
3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?
4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?
5. Does how you plan on using the data impact how you want to have it delivered?
6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?
7. Are our T&Cs easy for you to understand and follow?


Thanks for your time and feedback!

Regards,

Aaron
IMDb.com

Responses

2 Messages

 • 

72 Points

6 years ago

1. What works/doesn’t work for you with the current model?

The data is somewhat structured, but the format isn't documented and is very difficult to parse. Having any structured format would be helpful, especially correctly delimited CSV, since the files might be a little too big for more sophisticated formats.

2. Do you access entertainment data from other sources in addition to IMDb?

I'm using filmweb.pl data and looking into rotten tomatoes.

3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?

While I'm not sure if a single data file would be more helpful, having primary keys for each of the categories (movies, directors, actors, genres) and having them used in all the files would be very useful. Performing joins based on titles that are strings tens of characters long is inefficient and problematic. IMDb is already using integer-based IDs in its URLs (say tt0330243), so there shouldn't be any problem with exposing the IDs.
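To illustrate why shared keys matter, here is a minimal sketch of joining two record sets on an ID rather than on a long title string. The field names (`title_id`, `title`, `rating`) and the sample rows are assumptions made up for this example; the current list files do not carry such a column.

```python
# Hypothetical rows: the field names and values are invented for illustration
# and do not reflect the layout of the real IMDb list files.
movies = [
    {"title_id": "tt0000001", "title": "Example Movie (2001)"},
    {"title_id": "tt0000002", "title": "Another Example (2002)"},
]
ratings = [
    {"title_id": "tt0000002", "rating": 7.1},
    {"title_id": "tt0000001", "rating": 6.4},
]


def join_on_id(movies, ratings):
    """Attach a rating to each movie using the shared title_id key."""
    # Build a lookup table once so each movie is matched in constant time.
    rating_by_id = {row["title_id"]: row["rating"] for row in ratings}
    return [
        {**movie, "rating": rating_by_id.get(movie["title_id"])}
        for movie in movies
    ]


joined = join_on_id(movies, ratings)
for row in joined:
    print(row["title_id"], row["title"], row["rating"])
```

With a short, fixed-width key the lookup does not depend on titles being spelled and formatted identically in every file.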

4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?

It would complement the current bulk data approach. An API would be good for lookup and smaller operations, but would be taxing to use for statistical/bulk purposes. You cannot replace the bulk access with an API, but on the other hand it's easy to create an API when you have access to flat files, so I wouldn't axe them.

5. Does how you plan on using the data impact how you want to have it delivered?

Yes, but different use-cases would call for very similar formats.

6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?

JSON and flat CSV files would be sufficient.

7. Are our T&Cs easy for you to understand and follow?

They aren't too easy to find.

2 Messages

 • 

120 Points

6 years ago

Hi, I have been using the flat files off and on for a long time and here are my comments:

1. What works/doesn’t work for you with the current model?
I think what you have works for the most part. I use Perl, and any textual data is pretty easy to read. My number 1 big, big, big complaint is that there is no mapping of titles and names to the unique IMDb ID number, so there is no (supported) way to create a link back to the IMDb page. If I am creating an application, I would generally use the downloaded data to show some subset of information, but I still want to create a link back to the IMDb page. I don't want to recreate the entire IMDb page, plus there is some information, like the user's own personal data (title rating, lists the title is on, etc.), that just isn't available unless you go back to the IMDb site.
Another small nit is that the files are not very well documented.
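If the ID were included in the files, linking back would be a one-liner. A rough sketch, assuming title IDs look like `tt` followed by seven or more digits and that title pages live under `https://www.imdb.com/title/<id>/`; the helper name `title_url` and the validation pattern are my own, not anything IMDb publishes:

```python
import re

# Assumed ID shape, inferred from IDs such as tt0330243 that appear in
# IMDb title URLs: "tt" followed by at least seven digits.
TITLE_ID = re.compile(r"tt\d{7,}")


def title_url(title_id: str) -> str:
    """Build a link back to the title page for a given ID (hypothetical helper)."""
    if not TITLE_ID.fullmatch(title_id):
        raise ValueError(f"not a title ID: {title_id!r}")
    return f"https://www.imdb.com/title/{title_id}/"


print(title_url("tt0330243"))
```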

2. Do you access entertainment data from other sources in addition to IMDb?
Currently using TCM. Was using the Netflix API until they shut that down.

3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?
Either one. A large data set is appropriate for showing lots of data; smaller data sets are appropriate for showing small subsets of data. It depends on the application.

4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?
I think an API would be more efficient. I think it is more efficient to just go query for what you want than to download blocks of data all the time when you may not use a lot of the data.

5. Does how you plan on using the data impact how you want to have it delivered?
No

6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?
JSON would be ok.

7. Are our T&Cs easy for you to understand and follow?
Yes

1 Message

 • 

60 Points

6 years ago

1. What works/doesn't work for you with the current model?
  
Updating with diff files or doing a complete reload is a pain, to say the least.
The IMDb title number should be the number 1 index across all tables, to at least pretend the database is relational.
What does work is the availability. In my perfect world I would just be able to load a title set and query from your servers, but I do understand what a load that would put on you, so a local copy of the data works, even if it is a bit cumbersome to maintain.

2. Do you access entertainment data from other sources in addition to IMDb?
 
Not usually. You do have the best database, but TV shows could use some work, and so could the segregation of foreign films (US perspective).

3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?
YES. Ease of maintenance and simplicity. Again, that word: relational!

4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?

YES. Direct access to your databases would allow me to pick and choose the titles and fields I want instead of wading through it all. For instance, my personal collection has about 2600 movies and TV episodes and 20400 albums and songs in many different formats. I am after a monolithic database to store the info on those (yes, I know there are a ton of them out there) and populate it with the data I choose, so querying for just those titles and that information programmatically would make life much easier. It would also probably free up a ton of load on your end by not having scrapers doing all the dirty work that most people and apps use now. Just think of the tens of millions of searches and page loads that would go away with a simple API. I would think your actual views would be about the same: when I look up a movie I'm watching, that traffic would remain constant.

5. Does how you plan on using the data impact how you want to have it delivered?
YES. I am a small user, so occasional online access would be perfect: getting relatively small data sets, with an occasional larger set, as in an initial population.

6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?

JSON is fine, but XML and delimited text would be useful too, just because of the common tools I have available and for ease of hand editing if needed, bearing in mind I'm using this data for a personal database and not to populate web pages.

7. Are our T&Cs easy for you to understand and follow?

Hehe, they must not be. I'm not sure what T&Cs are.

Thanks for keeping this mostly open. You are my first choice when it comes to movie and TV information, and you have done a great job through the years.

1 Message

 • 

60 Points

6 years ago

I work for QlikView and I often build my own models at home with real-life data that isn't boring the way bank or mine production data is. I've been thinking about building a QlikView app for some time now, and I can't explain how delighted I was to find out that the data is actually available outside of the web site or mobile app. I'm going to start now, and will be updating you on the progress and on how I find your current data structure.

It's awesome that you are willing to share everyone's hard work and passion.

Watch this space...

1 Message

 • 

62 Points

6 years ago

Hi,

I work with a few BI tools and it's fantastic that this data is available. That being said, at least as a start it would be extremely helpful to delimit the data. It's technically possible to do a lot of parsing on this structure, but it's not really sustainable, and a simple delimitation would remove a lot of the complexity. At least as a first iteration it would start to move this project forward! :)

1 Message

 • 

60 Points

6 years ago

1. What works/doesn't work for you with the current model?
I'm having a hard time getting the correct data. The structure of the whole dump is kind of confusing, and the format is not easy to work with.
2. Do you access entertainment data from other sources in addition to IMDb?
Movie related, no.
3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?
A large CSV file, for the ability to import the whole database, would be really nice.
4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?
Definitely, as long as there were no restrictions on connections or number of calls.
5. Does how you plan on using the data impact how you want to have it delivered?
No
6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?
JSON would be just fine
7. Are our T&Cs easy for you to understand and follow?
Yes

1 Message

 • 

60 Points

6 years ago

I vote for a JSON REST API.

1 Message

 • 

62 Points

6 years ago

1. What works/doesn’t work for you with the current model?
It works within a quite complicated desktop database that I've been programming for over 10 years now. At some point I used most of the offline files with the help of amdb, but this was rather cumbersome. Now I parse some files directly, but I quite hate it, mostly due to the lack of the IMDb-ID. I regularly only use ratings.list at the moment. My sweet spot for the amount of movies is 50k, although I keep using the whole 3.3 million movies.list for checking. (The 50k include 500m+ votes; the whole rest has a measly 10% of that.)
All in all I'm primarily happy that the current model exists at all.

2. Do you access entertainment data from other sources in addition to IMDb?
Yes, several. But unfortunately my second favourite movie site moviepilot.de discarded its API some time ago. And deep linking with an IMDb-ID stopped some time after that.

3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?
YAY, primary keys. Although 200+ chars long movie titles can be primary keys too... :(
I don't know how many hours and perhaps days of my life I wasted, only because the existing text files don't contain the IMDb-ID. So more useful for sure. Downloading huge amounts of data is really a big problem for me nowadays, but I'd be happy to get only parts of the data from time to time.

4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?
Depends on the abilities and restrictions of the API. If I could get the whole data for my 50k within a reasonable time (let's say a month), it could be an improvement. Since my old method to update my ratings doesn't work any more, I'd especially like the possibility to cast votes via API. I got several thousand in queue :)

5. Does how you plan on using the data impact how you want to have it delivered?
I take what I can get. Provide convenient ways to get the data and I'll think of a usage. The longer I think about it, the nicer API sounds.

6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?
JSON is fine. I wouldn't mind some kind of Microsoft database files, but that's only me :)

7. Are our T&Cs easy for you to understand and follow?
Easy to understand: somewhat. Easy to follow: no :)


Is there a time frame for the possible changes?

1 Message

 • 

100 Points

6 years ago

Parsing your data is a bit prohibitive. Please move towards any standard. CSV, XML, or JSON.

My preference is a simple CSV format. Each file can contain the id and the relevant data. Let us import the data to the database that suits us. Users of different platforms can post scripts if they are required.
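As a sketch of how little code such a format would need on the consumer side, the snippet below loads an ID-keyed CSV into SQLite using only the Python standard library. The column names (`title_id`, `rating`, `votes`) and the sample rows are assumptions invented for the example, not an actual IMDb layout.

```python
import csv
import io
import sqlite3

# Hypothetical file contents: the columns and rows are made up for this
# example and are not a format that IMDb publishes.
RATINGS_CSV = """title_id,rating,votes
tt0000001,6.4,1200
tt0000002,7.1,530
"""


def load_ratings(csv_text, conn):
    """Create a ratings table keyed by title_id and bulk-insert the CSV rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings ("
        "title_id TEXT PRIMARY KEY, rating REAL, votes INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["title_id"], float(r["rating"]), int(r["votes"])) for r in reader]
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
count = load_ratings(RATINGS_CSV, conn)
best = conn.execute(
    "SELECT title_id FROM ratings ORDER BY rating DESC LIMIT 1"
).fetchone()[0]
print(count, best)
```

The same file could be imported into any other database with that platform's own CSV loader, which is the point of targeting the lowest common denominator.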

JSON is another option, but there is no need to get fancy. Target the lowest common denominator. Keep it simple. CSV please.

If you want to get fancier with CSV, offer a dynamic export that includes the columns required by the user.

Maybe also consider .xz compression.
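For what it's worth, .xz files can be read as a stream without unpacking them first. A minimal sketch with Python's standard `lzma` module, using a tiny in-memory sample (the columns are again hypothetical) in place of a real download:

```python
import csv
import io
import lzma

# Hypothetical sample standing in for a downloaded .xz dump.
sample = "title_id,rating\ntt0000001,6.4\ntt0000002,7.1\n"
compressed = lzma.compress(sample.encode("utf-8"))  # .xz container by default


def read_xz_csv(blob):
    """Decompress an .xz blob on the fly and parse it as CSV rows."""
    with lzma.open(io.BytesIO(blob), mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


rows = read_xz_csv(compressed)
print(len(rows), rows[0]["title_id"])
```

For a file on disk, `lzma.open` accepts a path in place of the `io.BytesIO` object.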

For the interactive API, maybe some people would like that, but I prefer downloading the full dumps. An API would be an appropriate place for JSON.

Thanks for providing the data. I appreciate your consideration.

3 Messages

 • 

90 Points

6 years ago

aka-titles.list hasn't been updated on the ftp sites since April! This is likely a simple bug -- can you please, please check into it and get it working again? Thank you!

1. The current model has a barrier to entry but works well after overcoming that. If that's the price of data that's usable under the current license, so be it.
2. Yes, I access other data sources.
3. You have primary keys now (the IMDb-style title). Having primary keys that are the IMDb ID (e.g., tt0000000) would be nice.
4. An API would be horrible. I want to analyze the data offline -- all of the data. Bulk transfer of the whole database is the way to go.
5. Of course, how I use the data impacts how I want it delivered. I analyze ALL of it at once, so an API would not help me.
6. JSON would be fine. I'm arguing in 4 and 5 for a bulk transfer method. I don't care about the format after the bulk transfer.
7. Yes. I would love it if you could consider something more open, like one of the CC licenses.