Thu, May 29, 2014 9:28 PM

API/Bulk Data Access

Hi!

We’re in the process of reviewing how we make our data available to the outside world, with the goal of making it easier for anyone to innovate and answer interesting questions with the data. If you use our current FTP solution to get data [http://www.imdb.com/interfaces] or are thinking about it, we’d love to get your feedback on the current process for accessing data and what we could do to make it easier for you to use in the future. We have some specific questions below, but would be just as happy to hear how you access and use IMDb data so that we can make a better overall experience.

1. What works/doesn’t work for you with the current model?
2. Do you access entertainment data from other sources in addition to IMDb?
3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?
4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?
5. Does how you plan on using the data impact how you want to have it delivered?
6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?
7. Are our T&Cs easy for you to understand and follow?


Thanks for your time and feedback!

Regards,

Aaron
IMDb.com

Responses

8 Messages

 • 

600 Points

4 years ago

I'm late, sorry:

1. What works/doesn’t work for you with the current model?

The only thing bothering me right now is that it seems we can't get the IMDb ID of an item (like tt0050783) from those lists. Am I missing something? That means I can't create links to IMDb simply by using the lists; I need to query the server to find the ID?

2. Do you access entertainment data from other sources in addition to IMDb?

Not really. Wikipedia and, rarely, Mubi.

3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?

Less useful! Way too much data to handle, most of it useless.

4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?

If it gave access to IMDb IDs, yes it would. Also useful for making small queries. But keep it simple. In the end, I think you should have both models.

5. Does how you plan on using the data impact how you want to have it delivered?

Naturally.

6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?

I don't really care about JSON; CSV files like the downloadable ratings list would be fine too.

7. Are our T&Cs easy for you to understand and follow?

I don't think I've checked them, frankly.

1 Message

 • 

60 Points

4 years ago

The T&Cs make it clear that the interfaces are for non-commercial use only. How about commercial use? I work for a media company that would love to use some of this data to feed predictive models. We regularly license data and would love to find out whether IMDb licenses data for commercial purposes, or whether we could get permission to use the interface data commercially. I'd love to talk more with someone from IMDb if you're open to that. Thanks, Jim

1 Message

 • 

60 Points

4 years ago

First off, here's my use case:
My wife is a huge classic movie buff. She majored in film studies, and has seen literally thousands of movies of the 30's, 40's and early 50's. One entire wall of our bedroom is covered with bookshelves about classic movies, stars, studios, directors, authors, etc. I'm interested in finding movies for her to watch from various online sources (streaming, etc.), and in keeping track of when they're showing on TCM, etc. So I need to build some kind of list of classic movies, keep track of those that she's seen, and see if they're available.
I've been doing software development for decades, starting with C, and now moving on to more modern languages. I'll probably build a little app or just use Excel.

My preferred solution would be to get the data into Excel and just work with it there.
In particular, this would be _ideal_: Document the query parameters of advanced search (http://www.imdb.com/search/title) and allow the results to be returned in JSON, CSV, etc. 

Here are answers to your questions:
1. What works/doesn’t work for you with the current model?
Difficult to parse. Hard to join across the different files. Not clear if it's a complete dump (appears not to be).

2. Do you access entertainment data from other sources in addition to IMDb?
I don't.

3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?
Much more useful. I could load it into some DB and then access the fields I want, filter, etc. Basically, if I chose to use the current access model, the first thing I would do is build exactly this.
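As a rough illustration of the "load into some DB, then filter" workflow described above, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, identifiers and values are hypothetical and only show how a shared primary key makes joining two files straightforward.

```python
import sqlite3

# Hypothetical sample rows; the identifiers and values are illustrative only.
titles = [
    ("tt0000001", "Example Film A", 1941),
    ("tt0000002", "Example Film B", 1950),
]
ratings = [
    ("tt0000001", 8.3),
    ("tt0000002", 7.1),
]

# An in-memory database is enough for a sketch; a file path would persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (id TEXT PRIMARY KEY, title TEXT, year INTEGER)")
conn.execute("CREATE TABLE ratings (id TEXT PRIMARY KEY, rating REAL)")
conn.executemany("INSERT INTO titles VALUES (?, ?, ?)", titles)
conn.executemany("INSERT INTO ratings VALUES (?, ?)", ratings)

# Join the two tables on the shared key and keep films released before 1945.
rows = conn.execute(
    "SELECT t.title, t.year, r.rating "
    "FROM titles t JOIN ratings r ON r.id = t.id "
    "WHERE t.year < 1945 ORDER BY t.year"
).fetchall()
print(rows)
```

Once the data is in tables like these, any further filtering is a one-line query rather than custom parsing code.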

4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?
More useful. I never really need the entire data set, and I'm not doing any BI across it. What I need is to be able to query and see results. An API would provide this.

5. Does how you plan on using the data impact how you want to have it delivered?
I guess so. If I were doing BI (e.g. how do ratings of movies change with respect to the actors' ages), then I would want the entire data set.

6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?
JSON would be fine. XML or CSV would be better for my use case.

7. Are our T&Cs easy for you to understand and follow?
To be brutally honest, I'm just using the data for personal use, so I figure it's fine and didn't read them.

1 Message

 • 

62 Points

4 years ago

1. What works/doesn't work for you with the current model?

OMG, the comments on this page are up to 3 years old. When was it you mentioned that you were actually going to deliver on this?

In my opinion, JSON is one of the worst formats for Data. No one can use it except developers. Sure, they can read it. But what fool is intending to read your JSON Files? Then each File must have its own parsing code because none of your files are the same. Databases have been established since the early days of computers; certainly there is a real Database Format for your abundance of data that would be far superior. A TAB separated document would be a good example. Equally simple, readable, and USABLE. Secondly, the idea of adding all the documentation into the documents is a joke. Data is data; the documentation should be separate and intelligently structured. Your ratings.list actually seems like at least three different databases in one, PLUS documentation... Come on, if you’re going to be so kind as to allow the world to use your data, at least release it in a usable format. Not many people are going to learn how to develop and write separate code for each .list just to appreciate your data. Most of the people on the planet would be happy if they could get their equation to work in Excel, so how do you ever expect them to be able to use your data without tremendous expense?

All the guys that have commented on this structure are experts and developers. Combined they may have billions of hours of experience making JSON work, but most people can’t use the garbage. No, XML is not the solution; it too has issues.

IMDB has numbers on every film, why are those not used as keys???

How hard could a two-field TAB DB file (or any other normal “DataBase” file) be, and what disadvantage could it have???

IMDB_Number Release-Date

Done and finished, EASY for the planet to read and USE, not just developers that have spent a long time writing code to parse the data for each .list file (dozens!!!).

Not much, the files are in horrific structure and have no continuity, similarity, or keys.

It is very difficult to read the files as anything other than text on a Macintosh. JSON fails miserably for data delivery.

It has taken an enormous amount of time to code and set up a way to parse the data so that it is in record form.

It seems impossible to ever be able to use the plot.list and others like it, as it is not in record form, i.e. field one, delimiter, field two, etc...

The documents in the plot.list format vary per other similar .list files.  To be able to use the plot.list, you write a lot of code to change it into a usable source.  What a lame waste of time.

Why are the IMDB Ratings embedded half way through the data, AND WITH NO DELIMITERS?  On this file, one must parse the Rating and the Title.  How can such a poor format ever have been effective???

One contributor mentioned the simple fact that the actors.list and actresses.list are not even continuous data.  They must have a lot of coding, and reformatting into a real format just to use the data.  Why the lousy game?



2. Do you access entertainment data from other sources in addition to IMDb?

I’m starting what I thought was a simple project, and yes other sources provide far more usable API’s and Data.

I assume most people are looking for the easiest and least expensive way to utilize Data.  IMDB sure has solid data, just it’s so poorly formatted!!!

It would be easier and less expensive coding to parse the data out of your web pages rather than read it from the provided .list files!


3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?

Any data format with intelligence would be far better than what has been provided. See above, you have millions of records. JSON is an expensive and lousy way to deliver them.

Databases should be reliable and easy to work with. Yes, keys would bring in a lot of accuracy, far more than using the long title names or another serialization method.

Please contact me, I’d be happy to help you set up an intelligent format. It’s the 21st century, it’s not that hard to have reasonable data sets.


4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?

I don’t care if I use static files or a server API, although a Server would be far easier than re-establishing a new Database to parse from for every update.

The principal idea I hope would be to provide a viable solution for the planet, not just a sub-set of elite developers that can actually figure out how to use your data.


5. Does how you plan on using the data impact how you want to have it delivered?

No, rules are no online usages.  Running specific films would be far easier in an API than growing an entire data field of files that all need to be individually and specially parsed.

Basic files are fine; it’s the format that is horrific and takes gigabytes of space to accommodate, let alone the time to process it all. I’m not sure why anyone would want to process a request for thousands of Titles many times over. It seems as if most usages would be one at a time. A server request would be far superior for constant requests of individual records.


6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?

As files are in JSON, no it is the worst most horrible format ever.  Using the data is very expensive!

Any real database format would be great. I suggest TAB separated rather than comma separated, since your titles have commas in them. With your current structure, it seems as if that would be far easier. Do provide a common key, though.

(Let’s stop the closet dweller who set this nasty thing up from creating CSV files; using comma separators would turn out as poorly as the current system. The Titles have commas in them!)
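To make the delimiter point concrete, here is a small Python sketch. The record key is a hypothetical placeholder; the title is simply one that contains a comma. It shows that naively splitting a comma-separated line breaks such a title apart, while the same record split on tabs stays intact. For completeness, a CSV writer that quotes fields also round-trips the record, but only if every consumer uses a real CSV parser rather than a plain split.

```python
import csv
import io

# Hypothetical record: placeholder key, a title containing a comma, a year.
record = ["tt0000000", "The Good, the Bad and the Ugly", "1966"]

comma_line = ",".join(record)
tab_line = "\t".join(record)

# Naive splitting: the comma inside the title creates an extra field.
naive_comma = comma_line.split(",")
naive_tab = tab_line.split("\t")
print(len(naive_comma), len(naive_tab))

# A CSV writer quotes the title, so a CSV reader recovers the original fields.
buf = io.StringIO()
csv.writer(buf).writerow(record)
quoted = next(csv.reader(io.StringIO(buf.getvalue())))
print(quoted == record)
```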

Actually, I’d like to see it in a JSON Reader, I wonder how it looks.  Not a reader that I have written special code into, I’ve done that.  As a usable format, JSON is garbage.


7. Are our T&Cs easy for you to understand and follow?

No, most of what has been provided seems like an insult to humanity and intended to make us hate your data, basically it’s a huge “go get lost everyone” statement.  Maybe it’s a developer’s only cliché, but how about providing something usable?

Many things in our world are easily and intelligently documented and formatted. Why did IMDB go so far out of the way to provide such an expensive, lousy system?

The idea of a daily dump is great, but keeping current would be far easier in an API from a Server.

Daily updates could be done in kilobytes, not downloading gigabytes.  Oh and then you start back at scratch, and have to re-set up all the files again...  What an inefficient complete waste of time!!!

It takes a considerable amount of time to clean up horrific data, then convert it, then adjust it, etc... every time an update is desired. A nightmare, if I tried that daily. And what an expense!!!

How about documentation and Data sets.  The idea of adding all the notes in the leading is cute, but it is just pollution as far as data goes. (although the first line timestamp could stay for accuracy)


NOTE: I'm sorry if this seems unappreciative, it's not really. I have thought a little more about my loud considerations. IMDB has opted to share a great Database with the world, but they have limited its value to the smallest group available.

To access the Data, you must build Arrays etc. or convert it to "Usable Form". Database data should not be so hard to use. It's a data set.

Most of the planet appreciates your gift, but has no way to use it without spending a large sum of cash to code something. My perception is that if you're going to share it, then do so. Please don't lock it up in some buffoon format that only skilled, trained developers can use.

The world would love to use your data set, but is likely insulted that it is provided in such a hostile form. Sure, right, JSON is too cool, but it is only used by a very, very small group of humans.

If you're going to gift the planet, how about doing so in something the whole Planet can use???  Please think about it.  Oh sure, JSON is the best of the best for a significantly small group of people.  I have read much of the replies.  I'm glad I'm not the only one who has noticed it's ridiculous and mostly a complete waste of time.

I thought about that 'plot.list' file again, since it must be extracted by array or a billion hours making it usable data.  How obliviously nasty is it.  The text is even cut down into multiple single lines instead of a flowing text field.

PLEASE PLEASE PROVIDE USABLE DATA.  WE LOVE YOU (well at least your data) it would be really beneficial if people could use it.

They say there are few Women in tech, so basically your data can't be used by half of the planet, and not much of the other half.  Why was it made so difficult to use?

If you're going to do something, wouldn't you feel it much more rewarding to do it right and kill this small elitist format and provide something real for the real world?

We would really really appreciate something usable.

The only people that might not like you, is anyone that had to go through the extensive effort to parse your current data set.  But the next time, maybe it will take them minutes instead of many expensive hours.  I mean really, TAB was too difficult.  The whole planet can use it, it's the same text, just with intelligence.

I hope IMDB hears me and the other not so thrilled replies.  Thank you

(One side note: sure, IMDB has done a great job and has a great web site. Thank you very much for all the effort and service. Asking for something reasonably usable for the masses seems a just cause to save the planet. Please consider it SOON)

5 Messages

 • 

210 Points

4 years ago

First, thanks for your data dump!

As far as I am concerned, I support most of what is already said above. What I am mostly missing in the dumped data are connections (references).

Can you say anything about the changes we can expect and when we can expect them? Thanks again.

1 Message

 • 

60 Points

4 years ago

It has been a very long time since this question was originally asked. Has there been any progress toward releasing the data in a more usable and sensible format? I fear there has been none.

If anyone on this forum is interested I have been developing an application that parses the ratings.list and genres.list into a local XML data file. The files are automatically downloaded and parsed. It allows the user to perform advanced searches of the movie data.

https://github.com/adscott1982/IMDbQuery

1 Message

 • 

60 Points

4 years ago

Hi. I have used the IMDb data daily for 3-5 years now. I switched to omdbapi.com to query video information. The key was being able to search for movies by year and title, or by index such as "tt0795421". I would get everything about that movie back as quasi-JSON text/value pairs, much like an email header, and could easily parse it with Perl. I downloaded the complete FTP dump and haven't yet found the information in any form that lets me query the details for each movie. I'm particularly interested in the movie posters, which I used to download automatically. This has become more difficult with the recent changes.

They provided a simple API interface so I could query by index or by title, year, etc.
Here is a sample of what I got back from OMDb for a given index. I found this to be a very useful format:

{"Title":"Citizen Kane","Year":"1941","Rated":"APPROVED","Released":"5 Sep 1941","Runtime":"119 min","Genre":"Drama, Mystery","Director":"Orson Welles","Writer":"Herman J. Mankiewicz (original screen play), Orson Welles (original screen play)","Actors":"Joseph Cotten, Dorothy Comingore, Agnes Moorehead, Ruth Warrick","Plot":"Following the death of a publishing tycoon, news reporters scramble to discover the meaning of his final utterance.","Language":"English","Country":"USA","Awards":"Won 1 Oscar. Another 8 wins & 10 nominations.","Poster":"http://ia.media-imdb.com/images/M/MV5BMTQ2Mjc1MDQwMl5BMl5BanBnXkFtZTcwNzUyOTUyMg@@._V1_SX300.jpg&quo...}
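A response in this shape is easy to consume with a standard JSON parser. The sketch below uses Python's json module on a shortened copy of the sample, limited to fields that appear in full above (the original sample is truncated, so the rest is left out rather than guessed). The variable names are illustrative only.

```python
import json

# Shortened copy of the sample response, using only fields shown in full above.
sample = (
    '{"Title":"Citizen Kane","Year":"1941","Runtime":"119 min",'
    '"Genre":"Drama, Mystery","Director":"Orson Welles"}'
)

movie = json.loads(sample)

# Multi-valued fields arrive as comma-separated strings, so split them.
genres = [g.strip() for g in movie["Genre"].split(",")]

# The runtime is a string such as "119 min"; keep only the number.
runtime_minutes = int(movie["Runtime"].split()[0])

print(movie["Title"], movie["Year"], genres, runtime_minutes)
```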

15 Messages

 • 

820 Points

4 years ago

Thanks to the feedback and suggestions in this thread, we now have an improved and more robust interface for accessing IMDb data. 
  • Datasets are available in Amazon S3 and are refreshed daily
  • IMDb title and name identifiers are included in all the files for ease of matching and linking back to IMDb.
  • The files are in tab separated values (TSV) format with column headers
For details on the S3 solution, file format and access guidelines, see www.imdb.com/interfaces.
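As a minimal sketch of what reading such a file can look like, the Python below parses tab-separated text with a header row and builds links back to IMDb from the title identifiers. The column names and rows here are hypothetical placeholders; the actual file names and columns are documented on the interfaces page.

```python
import csv
import io

# Hypothetical file contents; real column names and values may differ.
tsv_text = (
    "title_id\ttitle\tyear\n"
    "tt0000001\tExample Film, Part One\t1941\n"
    "tt0000002\tExample Film B\t1950\n"
)

# Tab is the delimiter and quoting is disabled, so commas and quote
# characters inside titles are treated as ordinary text.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
by_id = {row["title_id"]: row for row in reader}


def imdb_url(title_id):
    """Build a link to the IMDb title page for an identifier like tt0000001."""
    return f"https://www.imdb.com/title/{title_id}/"


for title_id, row in by_id.items():
    print(row["title"], row["year"], imdb_url(title_id))
```

Keying the rows by identifier is what makes matching across files a dictionary lookup instead of a comparison of title strings.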

If you have any further questions, please see https://getsatisfaction.com/imdb/topics/imdb-data-now-available-in-amazon-s3

 Thank you for your continued support.

1 Message

 • 

60 Points

2 months ago

Do you offer a way to sync Discogs with your data by API?
Could you make robots to sync the two archives, in case Discogs rejects your solution for linking archive.org data?
My view is that your offering is good for end users if they need full details about each track, album or artist.
