3 Messages
•
390 Points
Thu, May 29, 2014 9:28 PM
API/Bulk Data Access
Hi!
We’re in the process of reviewing how we make our data available to the outside world with the goal of making it easier for anyone to innovate and answer interesting questions with the data. If you use our current FTP solution to get data [http://www.imdb.com/interfaces] or are thinking about it, we’d love to get your feedback on the current process for accessing data and what we could do to make it easier for you to use in the future. We have some specific questions below, but would be just as happy hearing about how you access and use IMDb data to make a better overall experience.
1. What works/doesn’t work for you with the current model?
2. Do you access entertainment data from other sources in addition to IMDb?
3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?
4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?
5. Does how you plan on using the data impact how you want to have it delivered?
6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?
7. Are our T&Cs easy for you to understand and follow?
Thanks for your time and feedback!
Regards,
Aaron
IMDb.com
Question
•
Updated
2 months ago
Responses
alex_bigelow_7338091
2 Messages
•
72 Points
6 years ago
I agree with the comments to move toward some kind of standard (CSV, JSON, XML, whatever)... the files are tricky to parse in their current formats.
2. Yes and no... I've linked with Rotten Tomatoes data before, and I'm currently looking into linking with MovieLens.
3. I guess it depends on what "primary key" refers to (movies? users? actors? roles? all of the above?)... for my purposes, I'd probably just generate my own IDs anyway, so I don't think it would matter too much. But I'm sure there is some use case where they could be a big help.
4. An API would certainly be nice, but I don't think it's really that necessary - the files aren't that big, so people should be able to build their own query tools. And I'd definitely not want an API if it meant we could no longer download flat files.
5. Definitely! This is a standard rule across all data analysis - your tasks guide the structure you choose. Even with an API, people are going to probably end up doing a lot of custom reshaping anyway.
6. JSON is absolutely sufficient.
7. This page is pretty straightforward: http://www.imdb.com/help/show_leaf?usedatasoftware
Maybe some wording could be clarified, or maybe examples would help. E.g. does "individual personal use" mean that I can use IMDB as a test dataset locally on my machine for a research project, as long as I don't expose the IMDB data to the public? What if my research project is commercial - but I'm selling a system, not the data, and only using the data to demonstrate the system?
This is a really crazy corner case, but information about how to cite IMDB in academic publications would also be useful.
3
0
davidah_ca
Champion
•
1.9K Messages
•
92.6K Points
6 years ago
At the time the lists were created, IMDb did not actually have ID constants; for many years the Primary Name and Primary Title were used as the identifiers for People and projects respectively. This is where the use of Roman numerals began, as it is essential that each primary key be unique.
The various key constants are a relatively recent addition, and IMDb has not modified the lists to include them (or any other additions).
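For anyone parsing the lists, here is a minimal sketch of separating the Roman-numeral disambiguator from a primary name. It assumes names look like "Smith, John (II)", with the numeral in parentheses at the very end; the function name split_primary_name is hypothetical, and titles (which may combine a year and a numeral) are not handled.

```python
import re

# Assumed layout: an optional Roman numeral in parentheses at the end of the name.
_SUFFIX = re.compile(r"^(?P<name>.*?)\s*\((?P<numeral>[IVXLCDM]+)\)$")


def split_primary_name(primary_name):
    """Return (name, numeral); numeral is None when no disambiguator is present."""
    cleaned = primary_name.strip()
    match = _SUFFIX.match(cleaned)
    if match is None:
        return cleaned, None
    return match.group("name"), match.group("numeral")


with_numeral = split_primary_name("Smith, John (II)")
without_numeral = split_primary_name("Smith, John")
```

The full string, numeral included, is still what has to be unique, so it is probably safest to keep it as the key and treat the split parts as display fields only.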
0
0
abhijit_akhawe
1 Message
•
60 Points
5 years ago
2. Not yet, but I might need to unless IMDb can provide an easier API and syndicated content.
3. I don't think a single data set is the answer, though maintaining IDs across data sets would be useful. It would be good to have a more API-like model.
4. A more queryable API that provides access to the data would definitely be more useful.
5. Yes.
6. JSON is sufficient.
7. No.
0
0
mansour_behabadi
2 Messages
•
130 Points
5 years ago
https://github.com/oxplot/imdb2json
0
spaceturtle
1 Message
•
60 Points
5 years ago
Getting started from scratch - downloading all the necessary files (which can be quite large), processing them, then importing them into a DB or large data structure like a dataframe before you can even start querying.
2. Do you access entertainment data from other sources in addition to IMDb?
Yes, I've also used OMDb.
3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?
It could be useful in that the processing and importation steps I mentioned above would be minimized/removed altogether, but I can imagine the downloads would be huge, and depending on how it's structured you may be getting tables that you don't want/need.
4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?
Absolutely. Depending on the project I'm working on, I may only need a rather limited set of data per query, and a well-structured and flexible API would be fantastic. The rate limit would need to be reasonable (perhaps a free tier at X requests/sec or whatever, then paid tiers above that), and one would need unlimited access to the data (i.e., if a query returns 5000 results, have the ability to acquire all 5000, perhaps 100 or so at a time).
Personally, I'm much more comfortable working with JSON (preferred) or XML than I am with SQL, so an API would be greatly beneficial to me. On the downside, it would make things like natural language analysis more difficult if the current model were to be discontinued, so I think there's definitely a market for both means of access.
5. Does how you plan on using the data impact how you want to have it delivered?
Yes, see above. I can imagine use cases where an API would be much more effective, and others where having the full dataset immediately accessible would be beneficial.
6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?
Ideally, the requester should be able to specify which format they'd like the data to be delivered in. It should definitely be RESTful so developers don't need to completely rewrite existing code designed for other services.
7. Are our T&Cs easy for you to understand and follow?
As far as I've experienced, yes.
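To make the "5000 results, 100 or so at a time" idea from answer 4 concrete, here is a rough sketch of client-side paging. No such IMDb API exists in this thread, so fetch_page is a hypothetical callable taking an offset and a limit, and fake_fetch_page is only a stand-in for a real HTTP request.

```python
from typing import Callable, Iterator, List


def iter_all_results(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield every result by requesting one page at a time."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            # A short page means there is nothing left to request.
            return
        offset += len(page)


# Stand-in for a real API call (hypothetical): slices an in-memory list.
FAKE_RESULTS = [{"id": i} for i in range(250)]


def fake_fetch_page(offset: int, limit: int) -> List[dict]:
    return FAKE_RESULTS[offset:offset + limit]


all_results = list(iter_all_results(fake_fetch_page, page_size=100))
```

A rate cap would fit naturally inside the loop, for example by sleeping between page requests.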
0
0
llamswerdna
1 Message
•
60 Points
5 years ago
1
0
harold_coenen
2 Messages
•
72 Points
5 years ago
This data is awesome and is made for MongoDB or the like.
An API would be sweet as an extra tool, but I'd be OK with direct access to your mongod, with read-only auth :) Just saying hehe :)
0
0
gar37bic
1 Message
•
62 Points
5 years ago
First, we are only interested in movies related to space, space exploration, etc. This might be fiction or documentary, etc. We already have a substantial database of relevant titles, with pictures and summary information, along with user-provided data. I would like to add selected data from IMDB (not sure what yet). The IMDB section of the page would be linked directly to the IMDB page, so people can get further information if they want.
Now, to the questions:
1. What works/doesn’t work for you with the current model?
I don't have any answer for this yet.
2. Do you access entertainment data from other sources in addition to IMDb?
Our existing database has been generated internally, and from our users with some data manually collected from Wikipedia.
3. Would a single large data set with primary keys be more or less useful to you than the current access model? Why?
I think we could work either way. Pulling the data once per day makes less load on our servers as well as yours.
4. Would an API that provides access to IMDb data be more or less useful to you than the current access model? Why?
We haven't used the existing FTP data yet.
5. Does how you plan on using the data impact how you want to have it delivered?
Not at this time.
6. Is JSON format sufficient for your use cases (current or future) or would additional format options be useful? Why?
I believe that either JSON or TTL ('turtle' RDF) would be OK.
7. Are our T&Cs easy for you to understand and follow?
I haven't read them yet! :D My expectation would be that in addition to the visible citation to IMDB, we would certainly intend to link to the IMDB site, either to the relevant page in IMDB, or if you prefer, to the main IMDB page. We strongly believe in accurate, reliable reference information, including the date of retrieval. We generally also cache data we collect, to maintain referential integrity, and in case a remote service is not available. We would assume/hope that we could continue to publish that cached data under the same constraints as the original. We certainly appreciate the hard work that IMDB has put into supplying the data, and want to ensure that our users are aware of our sources.
If you do make an API, we would also consider supporting a return data channel, such as potential corrections to your data, reviews, or votes.
0
0
ngws
3 Messages
•
142 Points
4 years ago
Thanks for the initiative.
I can see that some of the requests have been focused on the particular format of your data. I suppose this has its importance, but I don't care as long as it's machine-readable. As a member of a foreign, somewhat small country (Denmark), what I do care about is the availability of data related to non-US usage.
For example, your FTP archive does contain some foreign-language AKA titles, but I could only find German and Italian ones. Meanwhile, the standard IMDb web interface shows AKA titles for lots of countries.
In general, my request is to include more foreign data in the public archives. :)
--
Niels
5
0
mohamed_oun
1 Message
•
60 Points
4 years ago
1. I'd like to ask if there's a way to get the accurate vote breakdown for each movie, for example like this one: http://www.imdb.com/title/tt0111161/ratings?ref_=tt_ov_rt
I know there's a distribution column in the ratings.list file, but it's not very accurate.
2. Is there a way to turn the files into an SQL database?
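On both points, here is a minimal sketch of loading ratings.list-style lines into SQLite. The layout is an assumption (whitespace-separated distribution, votes, rating, title), as is the distribution encoding (one character per rating value, where a digit d means roughly d*10 to d*10+9 percent of votes, "." means no votes and "*" means all of them). The sample lines are made up, and the real file's header and footer would need to be skipped.

```python
import sqlite3

# Made-up sample lines in the assumed layout: distribution, votes, rating, title.
SAMPLE_LINES = [
    "      0000001222   12000   7.1  Example Film A (1999)",
    "      1000000124     850   8.3  Example Film B (2005)",
]


def parse_rating_line(line):
    """Split one data line into (distribution, votes, rating, title)."""
    distribution, votes, rating, title = line.split(None, 3)
    return distribution, int(votes), float(rating), title.strip()


def decode_distribution(distribution):
    """Map each character to an approximate percent range (assumed encoding)."""
    buckets = []
    for ch in distribution:
        if ch == ".":
            buckets.append(None)          # no votes for this rating value
        elif ch == "*":
            buckets.append((100, 100))    # all votes for this rating value
        else:
            low = int(ch) * 10
            buckets.append((low, low + 9))
    return buckets


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings ("
    "title TEXT PRIMARY KEY, votes INTEGER, rating REAL, distribution TEXT)"
)
conn.executemany(
    "INSERT INTO ratings (distribution, votes, rating, title) VALUES (?, ?, ?, ?)",
    [parse_rating_line(line) for line in SAMPLE_LINES],
)
conn.commit()

top = conn.execute(
    "SELECT title, rating FROM ratings ORDER BY rating DESC"
).fetchall()
```

If the encoding assumption holds, each character only narrows the share of votes to a ten-point band, which would explain why the column cannot reproduce the exact breakdown shown on the site.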
0
0