IMDb Data – Now available in Amazon S3

This is an announcement for customers of the IMDb bulk data available via FTP.

We are pleased to announce that, starting today, IMDb datasets are available in Amazon S3 via an HTTPS link. Using the new interface, customers can bulk-access IMDb title and name data.

For details on the S3 solution, file format and access guidelines, see www.imdb.com/interfaces.

In our continued effort to best serve our Contributors, we are streamlining the datasets and making them available in a more useful and structured format in S3. Notably:

Data refresh frequency is now daily (previously weekly).
IMDb title and name identifiers are included in all the files for ease of matching and linking back to IMDb.
The files are in tab-separated values (TSV) format.
The sets of data we provide are updated to only include the essential ones that help with matching and linking to an IMDb title or name.
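
As a rough illustration of what the TSV layout and the included identifiers make possible, here is a minimal Python sketch that parses tab-separated rows keyed by title identifier and builds a link back to IMDb. The sample rows and the column names (`tconst`, `primaryTitle`, `startYear`) are assumptions made for illustration, as is treating the literal `\N` as a missing value; the authoritative file details are at www.imdb.com/interfaces.

```python
import csv
import io

# Hypothetical sample in the shape described above: a header row, then
# tab-separated fields with an IMDb title identifier in the first column.
SAMPLE = (
    "tconst\tprimaryTitle\tstartYear\n"
    "tt0000001\tExample Title One\t1999\n"
    "tt0000002\tExample Title Two\t\\N\n"
)


def load_titles(text):
    """Return {identifier: row dict}, mapping the literal \\N to None."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    titles = {}
    for row in reader:
        cleaned = {k: (None if v == "\\N" else v) for k, v in row.items()}
        titles[cleaned["tconst"]] = cleaned
    return titles


def imdb_url(tconst):
    """Build a link back to the IMDb title page for an identifier."""
    return "http://www.imdb.com/title/" + tconst + "/"


titles = load_titles(SAMPLE)
print(titles["tt0000001"]["primaryTitle"])
print(imdb_url("tt0000001"))
```

Keying rows by identifier rather than by title text is what makes matching against another catalog straightforward, since titles can repeat or change while identifiers are meant to stay put.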

As part of housekeeping on the FTP site, the data files will no longer be updated. The list data files will continue to be available at two locations (see below) until February 28, 2017. We strongly encourage FTP site users to switch to the S3 solution as soon as possible to ensure their applications continue to work without interruption.

ftp://ftp.funet.fi/pub/mirrors/ftp.imdb.com/pub/frozendata

ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata

If you are not an IMDb Contributor and wish to obtain IMDb content for commercial use, we offer a content license. The license grants you access to our content via an XML web service, plus the right to use the content in your product or service. If that interests you, please email licensing@imdb.com.

If you have any questions or concerns, please share your feedback in this thread.

Thank you for your continued support.

Responses

Official Solution

Col_Needham

Employee

•

8.5K Messages

•

194.9K Points

9 years ago

Thanks for the feedback so far on this thread. Please do continue to post and we will try to take as much as possible into account. This post answers some of the questions raised and there will be further updates based on the next round of feedback.

On the S3 access issues, we now have a working prototype of a system which can make the same S3 data available to you via HTTP from IMDb directly without requiring any S3 registration and free from any possibility of AWS charges. Please watch for an announcement as we convert this into production code. The only thing needed will be an ordinary IMDb user account attached to a valid email address. We still intend to also make the data available via S3 for those people who find the AWS access tools more convenient and can stay within the free tier of AWS.

On the general data availability, we are adding the AKA titles to the basic data set accessible to everyone. Longer term, we are looking at the possibility of daily diff files for at least some of the data in the basic set.

On the point about contributors, we are looking at extending the range of data available via the http solution based on your contribution history and volume. For top contributors and those people using the data to help us clean it via bulk corrections, this is likely to extend far beyond the current set of data even on the FTP site. It is not our intention to deprive access to the data by those people who have genuinely helped to build it over the years and who want to continue to improve IMDb. We aim to also be able to grant specific permissions to specific customers for specific extra subsets of data as required on a case by case basis. This latter part may take some time to become a fully formed solution so please bear with us.

The background to all of this is that there is a huge multi-year technology migration project which is nearing completion at IMDb. We have too many complicated old systems around which have been slowing the overall pace of development (I add a bit more detail to this on https://getsatisfaction.com/imdb/topics/why-doesnt-imdb-staff-ever-consult-with-the-contributor-base...). The move to the new technology has been providing the opportunity to look at the way we operate different parts of the IMDb service. One of the oldest software systems is the one which publishes the FTP data, and we will soon no longer even be able to generate the .list files once the final pieces of the old IMDb system are decommissioned; at least not without re-writing all of the publication software to connect to the new system and produce an extremely difficult-to-manipulate text file format which was designed 27 years ago and has not changed in 21 years. Instead, we decided that it would be better to publish the data via a modern system (S3 and soon over https) in a modern format which can be more easily parsed. The other problem with FTP is that we have no idea how many people are using the data and for what purpose, nor do we know what additional things they may want from the data. From feedback over the years, we knew some of your requirements already, notably (a) access to the title and name constant data (b) an easier-to-parse format (c) information to help in matching other catalogs to IMDb (d) more frequent updates. We found ourselves having to guess the remaining requirements until we decided the best way forward was to move the data to a new location within the FTP sites, post an announcement on Get Satisfaction (this thread) and then wait to gather feedback before replying and figuring out what steps to take next (this reply).

We hope this helps. We have plenty to work on in the meantime, and we will follow up as we deliver parts of the above.

Col
Founder & CEO, IMDb.com.

Vincent_Fournols

2.4K Messages

•

81.2K Points

Thank you Mr Needham for this much welcome answer, which is somewhat reassuring.

V.

9 years ago

jeorj_euler

10.7K Messages

•

226.5K Points

Thanks for your feedback, Col Needham. I'm pleased to know that you have big plans. I hope the things of the future wind up having more merits than demerits.

— Jeorj Euler, an IMDb regular registrant

9 years ago

gardner_von_holt

16 Messages

•

792 Points

Thank you for responding; at least the timing and some of the motivation for this are clearer.
However, without the ability to answer the simple question "who is in this movie", the data is essentially useless to me.
Sad day for imdb, when it switches from "here is some awesome data, we are excited to see what you do with it" to "Convince us why we should let you see our data".
Reluctantly, I am switching to another data provider, themoviedb.org.

9 years ago

valen_dbhyhk9css1a7

3 Messages

•

246 Points

“

Sad day for imdb, when it switches from "here is some awesome data, we are excited to see what you do with it" to "Convince us why we should let you see our data"

“

You said it best, right there.

There are 2 main ways to "consume" IMDb data (files): 1 - The Static Mode: people already have a use for the data and go there and grab it for their usual needs; 2 - The Discovery/Explorer/Research Mode: people get some data, and then detect some patterns, get/try different data (files) and see some more patterns, get new ideas and connections, test theories, invent new uses for the data, detect errors/inconsistencies, get yet another batch of different files to retest other hypotheses...

While the first type of Users is perfectly fine and has an important place (plus, those same Users could already have been or they have the potential to become a type 2 User) the real "power" of (IMDb) data use is in the 2nd type.

IMDb wants to categorize/"lock" people into "typical", static/monolithic use cases (which it can't, because there is no such thing when one gets access to such rich, diverse, quality data: the richer the data, the more unlimited the possibilities), without understanding that the true power of the unimpeded access people have enjoyed until now lies in exploration. This is also the mode where more errors and problems with the data are bound to be discovered and, thus, reported/corrected.

So, they are (re)enabling type 1 Users (which is good), but type 2 Users (where the true value, for both IMDb and the Users/Contributors, really lies) are shackled and constrained in typical use-case boxes that have little or no real use to them (because the most desirable "consumer" of data, for IMDb, is the one that has no [pre]set idea what she/he'll do tomorrow with that data; the ones that ask themselves "What if...?" and then go about checking that out). And these Type 2 users are also the main Contributors (if not to IMDb directly, at least indirectly) to the Film/TV community. They make all of us appreciate and understand the interconnected nature of the art form. And that'll always return, in the end, to IMDb, in one form or another, because the more (quality) information and the bigger the community, the more people will turn to IMDb (because it is one of the best, most popular places to know more). Curtailing or impeding Type 2 users, in any way, is a substantial self-inflicted wound (the proverbial "shot in the foot").

In spite of building one of the most successful stories of data gathering/maintenance in history, they seem to lack a basic understanding of how this was achieved, how the whole direct/indirect feedback loop works, how the ample availability of their (almost) raw data was like seed(s) for fertile ground(s) -- where they could/would reap the benefits, severalfold, later, down the line, directly or indirectly. This kind of decision seems to blissfully ignore how IMDb arrived at this point in time.

This is akin to the Captain of the Titanic failing to acknowledge the “invisible” 90% of the iceberg that lay below the waterline.

9 years ago

nick_2lqi1n2rpiwpu

3 Messages

•

304 Points

I don't think I can or ever will understand or be able to wrap my head around this line of logic.

We created this awesome movie database, let's make an API

We made this awesome API, let's give everyone access to all the data!

20 years pass

Now that we've given everyone all the data for 20 years, and the API is old
let's update it!

People can't be using ALL our data, right? No way that people are actually using all the data we provided them over the course of 20 years just because we can't see analytics for it. Not possible. Nope.

Well, just remove access to 9/10ths of the data then.

It's fine if you guys change the format and work to update the way it's parsed, remove redundancy etc. I am ALL for that, trust me. But plain and simple, not giving me something I already use is going to destroy so many projects and development ecosystems; I don't think that we have a number that could go high enough to represent the amount you are killing by removing access to so much of the data. You're literally killing an unending (infinite) number of projects.

9 years ago

ron3

421 Messages

•

9.9K Points

On the point about contributors, we are looking at extending the range of data available via the http solution based on your contribution history and volume.

We aim to also be able to grant specific permissions to specific customers for specific extra subsets of data as required on a case by case basis. This latter part may take some time to become a fully formed solution so please bear with us

Any update(s) on the above?

8 years ago

Official Solution

sv_6654070

15 Messages

•

820 Points

9 years ago

We are currently working on a solution for data access via HTTP endpoint as an alternative to the direct AWS S3 access. As part of this solution, we are also looking into a contributor exclusive solution, providing extended datasets based on a person’s contribution history and volume. Given these developments, we are postponing the shutdown of the IMDb FTP sites to November 7, 2017.

Please stay tuned for more updates. Thanks!

gardner_von_holt

16 Messages

•

792 Points

My takeaway from Col Needham's recent post was that the primary issue standing between providing complete data and not was the significant development effort to make the data available from the new systems.

But now, it sounds from this post like IMDB will have the logic in place to provide fuller datasets, and will provide them to contributors.

Now it appears to be purely that IMDB wishes simply to prevent the public from accessing the collected data for free.

This appears to no longer be about development efforts, rather it appears to be exclusively about cutting off free data.

Frankly, IMDB needs to get its story straight. If the data is in fact available, and more complete data sets would be there for us ... if we provide IMDB with free (aka donated) labor, then the entire premise of Col's justification looks shaky .. at best.

I hope you guys come to your senses, but I no longer believe we are being told the real story here, so I have now completed moving my data sources away from IMDB, and will donate both time and money elsewhere.

I've enjoyed using IMDB's data since well before it sold out to Amazon, and it's sad to see so many years of cooperation trampled by this ill-considered project.

9 years ago

jeorj_euler

10.7K Messages

•

226.5K Points

Thanks for postponing the shutdown, IMDb staff. We can at least grant y'all that.

— Jeorj Euler, an IMDb regular registrant

9 years ago

jeorj_euler

10.7K Messages

•

226.5K Points

Due to my notification settings, I was informed: "Nobody liked your comment". That's kind of funny like a pun. Thanks, Nobody.

— Jeorj Euler, an IMDb regular registrant

9 years ago

gardner_von_holt

16 Messages

•

792 Points

With the revised shutdown date now days away, I note that none of the promised updates have occurred.

Really a shame to see an organization lose its way.

9 years ago

jeorj_euler

10.7K Messages

•

226.5K Points

I can foresee the IMDb contributor base potentially splintering into "patriots" and "loyalists". I may be no patriot, but the loyalists are really starting to disgust me.

— Jeorj Euler, an IMDb regular registrant

9 years ago

Official Solution

chris_h_7i63q04tc55pm

Employee

•

66 Messages

•

3K Points

9 years ago

Thank you for the continued feedback.

Earlier in this thread, Col referred to a prototype of a system which can make the same S3 data available to you via HTTP from IMDb directly without requiring any S3 registration and free from any possibility of AWS charges. This system will require an ordinary IMDb user account attached to a valid email address. However, this system is not quite ready for production yet, so to help address some of the concerns raised about the 'Requester Pays' access via S3, today we activated an https entry point to provide access to the basic datasets. The https location is https://datasets.imdbws.com/. The page http://www.imdb.com/interfaces/ has been updated with this information.
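
For anyone scripting against the new entry point, a minimal Python sketch follows. It assumes the files sit directly under https://datasets.imdbws.com/ with names such as title.basics.tsv.gz (a file name mentioned elsewhere in this thread); the helper names are hypothetical, and the actual download call is left commented out so that nothing is fetched by accident.

```python
import shutil
import urllib.request

BASE_URL = "https://datasets.imdbws.com/"


def dataset_url(file_name):
    """Join the base URL and a dataset file name (flat layout is an assumption)."""
    return BASE_URL + file_name.lstrip("/")


def download(file_name, destination):
    """Stream one dataset file to disk without loading it all into memory."""
    with urllib.request.urlopen(dataset_url(file_name)) as response, \
            open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


print(dataset_url("title.basics.tsv.gz"))
# download("title.basics.tsv.gz", "title.basics.tsv.gz")  # uncomment to fetch
```

Streaming with `shutil.copyfileobj` keeps memory use flat regardless of file size, which matters more as the datasets grow.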

We are finalizing the extended datasets and access model and I will post an update about that as soon as it is ready.

The final build of the data that gets published to the FTP mirrors occurred yesterday so those mirrors contain the final FTP snapshot. While the data on the FTP servers will not be updated going forward, we will not remove the data for at least the next few weeks so people who need that data can still download it.

Vincent_Fournols

2.4K Messages

•

81.2K Points

A first quick look at title.basics.tsv.gz is very promising, and much more satisfactory than the former FTP offer.
Could you please state the charset used to publish the text files?

9 years ago

chris_h_7i63q04tc55pm

Employee

•

66 Messages

•

3K Points

Hello Vincent,

The files are in the UTF-8 character set. I have pushed out an update to the http://www.imdb.com/interfaces/ page to add that to the file details section.
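
Given that, a gzipped dataset can be read as text by naming the encoding explicitly. The sketch below is only an illustration: it writes a tiny stand-in file (the file name, column names and sample row are made up) and reads it back with Python's standard gzip module.

```python
import gzip
import os
import tempfile

# Made-up stand-in for a downloaded dataset: one header row and one row
# containing non-ASCII (Cyrillic) text.
sample = "titleId\ttitle\ntt0000000\tПример\n"

path = os.path.join(tempfile.mkdtemp(), "sample.tsv.gz")
with gzip.open(path, "wt", encoding="utf-8", newline="") as f:
    f.write(sample)


def read_rows(gz_path):
    """Yield each line of a gzipped, UTF-8 encoded TSV file as a list of fields."""
    with gzip.open(gz_path, "rt", encoding="utf-8", newline="") as f:
        for line in f:
            yield line.rstrip("\n").split("\t")


rows = list(read_rows(path))
print(rows[1][1])
```

Passing `encoding="utf-8"` avoids depending on the platform default, which is not UTF-8 everywhere.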

Best regards,
Chris.

9 years ago

owenrees

523 Messages

•

15.3K Points

My quick look at title.akas.tsv.gz suggests it is UTF-8 - tt0000008,5 is Cyrillic and looks sensible, emacs reports that the underlying file was UTF-8 and displays the characters correctly.

That Cyrillic aka title does not appear in the aka-titles.list.gz file on the FTP site.

It looks to me as if the new system is including data that the old system could not handle.

9 years ago

manfred_polak

5 Messages

•

240 Points

The latest and final FTP snapshot is online now. But once again there is no updated actors.list! It's still from September 22. That's very annoying and embarrassing - 3 months should have been time enough to fix this problem. At least the final dump should contain recent files only.

By the way, on the German FTP server there is a new directory "frozendata". The files are now available both in
ftp://ftp.fu-berlin.de/pub/misc/movie... and
ftp://ftp.fu-berlin.de/pub/misc/movie..., but I guess that in the long run, only "frozendata" will remain.

9 years ago

lucacanali

1 Message

•

100 Points

Thanks for providing the HTTPS interface. It's really appreciated.

Something I don't understand is about the genre information. For all the movies that have more than 3 genres, only the first 3 in alphabetical order are reported.
For example for "Dunkirk" (tt5013056) the War and Thriller ones are omitted.
And Dunkirk really is a "War" film... I cannot get the reason for this limitation.

In fact, also imdb.com has this problem. The top bar lists only 3 genres, but then in the "Genres:" tag has all of them.

It would also be really great if you could add the "Country" and the "Keywords" information. This info is really important for categorizing movies. Without it I have to resort to scraping your site, which is something I would like to avoid.

Thanks,
Luca

8 years ago

dgranger

3.5K Messages

•

86.2K Points

9 years ago

In English, what does that mean? Does that mean that the IMDB is going to totally pay per use? Does that mean that the regular free site is shutting down?

sv_6654070

15 Messages

•

820 Points

No, this announcement is for users of the IMDb ftp data only. No other IMDb services are affected.

The FTP sites will be retired on Sept 10, 2017. The S3 bucket that replaces the FTP site is configured as requester-pays: the requester accessing data from this bucket is responsible for the data transfer and request costs. For details on the charges, please refer to https://aws.amazon.com/s3/pricing/.

After Sept 10th, the AWS S3 bucket will be the only location for bulk access to IMDb datasets for non-commercial use.

9 years ago

dgranger

3.5K Messages

•

86.2K Points

So this will be a pay per use site. Not a good idea.

9 years ago

sv_6654070

15 Messages

•

820 Points

No, IMDb remains a free site as always.

Just to clarify, the requester-pays configuration is only for bulk accessing IMDb datasets from S3 and not for the www.imdb.com website.

9 years ago

matisszz

3 Messages

•

90 Points

9 years ago

Any idea when alternative titles will be available as a dataset on S3, like the aka-titles.list.gz on the old ftp servers? I don't see it on S3 and the current aka-titles.list file doesn't appear to be complete if I compare the data in that file with data on the actual IMDb site.

ddb_hsnrsxjeb6gha

2 Messages

•

110 Points

9 years ago

I don't see any file that shows either all cast for a title or all titles a cast member is in. I need full xref like the old actors/actresses.list files.

sv_6654070

15 Messages

•

820 Points

The datasets in S3 are focused on the contributor use cases for matching names and titles and linking back to IMDb. The S3 tables title.principals and title.crew provide the principal cast and crew information respectively for all the titles. The name.basics table has the known-for titles for all the names along with their top-3 professions.

To help us better understand your use case, please share details on how you use/plan to use cast & crew data.

9 years ago

ddb_hsnrsxjeb6gha

2 Messages

•

110 Points

My usecase is for tracking my personal collection (currently about 35,000 entries) and wish lists. Since these files have been available for personal use for years, I normally download the actors/actresses.list once a year. Each one of these I then process into 4 files (movie title / TV episode / cast with all their movie titles [S3 not available] / cast with all their TV episodes [S3 not available]). I track my collection on the first 2 lists, keeping track of what type of media it is on, the alternative title if mine is different than imdb's, etc. Then I programmatically update this info on the much larger second 2 lists. If at some point I decide to add actress Jane Doe to my wish list, anything she was in that I already have is already marked for me. About once a year, I find that I need to start adding my new titles manually so it is then time to download again. It takes a couple weeks after each download to verify changes, etc., to get good updated lists (you'd probably be surprised how many times you change release year on some titles, etc.). This has been very convenient for my purposes.

The S3 files appear to contain all titles and all cast but no way to get more than a subset of the xref between them (i.e., no way to list all the cast for a title, or all the titles a cast member is in). At this point it looks like I may as well take my latest file set, strip it down to only those entries I actually own, and copy/paste updates in as my collection grows. Like some of the other posts, I just thought going to the cloud would mean more data available, not less. Not really interested in learning enough java or something to just get a few downloads once in a while either. Thanks for all the help you've been to me through the years.

9 years ago

matisszz

3 Messages

•

90 Points

9 years ago

This reply was created from a merged topic originally titled Getting an up-to-date alternate title dataset, like the old aka-titles.list.gz.

I'm trying to get the full and up-to-date dataset for alternative movie titles. I first tried the archive from the ftp servers - aka-titles.list.gz - but some titles appear to not have all the information.

Example:
http://www.imdb.com/title/tt0090557/releaseinfo?ref_=tt_dt_dt#akas

That has 17 alt. titles, while aka-titles.list contains only 3.

I figured the new S3 datasets would be better, but those don't even have alt. titles available, just the most basic movie information.

Any idea how this could be resolved?

sv_6654070

15 Messages

•

820 Points

9 years ago

The alternative versions of titles in S3 are limited to the primary title and original title in title.basics table.

To help us better understand your usecase, please share details on how you use/plan to use the alternate titles data.

clcdpc

2 Messages

•

152 Points

9 years ago

Will there be any way of accessing the certificates.list file? It doesn't appear to be part of the current S3 bucket.

sv_6654070

15 Messages

•

820 Points

Certificates data is not part of the S3 dataset. Could you please share details on how you use this data?

9 years ago

clcdpc

2 Messages

•

152 Points

We're using that information in a tool that helps public libraries better classify their material when they make it available for the public to use. This information helps streamline our workflows and also allows the parents we serve to make more informed decisions.

9 years ago

jw_51j654boziha0

1 Message

•

120 Points

9 years ago

You must be joking. Some sort of convoluted user-registration and user-pays is somehow going to be better than ftp? Seems very stupid and totally unnecessary if you ask me. And AWS would have to be the most convoluted and non-user-friendly system anyone has ever invented.

marcel_korpel

5 Messages

•

210 Points

9 years ago

Thinking about this for a day, now.

So, in short, it seems that there is less data (that is easier to parse, I assume) and you actually have to pay and jump through hoops setting up (an) account(s)* with AWS and S3 to be able to access the bulk data using an API I have to learn to use. If I am correct, you have to pay for the bandwidth, so are there even diff files provided to lessen that burden, like on the FTP sites?

All in all, I am sad to say this doesn't sound like an improvement to me.

* Using a credit card, which is not that common in several countries in the world (in the Netherlands, for instance, debit cards are far more common)

sv_6654070

15 Messages

•

820 Points

Accessing the S3 buckets will require an AWS account and an AWS SDK or the Command Line Interface. We have provided a couple of code samples for accessing the S3 bucket using the AWS Java SDK - http://www.imdb.com/interfaces.

If a simpler GUI-based access is required, there are some third-party S3 clients that provide that as well.

With the requester-pays S3 bucket, users pay for the data transfer and network cost. That said, the files hosted on the S3 bucket are gzipped and the largest dataset as of today is 151MB.

As part of the AWS Free Usage Tier, new AWS customers currently receive 5 GB of Amazon S3 storage in the Standard Storage class, 20,000 Get Requests, 2,000 Put Requests, and 15 GB of data transfer out each month for one year. So if your request pattern falls within those limits, you are unlikely to incur any data transfer or request costs for the first year. After that, according to the Amazon S3 pricing chart, the first 1 GB per month of data transfer out to the Internet is free, and standard GET requests cost $0.01 per 10,000 requests.
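
To put those figures together, here is a back-of-the-envelope calculation in Python using only the numbers quoted above (151 MB for the largest file, 15 GB of free-tier transfer out per month, and 1 GB per month free afterwards). It assumes 1 GB = 1,024 MB and ignores request charges and the smaller files, so treat it as a rough sketch; actual billing may differ.

```python
LARGEST_FILE_MB = 151        # largest gzipped dataset, as quoted above
FREE_TIER_TRANSFER_GB = 15   # free-tier data transfer out per month (first year)
AFTER_FREE_TIER_GB = 1       # free data transfer out per month afterwards

MB_PER_GB = 1024             # assumption: binary gigabytes


def whole_downloads(file_mb, allowance_gb):
    """How many complete downloads of a file fit in a monthly allowance."""
    return (allowance_gb * MB_PER_GB) // file_mb


first_year = whole_downloads(LARGEST_FILE_MB, FREE_TIER_TRANSFER_GB)
afterwards = whole_downloads(LARGEST_FILE_MB, AFTER_FREE_TIER_GB)

# Fetching the largest file once a day for a 30-day month:
daily_refresh_mb = 30 * LARGEST_FILE_MB

print(first_year, afterwards, daily_refresh_mb)
```

Under these assumptions, a daily refresh of the largest file alone fits inside the first-year allowance but not inside the 1 GB that remains free afterwards.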

Hope this helps address your concerns.

9 years ago

gardner_von_holt

16 Messages

•

792 Points

9 years ago

My use case is similar to some above. I have perhaps two to three times a year refreshed a series of about 8 files (actor, actress, movie, director, aka titles, genres, mpaa, and taglines), so that I would have complete cast, title, director, and genre data for my personal movie collection.

While I accept that things change over the years, the main thing that I would desire to have restored is complete actor and actress credits for future movies.

I have never used the data for anything other than my personal movie collection, and despite being a software developer have never offered this software publicly to date.

I have invested significantly over the years in developing my internal movie database oriented software to enhance my enjoyment of movies, and have likewise for many years bought most of my movies from amazon.com, .de, or .co.uk. If I gave in on this I would be reducing years of development of my movie tracking to wasted effort.

You can measure my length of use of the data and imdb to my userid (gvh), one I could no longer acquire today, and that I have had an account with amazon for about 20 years.

Additionally, I believe Amazon made representations when you purchased IMDB to continue to make it available to the public, and have for many many years provided this data for non-commercial purposes. I have invested very significantly in this software and am frustrated to see access withdrawn to data that has always been there.

What would make me happy? Any of the following:

* Ability to download full data for movies you purchased from any amazon business, or view on amazon prime movies (I get data for those movies I pay for). I'm totally ok with limiting my access to those movies I have a commercial relationship with amazon about.

* Some API (rate limited would be ok) to return the full cast for an individual movie
This would allow me to add new entries for newly purchased movies, a vast majority of which I buy from amazon. And limiting me to a handful of queries a day or week would be acceptable. (I get 5 queries a day plus 5 queries for each movie I purchase at amazon, for example)

* Full data limited to recently released movies and dvds (so that when I buy a movie I can add new data). I rarely purchase new movies that are back catalog. Almost everything is purchased when the dvd goes open for sale, so any clever way of limiting me to newly released movies would solve much of this need.

* I'll even pay a token amount per set of queries if you want some way to guard against being spammed with accounts. Or you can charge my AWS account if that's possible.

S3 as a data source, and tabbed text, is fine for me, as would be any sort of API or method you could offer. I'll write whatever software is necessary to access the data, and as I said, this is only necessary for new titles, so I'm totally ok with rate limiting my access (given that there is some way to test and develop with sample data or something in a non-rate-limited way).

Thanks, and I can be reached at the email associated with this account for further discussion

Vincent_Fournols

2.4K Messages

•

81.2K Points

I am in a very similar situation compared to the previous use cases presented above: more than 20 years contribution to the IMDb under "fourvin", use of twice yearly IMDb download data to update my personal DB through a self-developed interface in VBA (I cannot invest in new languages and technologies anymore).

But no way to restrict it only to recent releases and DVD: I watch a lot of movies from on-demand catalogues, whatever year they were created, and I wish to get the full data for the targeted films or people around the movies I have seen.

I also feel cheated, having contributed freely to the IMDb, to see access to this data (e.g. aka-titles) made unavailable.

9 years ago

andrew_gallant

6 Messages

•

212 Points

9 years ago

Thanks for doing this! Parsing TSV files will be much easier than parsing the old format.

I did run across one small technical issue. In `title.basics.tsv`, there are a few records that appear to be malformed. For example:

tt2222222	tvEpisode	"Hollywood Regency Meets Country Club Chic	"Hollywood Regency Meets Country Club Chic	0	2011	\N	\N	Reality-TV

In this record, the `primaryTitle` and `originalTitle` fields appear to begin with a double quote, but there is no corresponding closing quote. The actual name of the title does start with a double quote, so I think the correct format would be:

tt2222222	tvEpisode	"""Hollywood Regency Meets Country Club Chic"	"""Hollywood Regency Meets Country Club Chic"	0	2011	\N	\N	Reality-TV

Since the CSV/TSV format escapes quotes by doubling them.
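
To illustrate the difference, here is a small Python sketch using the standard csv module. With the default quote handling, the unterminated quote in the published record makes the parser absorb the following tab into the field; with quoting disabled, the fields come through literally. Which behaviour IMDb intends for these files is not stated in this thread, so this is an illustration and not a recommendation.

```python
import csv

TITLE = '"Hollywood Regency Meets Country Club Chic'  # title starts with a quote

# The record as published: the leading quote is written literally.
published = "\t".join(
    ["tt2222222", "tvEpisode", TITLE, TITLE, "0", "2011", "\\N", "\\N", "Reality-TV"]
)

# The doubled-quote form suggested above: field wrapped in quotes, inner quote doubled.
QUOTED = '"""' + TITLE[1:] + '"'
escaped = "\t".join(
    ["tt2222222", "tvEpisode", QUOTED, QUOTED, "0", "2011", "\\N", "\\N", "Reality-TV"]
)


def parse(line, **options):
    """Parse a single tab-separated line with the csv module."""
    return next(csv.reader([line], delimiter="\t", **options))


# Default dialect: '"' opens a quoted field, so the tab after the first
# title is swallowed and the record comes back with too few fields.
default_fields = parse(published)

# Quoting disabled: every '"' is ordinary data and all nine fields survive.
literal_fields = parse(published, quoting=csv.QUOTE_NONE)

# The doubled-quote form parses cleanly under the default dialect.
escaped_fields = parse(escaped)

print(len(default_fields), len(literal_fields), len(escaped_fields))
print(literal_fields[2] == escaped_fields[2])
```

So a consumer can already read the current files by turning quote processing off, while the escaped form would let default CSV/TSV parsers work unchanged.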

Thanks!

sv_6654070

15 Messages

•

820 Points

Thanks for reporting the issue. We are looking into it.

9 years ago

ron3

421 Messages

•

9.9K Points

9 years ago

I am not a "customer of the IMDb bulk data", but as a contributor, I do download the raw data sets on occasion in order to parse through them and submit corrections for obvious errors.

For example, mis-spellings in release date attributes. I take it this data is disappearing forever, given the very limited imdb-datasets items shown on the interfaces page.

Perhaps it might be better to list what data is remaining available, and what data is going away?
Thanks.

marcel_korpel

5 Messages

•

210 Points

I agree, the old datasets contained way more data. E.g., I'd like to do statistical analyses with technical data, let's say aspect ratio used through the years; or list movies with alternative runtimes, or several uses of color (b/w and color in one movie).

There were so many other (statistical) use cases I had with the bulk data and it's sad that those are no longer possible.

9 years ago

Col_Needham

Employee

•

8.5K Messages

•

194.9K Points

For the release date attribute (and similar cases), the attribute field browser inside the submissions interface may help.

All the live attributes can be browsed and filtered via https://contribute.imdb.com/updates/field/release_dates/attr

9 years ago

ron3

421 Messages

•

9.9K Points

Yes, that is where the issues are first noticed (attribute field browser). Without the raw data being available, how does one find which film any attribute is attached to? I looked through the Advanced Title Search, but didn't find a way to search the attributes. I can use Google (or similar) to try and find them. Hopefully there's a way (I haven't found yet) to continue using this data to make IMDb better. Thanks.

Here are some examples of Los Angeles release date attributes that I couldn't fix because Google couldn't find them.

34     Los Angeles, CA
6     Los Angeles, Ca
1     Los Angeles,California

(they should all be 'Los Angeles, California')

9 years ago

gp_hm83t2lf4wkw8

1 Message

•

80 Points

9 years ago

Just checked out AWS. As long as there is no "bill capping" feature implemented (which, funnily enough, has been requested by a lot of people for almost a decade), there is no way to set your bill to a hard limit.
Admittedly, most cases of hacked accounts were the result of unintentionally published private keys on github. Still, for me, using such a service would give me sleepless nights, as I could never be sure not to wake up with a multi-thousand dollar debt.
This attitude is irresponsible on Amazon's part, and it also means that IMDB no longer has a reasonable publicly available data set.

nobody_7029854

758 Messages

•

29.6K Points

... there is no "bill capping" feature ....

Interestingly, that issue is mentioned in Wikipedia's article about S3.
According to that article:

"... AWS does not provide native bandwidth limiting and as such
users have no access to automated cost control. This can lead to
users on the 'free-tier' S3 or small hobby users amassing dramatic bills. ..."

9 years ago
