4 Messages

 • 

214 Points

Friday, September 9th, 2022

Closed

Solved

HTTP Headers for IMDb datasets

The HTTP headers for title.ratings.tsv.gz return different dates for "Date" and "Last-Modified". It appears "Date" is the correct header based on when new files become available, but I'm not sure. Can you confirm "Date" is the right header to use for when the file was created? If so, does "Last-Modified" have any meaning?


Accepted Solution

Employee

 • 

5.6K Messages

 • 

58.9K Points

4 years ago

Hi @burkasaurusrex -

We received an answer from the team in charge:


Date: Current date and time.

Last-Modified: Object creation date or the last modified date, whichever is the latest.

Last-Modified is the right field to use to understand when our datasets get updated.

Cheers!
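For anyone who wants to read these headers programmatically, here is a minimal Python sketch of the answer above: it issues a HEAD request and treats "Last-Modified" as the dataset update time. The URL and the function names (`fetch_headers`, `dataset_updated_at`) are assumptions for illustration and are not given in this thread.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen

# Assumption: the public download URL for the ratings file (not stated in this thread).
DATASET_URL = "https://datasets.imdbws.com/title.ratings.tsv.gz"


def fetch_headers(url=DATASET_URL):
    """Send a HEAD request and return the response headers as a plain dict."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=30) as response:
        return dict(response.headers.items())


def dataset_updated_at(headers):
    """Return the Last-Modified header as a timezone-aware datetime.

    The Date header is ignored on purpose: per the answer above it is only
    the current date and time, not the time the dataset was updated.
    """
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return parsedate_to_datetime(normalized["last-modified"])
```

Calling `dataset_updated_at(fetch_headers())` would then give the update time of the file currently being served, assuming the URL above is correct.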

4 Messages

 • 

214 Points

4 years ago

Following up here after a bit more investigation. "Last-Modified" changes along with the "ETag", while "Date" seems to change somewhat randomly every few hours even when the "ETag" does not. Both behaviors are closer to what I'd expect (maybe the CDN is returning the time at which it cached the response as "Date", while "Last-Modified" reflects the file itself).

However, when the "ETag" changes, the new "Last-Modified" date points to roughly 20 hours earlier. I would expect a date closer to the actual "ETag" change. So is "Last-Modified" actually the time when the information was pulled from IMDb's database? Or is it erroneous?
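The observation that "Date" moves on its own while "Last-Modified" follows the "ETag" suggests a simple change check. The sketch below is one way to express it in Python. It assumes each response is stored as a dict keyed by header name, and `dataset_changed` is a hypothetical helper name. Treating the "ETag" as the change signal is based only on the behavior described in this post, not on anything IMDb has confirmed.

```python
def dataset_changed(previous, current):
    """Return True when two header snapshots describe different files.

    Only the ETag is compared. The Date header is ignored because it was
    observed to change every few hours even when the file did not.
    """
    return previous["ETag"] != current["ETag"]


# Two snapshots with the same ETag but different Date values: no change.
before = {"Date": "9/9/2022 3:25:17 AM", "ETag": "d916d11f47326c729245a7895cadc67f"}
later = {"Date": "9/10/2022 3:30:13 AM", "ETag": "d916d11f47326c729245a7895cadc67f"}

# A snapshot with a new ETag: the file was replaced.
newer = {"Date": "9/10/2022 9:30:16 AM", "ETag": "ba3b47ed33d9fe5841507bb75036187b"}
```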

Employee

 • 

5.6K Messages

 • 

58.9K Points

4 years ago

Hi burkasaurusrex-

I have passed the question on to the team in charge. As soon as I have an answer, I will give you an update here.

Cheers!

4 Messages

 • 

214 Points

4 years ago

Thanks @Bethanny! I started doing a HEAD request every hour or so. In case it's helpful to the team, here are the unique responses I've observed over the last few days (all dates UTC):

FirstObserved          HttpDate               HttpLastModified       HttpETag
9/9/2022 11:20:18 PM   9/9/2022 3:25:17 AM    9/8/2022 1:21:16 PM    d916d11f47326c729245a7895cadc67f
9/10/2022 3:30:13 AM   9/10/2022 3:30:13 AM   9/8/2022 1:21:16 PM    d916d11f47326c729245a7895cadc67f
9/10/2022 9:30:14 AM   9/10/2022 9:30:16 AM   9/9/2022 1:23:11 PM    ba3b47ed33d9fe5841507bb75036187b
9/11/2022 10:30:17 AM  9/11/2022 10:14:27 AM  9/10/2022 1:20:43 PM   b337cf110fe5afa650fddc207fe893ae
9/12/2022 10:30:17 AM  9/12/2022 10:30:19 AM  9/11/2022 1:21:22 PM   96fc96d22a481783c93c302a3809bab3
9/13/2022 11:30:17 AM  9/13/2022 11:30:19 AM  9/12/2022 1:21:26 PM   82fdd659c69ea658f5c6fb7a4d2f5a0a
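To put a number on the gap between "Last-Modified" and the moment a new "ETag" was first seen, the observations above can be fed through a short Python sketch. The timestamps and ETags are copied from the list above. The names `OBSERVATIONS`, `TIMESTAMP_FORMAT`, and `first_seen_lags` are made up for this illustration.

```python
from datetime import datetime

# Timestamp layout used in the observations, e.g. "9/9/2022 11:20:18 PM" (UTC).
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# (FirstObserved, HttpLastModified, HttpETag), copied from the observations above.
OBSERVATIONS = [
    ("9/9/2022 11:20:18 PM", "9/8/2022 1:21:16 PM", "d916d11f47326c729245a7895cadc67f"),
    ("9/10/2022 3:30:13 AM", "9/8/2022 1:21:16 PM", "d916d11f47326c729245a7895cadc67f"),
    ("9/10/2022 9:30:14 AM", "9/9/2022 1:23:11 PM", "ba3b47ed33d9fe5841507bb75036187b"),
    ("9/11/2022 10:30:17 AM", "9/10/2022 1:20:43 PM", "b337cf110fe5afa650fddc207fe893ae"),
    ("9/12/2022 10:30:17 AM", "9/11/2022 1:21:22 PM", "96fc96d22a481783c93c302a3809bab3"),
    ("9/13/2022 11:30:17 AM", "9/12/2022 1:21:26 PM", "82fdd659c69ea658f5c6fb7a4d2f5a0a"),
]


def first_seen_lags(observations):
    """Map each ETag to (first time it was observed) minus (its Last-Modified)."""
    lags = {}
    for first_observed, last_modified, etag in observations:
        if etag in lags:
            continue  # keep only the earliest observation of each ETag
        seen = datetime.strptime(first_observed, TIMESTAMP_FORMAT)
        modified = datetime.strptime(last_modified, TIMESTAMP_FORMAT)
        lags[etag] = seen - modified
    return lags


for etag, lag in first_seen_lags(OBSERVATIONS).items():
    print(etag[:8], lag)
```

For the four ETag changes caught while polling, the lag comes out at roughly 20 to 22 hours. Since the HEAD requests ran about once an hour, each value is an upper bound that may overstate the real gap by up to an hour or so. The first ETag is not a real change event, because its first observation is simply when polling started.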