brian_willoughby's profile

Sunday, August 9th, 2020 7:59 PM

database report counting unique Actors, Directors, et cetera

I found the database statistics page, but I'm looking for information that would require a database query and report.

For example, of the ten million names on IMDb, how many have one or more Director credits? How many have one or more Actor credits?

I see 27 million Actor records, 17 million Actress records, 5 million Director records... but I assume that these count the same person multiple times. That has to be true, with 27 million actor records and only 10 million names. So, I think a lot of people would like to know how many unique Actors there are, without counting them multiple times for multiple movie appearances.

If there's already a page with these sorts of summary reports, please let me know! Otherwise, I think it would be good promotional material to highlight somewhere on IMDb.com (above the raw Statistics, which might be a bit too much detail for most people).

Champion

4 years ago

Hi, Brian:

I agree that the statistics page should ideally give a lot more information, mainly because there are many more interesting results to report, but also to put the true (large) magnitude of the database in perspective. I hope the developers improve this soon (or eventually, at least).

In the meantime, you can gather summary reports yourself from the plain text datasets. They contain only a small subset of variables, but enough to produce listings like the one you're asking for.
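To make that concrete, here is a minimal Python sketch of the kind of query you describe: counting each person once per credit category, no matter how many credits they have. It assumes a tab-separated file with one credit per row and two columns, a person identifier and a credit category. The column names `nconst` and `category`, and the function names, are assumptions for illustration only, so adjust them to whatever file you actually download.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def count_unique_names(rows: Iterable[dict],
                       name_col: str = "nconst",
                       category_col: str = "category") -> Dict[str, int]:
    """Count distinct people per credit category.

    `name_col` and `category_col` are assumed column names, not a
    documented schema; pass the real ones for your file.
    """
    people: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        # A set keeps each person once, however many credits they have.
        people[row[category_col]].add(row[name_col])
    return {category: len(names) for category, names in people.items()}


def count_unique_names_in_file(path: str) -> Dict[str, int]:
    """Stream a tab-separated credits file and count unique people."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: treat quote characters in names as ordinary text.
        reader = csv.DictReader(handle, delimiter="\t",
                                quoting=csv.QUOTE_NONE)
        return count_unique_names(reader)
```

The file is read row by row, so only the sets of identifiers are held in memory, not the whole file. A file that lists only some of a title's credits would give a lower bound rather than the true number of unique actors or directors.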

I recently processed name.filmography.tsv (see below), so I can offer you some quite up-to-date numbers.
From time to time here at GS we come across threads complaining about the way IMDb counts a person's credits to determine their primary profession. There are some rules (e.g. multiple jobs are weighted, unreleased episodes are treated as if they aired in the series' earliest year whereas unreleased films are treated as the latest title for the person, etc.), but the most controversial one is that each episode of the same television series counts separately. Out of curiosity, last weekend I checked what would happen, and how many names would be affected, if IMDb counted credits using the numbers actually displayed on name pages (i.e. the same series adds just "1" regardless of the number of episodes the person worked on) rather than the current system. I wanted to answer questions like "how many primary professions would change?", "which would be the most frequent modification?", "would the alternative counting 'upgrade' primary positions (e.g. from 'casting department' to 'casting director') more than 'downgrade' them?", "would the alternative counting allow for a better representation of the latest works of a person?", and so on.
As of 07 Aug 2020:

TOTAL NAMES: 9,658,245 names
TOTAL CREDITS:
    By CURRENT COUNTING SYSTEM (episodes count separately): 132,493,587 credits
        If multiple jobs in the same category for the same title are weighted: 133,358,152 credits
    By ALTERNATIVE COUNTING SYSTEM (episodes are consolidated): 50,655,246 credits
        If multiple jobs in the same category for the same title are weighted: 51,509,562 credits
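For anyone who wants to try the two counting systems themselves, the difference can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact procedure behind the totals above: the `Credit` record and its field names are hypothetical, each credit is assumed to carry the identifier of its parent series when the title is an episode, and the weighting of multiple jobs is left out.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Credit(NamedTuple):
    """Hypothetical credit record; field names are assumptions."""
    name_id: str
    title_id: str
    series_id: Optional[str]  # parent series if the title is an episode
    category: str


def count_credits(credits: Iterable[Credit]) -> Tuple[int, int]:
    """Return (episodes counted separately, episodes consolidated)."""
    separate = set()
    consolidated = set()
    for c in credits:
        # Current-style count: every episode is its own credit.
        separate.add((c.name_id, c.title_id, c.category))
        # Alternative count: all episodes of a series collapse into one.
        consolidated.add((c.name_id, c.series_id or c.title_id, c.category))
    return len(separate), len(consolidated)


sample = [
    Credit("nm1", "ep1", "series1", "actor"),
    Credit("nm1", "ep2", "series1", "actor"),
    Credit("nm1", "ep3", "series1", "actor"),
    Credit("nm1", "film1", None, "actor"),
]
print(count_credits(sample))  # (4, 2)
```

In the sample, three episodes of one series plus one film give four credits when episodes count separately and two when they are consolidated.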



Cheers!