143 Messages

 • 

1.7K Points

Wednesday, July 6th, 2022 7:13 PM

Solved

Duplicate and Redundant Keywords to be deleted or merged

Thought I'd uncover some duplicates across many different thematic topics and scopes. 

Dystopian Sci-Fi: 153 <- Dystopic Sci-Fi: 3

Dystopian-Future: 153 <- Dystopic-future: 19

Dystopia: 1395 <- Dystopy: 3 (looks like a spelling mistake)

Post-Apocalypse: 1598 <- Postapocalypse: 5

Alternate History: 864 <- Alternative History: 16 (I'm British, but I assume the site defaults to American spelling)

Pandemic: 22570 <- Pandemic-film: 4 (Film is already filterable, so this seems redundant) <- Pandemic-short-film: 3 (Short is already filterable, so this seems redundant) <- Pandemic-umentary: 2 (Documentaries can also be filtered for) <- Pandemic-documentary: 1 (See above) <- Pandemic-pal: 1 (This is just the partial title of the film)

Time-travel: 3150 <- Time-travel film: 1

Future-time-travel: 79 <- time-travel-into-the-future: 30

Forwards-time-travel: 31 <- Fowards-time-travel: 25

Time-traveler: 455 <- Time-travelers: 9 (plurals not acceptable according to guidelines)

Time-loop: 364 <- Time-loop-movie: 1 <- timeloop: 4

trapped-in-a-time-loop: 65 <- stuck-in-a-time-loop: 21 <- caught-in-a-time-loop: 4

Prison: 8322 <- Prison-film: 5 <- Prison-show: 5

Prison-visit: 760 <- Prison-visitation: 85 <- visit-in-prison: 32 <- prison-visitor: 6

Prisoner-of-war: 819 <- war-prisoner: 8

Prison-warden: 433 <- prison-warder: 8

escaped-prisoner: 521 <- prison-escapee: 78 <- prisoner-on-the-run: 12

prison-sex: 75 <- sex-in-prison: 21 <- sex-in-a-prison: 17

Prison-break: 523 <- Prison-breakout: 5

Prison-torture: 10 <- tortured-prisoner: 6 <- tortured-prisoners: 5

prisoner-revolt: 13 <- prison-revolt: 5

prison-murder: 21 <- murder-in-prison: 15 <- murdered-in-prison: 14

Prison-release: 74 <- Prisoner-release: 17 <- released-prisoner: 12

prisoner-escort: 20 <- escorting-a-prisoner: 7

Zombie: 4223 <- Zombie-movie: 32 <- Zombies-movie: 7

Zombie-sex: 14 <- sex-with-a-zombie: 32

zombie-child: 99 <- child-zombie: 5

Zombie-horde: 25 <- hordes-of-zombies: 8

turned-into-zombie: 14 <- turning-into-a-zombie: 6

Killing-a-zombie: 23 <- Zombie-kill: 2

Bitten-by-a-zombie: 81 <- Zombie-bite: 59

Vampire: 3386

Vampire-hunter: 217 <- vampire-slayer: 149 <- vampire-hunt: 11 <- vampire-hunting: 5 <- vampire-hunters: 3 (Perhaps you might regard "hunter" as different from "slayer", I don't know)

vampira: 1 (just the title of the character in the film)

child-vampire: 81 <- vampire-child: 6

female-vampire: 420 <- sexy-female-vampire: 30

vampire-sex: 37 <- sex-vampire: 13 <- sex-with-a-vampire: 15

vampire-fangs: 37 <- vampire-fang: 2

vampire-boy: 7 <- boy-vampire: 4

vampire-family: 14 <- family-of-vampires: 4

Werewolf: 1231

Werewolf-transformation: 54 <- turned-into-werewolf: 3 <- turning-into-werewolf: 5

Police: 19555 <- police-film: 7 <- police-show: 7

male-police-officer: 3130 <- policeman: 4286 <- policemen: 4 <- police-man: 7

female-police-officer: 4297 <- policewoman: 560 <- police-woman: 23 <- female-cop: 28

police-dog: 250 <- police-dogs: 2

racist-police: 24 <- police-racism: 21

police-officer-killed: 1114 <- police-killed: 1

ex-cop: 384 <- ex-police: 10 <- former-cop: 6

cop-drama: 143 <- police-drama: 15

cop-killed-by-a-cop: 18 <- cop-murders-cop: 1 <- cop-kills-cop: 1

police-comedy: 6 <- cop-comedy: 6

police-officer-killed: 1114 <- cop-killed: 25 <- police-killed: 1

There are a lot of 'cop' and 'police' keywords that say the same thing.

Friendship (I feel that "platonic" here is redundant, as it assumes that friendships generally are sexual in nature - when the actual term would be "friends with benefits")

male-female-friendship: 181 <- male-female-platonic-friendship: 10 <- opposite-sex-friendship: 1

male-friendship: 735 <- male-male-friendship: 39 <- platonic-male-friendship: 1

female-friendship: 587 <- female-female-platonic-friendship: 7

friendship-between-teens: 65 <- teenage-friendship: 7 <- teen-friendship: 2 <- friendship-between-teenagers: 5

friendship-between-teenage-boys: 2 <- teenage-boy-teenage-boy-friendship: 1

human-alien-friendship: 21 <- alien-human-friendship: 6

robot-human-friendship: 12 <- human-robot-friendship: 6 <- human-android-friendship: 15

interspecies-friendship: 83 <- inter-species-friendship: 2

Lesbian

lesbian-group-sex: 13 <- group-lesbian-sex: 7

lesbian-sex-on-a-couch: 7 <- lesbian-sex-on-couch: 5

lesbian-kiss: 7171 <- lesbian-kissing: 116

mistaken-for-lesbian: 27 <- mistaken-for-a-lesbian: 19

2.7K Messages

 • 

47K Points

2 years ago

When I have used arrows in my duplicate keyword lists, the mergers are proposed in the direction of the arrows.

For example, if I posted this:

interspecies-friendship (83 titles)  -->  inter-species-friendship (2 titles)

that would mean "interspecies-friendship" should be merged and auto-converted in favor of "inter-species-friendship," and the latter keyword would become the sole keyword.

It seems like you might be proposing the exact opposite: merging opposite the direction that each arrow is pointing. 
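The arrow convention described above can be sketched in code. This is only an illustration: `parse_chain` and `merge_plan` are hypothetical helper names, and the "keyword (N titles)" line format is assumed from the lists in this thread, not from any IMDb tool.

```python
import re

# Matches one "keyword (N titles)" link in a merge chain (format assumed from this thread).
LINK = re.compile(r"^(?P<keyword>[a-z0-9-]+) \((?P<titles>\d+) titles?\)$")


def parse_chain(line):
    """Split 'a (1 titles) --> b (2 titles)' into [(keyword, count), ...]."""
    links = []
    for part in line.split("-->"):
        match = LINK.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognised link: {part!r}")
        links.append((match["keyword"], int(match["titles"])))
    return links


def merge_plan(line):
    """The keyword the arrows point to survives; everything before it is merged into it."""
    links = parse_chain(line)
    survivor = links[-1][0]
    sources = [keyword for keyword, _ in links[:-1]]
    return survivor, sources


example = "interspecies-friendship (83 titles)  -->  inter-species-friendship (2 titles)"
survivor, sources = merge_plan(example)
print(survivor, sources)
```

Read this way, the example line would leave "inter-species-friendship" as the sole keyword, which is why the arrow direction matters.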

A couple other pointers: First, one of my rules of thumb (which I do sometimes break) is to only post lists like this if a keyword (or combination of keywords) to be merged is currently assigned to at least 50 titles. Anything less than 50 titles and I feel like I could be doing the mergers myself rather than using up valuable staff time to make these changes as mass mergers. (I have been waiting for months for staff to act on multiple lists, and they have told me that these lists do take a lot of time to sift through.)

And finally, a lot of the keywords on your list are not true duplicates. For example, "prison" and "prison-film" are not the same thing. One is a film focused on one or more prisons, while the other is a prison that just happens to appear in a film, possibly even in just one scene (without being the focus of the film).

With all that in mind, I will write another comment on this post compiling the keywords that I believe should be merged and auto-converted by staff.

2.7K Messages

 • 

47K Points

2 years ago

Here are the keywords from your list that I believe should be mass merged and permanently auto-converted.

(The first two rows (the time-travel keywords) were already on my internal list. I have a list with hundreds of keywords that I organize into different subject matters for future postings.)

time-travel-into-the-past (82 titles)  -->  backwards-time-travel (278 titles)

time-travel-into-the-future (30 titles)  -->  fowards-time-travel (25 titles)  -->  forwards-time-travel (31 titles)

prison-visitation (85 titles)  -->  visit-in-prison (32 titles)  -->  prison-visit (760 titles)

prison-escapee (78 titles)  -->  escaped-prisoner (521 titles)

prison-sex (75 titles)  -->  sex-in-prison (21 titles)  -->  sex-in-a-prison (17 titles)

prisoner-release (17 titles)  -->  prison-release (74 titles)  -->  released-from-prison (282 titles)  -->  release-from-prison (599 titles)

If I didn't include specific keywords in this list, that's either because they are assigned to fewer than 50 titles (either by themselves or in combination with the other duplicate keywords) or because I disagree that they are duplicates.
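As a rough sketch of the 50-title rule of thumb: add up the title counts of the keywords that would be merged away and compare the total to the threshold. The function name `worth_posting` and the list-of-tuples layout are assumptions for illustration, and summing counts is only an upper bound, since one title may carry several of the duplicate keywords.

```python
MIN_TITLES = 50  # rule-of-thumb threshold from this thread, not an IMDb setting


def worth_posting(chain, minimum=MIN_TITLES):
    """chain is [(keyword, title_count), ...]; the last entry is the surviving keyword.

    Only the keywords being merged away count toward the threshold.
    """
    merged_titles = sum(count for _, count in chain[:-1])
    return merged_titles >= minimum


chains = [
    [("prison-escapee", 78), ("escaped-prisoner", 521)],
    [("time-travel-into-the-future", 30), ("fowards-time-travel", 25),
     ("forwards-time-travel", 31)],
    [("war-prisoner", 8), ("prisoner-of-war", 819)],
]

for chain in chains:
    print(chain[-1][0], worth_posting(chain))
```

Under these assumptions the first two chains qualify (78 and 30 + 25 = 55 titles), while "war-prisoner" with 8 titles would be left for manual editing.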

I do believe that "future-time-travel" is not necessarily a synonym to those other keywords. As much as I would like this keyword to mean the same thing as "forwards-time-travel," I also recognize that in at least some instances it might mean time travel that takes place in the future. For example, it might be a movie made in today's time, with a plot that originates 1,000 years from now, where time travel takes place. In other words, it might be the "future of time travel" for some titles, while "time travel into the future" for most other titles.

Also, "zombie-sex" is not necessarily the same thing as "human-zombie-sex." The former might be two zombies having sex with each other.

Even "zombie-bite" is not necessarily the same as "bitten-by-a-zombie." This is a close call, but I regard "bitten-by-a-zombie" as a keyword for titles that show the zombie taking the bite, while "zombie-bite" might be best used for titles where the bite mark is shown on a person (oftentimes without showing the act of biting). 

The police officer keywords are an entirely different story. They were the focus of a massive manual editing campaign by @DataOrganizer and he or she spent so much time on it that he or she got burnt out and took a permanent break from IMDb. As a result of all of that, the police keywords are in a serious state of disarray, and there is some improper English grammar and poorly formed keywords, like "female-police-officer-knockout" and "male-police-officer-deceased" just to name a couple. I think these keywords deserve a separate, future post all on their own. That has been on my list of things to do eventually. Stay tuned....

Thank you for caring about duplicate keywords. I am always happy to discuss the best ways to merge duplicates.

143 Messages

 • 

1.7K Points

I do believe that "future-time-travel" is not necessarily a synonym to those other keywords. As much as I would like this keyword to mean the same thing as "forwards-time-travel," I also recognize that in at least some instances it might mean time travel that takes place in the future. For example, it might be a movie made in today's time, with a plot that originates 1,000 years from now, where time travel takes place. In other words, it might be the "future of time travel" for some titles, while "time travel into the future" for most other titles.

I suppose that "future-time-travel" could mean something different from "forwards-time-travel" in some cases. But it's still a keyword that reads in an odd way to me. It's not obvious from reading it what the specific intent is, and in my opinion, that makes it a keyword that should be removed. But that is a separate discussion.

Also, "zombie-sex" is not necessarily the same thing as "human-zombie-sex." The former might be two zombies having sex with each other.

This is true, and it's so hilariously specific.

Even "zombie-bite" is not necessarily the same as "bitten-by-a-zombie." This is a close call, but I regard "bitten-by-a-zombie" as a keyword for titles that show the zombie taking the bite, while "zombie-bite" might be best used for titles where the bite mark is shown on a person (oftentimes without showing the act of biting). 

I think in this case you might be reading too much into the intent behind it being posted. If "zombie-bite" is specifically used to refer to an instance where a bite is uncovered on a person, as opposed to a bite happening, then it is simply not obvious that it might be applied that way.

"bitten by a zombie" vs. "revealed zombie bite" or "offscreen zombie bite" would have more clarity, and I would support "zombie-bite" being merged into that.

The police officer keywords are an entirely different story. They were the focus of a massive manual editing campaign by @DataOrganizer and he spent so much time on it that he got burnt out and took a permanent break from IMDb. As a result of all of that, the police keywords are in a serious state of disarray, and there is some improper English grammar and poorly formed keywords, like "female-police-officer-knockout" and "male-police-officer-deceased" just to name a couple. I think these keywords deserve a separate, future post all on their own. That has been on my list of things to do eventually. Stay tuned....

The problem with "police" and "cop", so far as I can see, is that IMDb needs to choose between "cop" and "police", as nearly every "x-police" has an "x-cop" parallel. Once they've all been merged together into one, it'll be somewhat easier to spot the duplicates.

"Police" seems to have more general usage, whereas "cop" is a very Americentric term.
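One way to surface the "x-police" / "x-cop" parallels is to normalise one term to the other and group keywords that collide. A minimal sketch, assuming keywords are plain hyphenated strings; `police_key` and `parallel_groups` are made-up names, and the output is only a list of candidates to review by hand, not a merge list.

```python
from collections import defaultdict


def police_key(keyword):
    """Replace 'cop'/'cops' tokens with 'police' so parallel keywords share a key."""
    tokens = keyword.lower().split("-")
    return "-".join("police" if token in ("cop", "cops") else token for token in tokens)


def parallel_groups(keywords):
    """Group keywords that differ only by 'cop' versus 'police'."""
    groups = defaultdict(list)
    for keyword in keywords:
        groups[police_key(keyword)].append(keyword)
    # Keep only keys that more than one keyword maps to.
    return {key: names for key, names in groups.items() if len(names) > 1}


sample = ["cop-drama", "police-drama", "police-comedy", "cop-comedy",
          "ex-cop", "ex-police", "police-dog"]
print(parallel_groups(sample))
```

Matching whole hyphen-separated tokens (not substrings) avoids touching unrelated words that merely contain "cop".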

2.7K Messages

 • 

47K Points

@Skavau​ 

I dislike the keyword "future-time-travel" as well. It's annoying and I agree it shouldn't exist.

I totally see your point about "zombie-bite." Like I said, this is a close call. But another one of my rules of thumb is, when in doubt, don't mass merge. Because mass mergers cannot ever be reversed.

I forgot to mention earlier that another one of my rules of thumb is to allow at least a 7-day comment period on proposed keyword mergers from fellow contributors before asking staff to act on a list of keywords.  Because again, these mass changes cannot be undone. Occasionally I have made mistakes in my proposals that are not caught before the merger goes through, and each time that happens I regret it. But it is a learning experience that hopefully leads to fewer mistakes in the future.

And yeah, "cop" versus "police" is a related problem that should eventually be dealt with. That was part of the work of @DataOrganizer.

143 Messages

 • 

1.7K Points

2 years ago

I have reversed the arrows to make it more in line with how you do it.

2.7K Messages

 • 

47K Points

2 years ago

"alien-invasion-sci-fi" and "space-opera-sci-fi" are listed as accepted "subgenre" keywords here:

https://help.imdb.com/article/contribution/titles/keywords/GXQ22G5Y72TH8MJ5?ref_=helpart_nav_30#

143 Messages

 • 

1.7K Points

@keyword_expert​ But we also have "space opera" and "alien-invasion" without "sci-fi". They are both by definition sci-fi without the suffix.

I don't get the logic there

(edited)

2.7K Messages

 • 

47K Points

@Skavau​ Not really. It is possible to have an alien invasion in a title (for example, as a brief part of a comedy TV episode) without dominating the plot so that it makes the entire title fit within the Sci-Fi genre.

143 Messages

 • 

1.7K Points

Well that's possible, but alien-invasion simply has much more usage than "alien-invasion-sci-fi".

And it makes even less sense for "space opera" - which is a long-established science fiction genre (without reference to the 'sci-fi' suffix). You can't just have "space opera" as a background reference in an episode. You either are or are not a space opera.

(edited)

2.7K Messages

 • 

47K Points

I would encourage you to post your criticisms of "space-opera-sci-fi," "alien-invasion-sci-fi," "prison-drama," etc. in this thread:

https://community-imdb.sprinklr.com/conversations/data-issues-policy-discussions/subgenre-keywords/61d477ea59e8f360a0ce4e18

143 Messages

 • 

1.7K Points

Looks like they did it to help new users, not for emphasis. So they wanted to encourage "alien invasion sci-fi" not because they want to distinguish between a film that has alien invasion as a small part of its narrative vs. a film chiefly about an alien invasion, but because they want new users to be able to find the tag more easily in the keyword search.

Really weird.

143 Messages

 • 

1.7K Points

@keyword_expert​ Problem is that it's just going to cause more tag confusion.

2.7K Messages

 • 

47K Points

2 years ago

"woman-detective" and "female-detective" are not duplicates, because "female-detective" is broader and could also include "girl-detective."

girl-detective (17 titles)

girl-detective-series (4 titles)

143 Messages

 • 

1.7K Points

@keyword_expert You know, IMDb needs a hierarchical system, where if someone put in "girl-detective", the system would auto-apply "female-detective" as a parent tag.

This would solve a lot of problems.
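A hierarchy like that could be as simple as a child-to-parent lookup. The sketch below is hypothetical: the `PARENT` table and `with_ancestors` are invented for illustration, the "detective" link in particular is an assumption, and nothing here reflects how IMDb actually stores keywords.

```python
# Hypothetical child -> parent links (illustrative only, not IMDb data).
PARENT = {
    "girl-detective": "female-detective",
    "woman-detective": "female-detective",
    "female-detective": "detective",
}


def with_ancestors(keyword, parent=PARENT):
    """Return the keyword followed by every parent tag that would be auto-applied."""
    result = [keyword]
    seen = {keyword}
    while result[-1] in parent:
        nxt = parent[result[-1]]
        if nxt in seen:  # guard against accidental cycles in the table
            break
        result.append(nxt)
        seen.add(nxt)
    return result


print(with_ancestors("girl-detective"))
```

With a table like this, "girl-detective" and "woman-detective" stay distinct while both still show up under "female-detective".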

2.7K Messages

 • 

47K Points

2 years ago

If it's okay with you, I will include the following keywords in a future post asking for IMDb staff to mass merge and auto-convert. I will give you credit for coming up with the keywords. And by the way I am adding "future-time-travel" in the list. I think we can take the risk that all instances of "future-time-travel" mean "forwards-time-travel," which is very likely to be the case.

time-travel-into-the-past (82 titles)  -->  backwards-time-travel (278 titles)

time-travel-into-the-future (30 titles)  -->  future-time-travel (79 titles)  -->  fowards-time-travel (25 titles)  -->  forwards-time-travel (31 titles)

prison-visitation (85 titles)  -->  visit-in-prison (32 titles)  -->  prison-visit (760 titles)

prison-escapee (78 titles)  -->  escaped-prisoner (521 titles)

prison-sex (75 titles)  -->  sex-in-prison (21 titles)  -->  sex-in-a-prison (17 titles)

prisoner-release (17 titles)  -->  prison-release (74 titles)  -->  released-from-prison (282 titles)  -->  release-from-prison (599 titles)

lesbian-kissing (116 titles)  -->  lesbian-kiss (7171 titles)

143 Messages

 • 

1.7K Points

@keyword_expert​ No problem

2.7K Messages

 • 

47K Points

@Skavau​ 

I am working on that post now.

Regarding "lesbian-kiss" and "lesbian-kissing," I suspect what many of these keywords actually mean is "woman-kisses-a-woman." One of my pet peeves is when people use "lesbian" in keywords when they could mean "bisexual." Just because two women kiss each other does not mean they are lesbians.

However, I don't think we can merge "lesbian-kiss" into "woman-kisses-a-woman," because a so-called "lesbian-kiss" could also be two girls kissing each other. 

Such are the complications of keyword merges.

2.7K Messages

 • 

47K Points

2 years ago

What gets me is when a contributor is willing to spend an inordinate amount of their own time manually editing a keyword that occurs in great numbers on the site, when they could instead much more efficiently ask staff for a mass-scale, permanent solution.

Case in point:  A month ago, I asked for this permanent merger and auto-conversion:

violent (57 titles)  -->  violence (17893 titles)

Since then, somebody has manually edited the keyword "violent" down from 57 titles to only 13 titles:

violent (13 titles)

Why would somebody waste their time doing that, especially with my mass merger request pending?

This not only wastes the time of the contributor who does all those manual edits, but also hides the problem, because without an auto-conversion, "bad" but popular keywords like this will just continue to reappear in the future.
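To illustrate why auto-conversion is the permanent part of the fix, here is a minimal sketch of a conversion table applied at submission time. `AUTO_CONVERT` and `normalise` are hypothetical names, and the table holds only pairs proposed in this thread; how IMDb implements auto-conversion internally is not known here.

```python
# Hypothetical auto-conversion table: retired keyword -> surviving keyword.
AUTO_CONVERT = {
    "violent": "violence",
    "fowards-time-travel": "forwards-time-travel",
}


def normalise(keyword, table=AUTO_CONVERT):
    """Map a submitted keyword to its surviving form, following chained conversions."""
    keyword = keyword.strip().lower()
    seen = set()
    while keyword in table and keyword not in seen:
        seen.add(keyword)  # stop if the table ever loops back on itself
        keyword = table[keyword]
    return keyword


# A manual clean-up removes today's uses; the table also catches tomorrow's submissions.
submitted = ["violent", "Violence", "fowards-time-travel", "prison-visit"]
print([normalise(keyword) for keyword in submitted])
```

A manual edit only changes titles that already have the keyword, whereas a table like this would keep converting "violent" every time someone adds it again.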

143 Messages

 • 

1.7K Points

@keyword_expert​ This is why I don't really distinguish between a keyword with 100 uses vs. 2 uses. It's still a duplicate and shutting that path down and merging it into another is still useful.

Honestly I think the answer, if IMDb staff are too busy or uninterested to organise this, is to empower you, me and others with keyword (just keyword) merge/deletion/block powers. Plenty of database sites use volunteers for things like this.

2.7K Messages

 • 

47K Points

@Skavau​ 

This is why I don't really distinguish between a keyword with 100 uses vs. 2 uses. It's still a duplicate and shutting that path down and merging it into another is still useful.

I still disagree with you on that. There is a major difference between a keyword with 100 uses vs. 2 uses. Based on the numbers alone, a "bad" keyword with 100 uses has already been demonstrated to be very popular, and therefore much more deserving of a permanent solution. I have seen from experience many popular keywords like this be mass-merged but not set up for auto-conversion, and then the keyword just gets created again soon thereafter. This message board is littered with old posts from years ago proving that this exact phenomenon happens with popular keywords. It is much less likely to happen with a keyword with only 2 uses.

Honestly I think the answer, if IMDb staff are too busy or uninterested to organise this, is to empower you, me and others with keyword (just keyword) merge/deletion/block powers. Plenty of database sites use volunteers for things like this.

Believe it or not, there has actually been some discussion between IMDb employees and volunteers (not on this message board) of setting up some kind of arrangement like this. I don't know if it will ever happen, though.

143 Messages

 • 

1.7K Points

@keyword_expert Well, if we could just do it ourselves, we'd honestly soon clear them out through merging and deletion. It's only a priority to focus on more commonly used akas because 'staff's time is precious'.

2.7K Messages

 • 

47K Points

@Skavau​ True.