keyword_expert's profile

2.6K Messages

 • 

45.3K Points

Friday, September 2nd, 2022 3:57 AM

Closed

Solved

IMDb Staff: Miscellaneous keyword requests

Dear IMDb Staff:

I have a few miscellaneous keyword requests:

1. Please block this keyword: reference-to-poe

I just finished auditing the inherently vague keyword "reference-to-poe." Although it was assigned to only 2 titles, one of the instances should have been reference-to-edgar-allan-poe and the other reference-to-poe-dameron. That split shows that the keyword reference-to-poe is inherently vague. It should be blocked from being added in the future.

2. Please unblock the keyword "donald-trump" and set it up for auto-conversion to "reference-to-donald-trump":

donald-trump (0 titles)  -->  reference-to-donald-trump (4644 titles)

A while back, at @Marco's request, the keyword "donald-trump" was simply blocked from the system. While there was nothing inherently wrong with blocking the keyword, it does mean that in the future, when people try to add that keyword, they will be blocked from doing so, and some of them will give up without realizing they could be using the more appropriate keyword "reference-to-donald-trump." A less drastic remedy than blocking "donald-trump" would be to set that keyword up for auto-conversion to "reference-to-donald-trump." Similar keywords like "franklin-d.-roosevelt" and "vladimir-putin" have already been set up for auto-conversion to their "reference-to-" counterparts. The same should be done with the "donald-trump" keyword.

3. Please undo and replace the current auto-conversions for "television-news-reporter" and "television-news-report":

When IMDb staff acted on @adrian's request here, a glitch was inadvertently created in the auto-conversions: when you try to add the keyword "television-news-reporter," the system auto-converts it to "-tv-news-reporter," and when you try to add "television-news-report," it auto-converts it to "-tv-news." Note the hyphens (-) at the beginning of these keywords. The hyphens were included as typos when these auto-conversions were set up.

The solution is to merge and replace the auto-conversions as follows (in the direction of the arrows):

Proposed Replacement Auto-Conversions

television-news-reporter (0 titles) -->  television-reporter (590 titles)

television-news-report (0 titles)  -->  tv-news (2231 titles)  

And along the same lines, these mass mergers and auto-conversions should also be made:

Duplicate Keywords Proposed for Permanent Merging and Auto-Conversion

-television-news-reporter (1 title)  -->  tv-news-reporter (99 titles)  -->  television-reporter (590 titles)
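For anyone who finds it easier to read as code, the proposal above can be sketched as a simple alias table. This is only an illustration: the keyword strings come from the lists above, but the table, the function names, and the idea that IMDb stores auto-conversions this way are all my assumptions, not a description of the real system.

```python
# Hypothetical alias table: each retired keyword maps to its replacement.
# Keyword strings are taken from the proposal; the structure is assumed.
AUTO_CONVERSIONS = {
    "television-news-reporter": "television-reporter",
    "television-news-report": "tv-news",
    "-television-news-reporter": "tv-news-reporter",
    "tv-news-reporter": "television-reporter",
}


def resolve(keyword, table=AUTO_CONVERSIONS):
    """Follow auto-conversions until a keyword with no further mapping is reached."""
    seen = set()
    while keyword in table:
        if keyword in seen:
            raise ValueError(f"conversion cycle at {keyword!r}")
        seen.add(keyword)
        keyword = table[keyword]
    return keyword


def malformed_targets(table=AUTO_CONVERSIONS):
    """Return conversion targets that start or end with a hyphen."""
    return sorted(
        {t for t in table.values() if t.startswith("-") or t.endswith("-")}
    )
```

With a table like this, resolve("-television-news-reporter") follows both steps of the chain and ends at "television-reporter", and a check like malformed_targets would flag a typo such as "-tv-news" before it goes live.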

More details are explained in my comment here.

Thank you in advance to IMDb staff for resolving these keyword issues. 

Accepted Solution

Bethanny

Employee

 • 

1.9K Messages

 • 

20.4K Points

5 months ago

Hi keyword_expert-

All fixed.

Cheers!

keyword_expert

@Bethanny

I double-checked the full list, and it looks to me like one particular merger, shown below, still needs to be done. The auto-conversion has been set up, but not the merger.

 tv-news-reporter (99 titles)  -->  television-reporter (590 titles)

Bethanny

Employee

@keyword_expert​ Hi!

Merged.

Cheers!

Champion

 • 

2.7K Messages

 • 

69.2K Points

5 months ago

Wouldn't it be more consistent to merge television-reporter and tv-news-reporter into tv-reporter, similar to how television-news was merged into tv-news?

keyword_expert

@adrian​ That question has been discussed a lot over the past year or more. After community discussion, the way these keywords generally went was to use "tv" when referring to actual content (e.g., "tv-news," "tv-program," "tv-show," "tv-advertisement"), while using "television" in most other instances (e.g., "television-broadcast," "television-presenter," "television-set," "television-station"), except for terms that end in "tv" (e.g., "local-tv," "watching-tv," "live-tv").
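To make the convention concrete, here is a rough sketch of it as a lookup. The two word lists contain only the examples quoted above, so anything not on them comes back as undecided. Treating a trailing "television" the same as a trailing "tv" is my own assumption, and the function name is made up.

```python
# Examples quoted in the post; keywords not listed here are left undecided.
TV_CONTENT = {"tv-news", "tv-program", "tv-show", "tv-advertisement"}
TELEVISION_OTHER = {
    "television-broadcast",
    "television-presenter",
    "television-set",
    "television-station",
}


def preferred_form(keyword):
    """Return the preferred spelling under the convention, or None if unknown."""
    parts = keyword.split("-")
    # Exception: multi-word terms that end in "tv" keep the abbreviation.
    # Mapping a trailing "television" to "tv" is an assumption.
    if len(parts) > 1 and parts[-1] in ("tv", "television"):
        return "-".join(parts[:-1] + ["tv"])
    tv_form = "-".join("tv" if p == "television" else p for p in parts)
    television_form = "-".join("television" if p == "tv" else p for p in parts)
    if tv_form in TV_CONTENT:
        return tv_form
    if television_form in TELEVISION_OTHER:
        return television_form
    return None  # not covered by the quoted examples
```

So "television-show" maps to "tv-show" and "tv-set" maps to "television-set", while a keyword that was never discussed returns None rather than a guess.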

More details here, here, and here.

(edited)

Champion

@keyword_expert​ 

But this is what makes contributing frustrating. There is no rhyme or reason for using "tv" in some places and "television" in others. It really needs to be one or the other. Either you use the full word or you use the abbreviation.

keyword_expert

@adrian

There actually has been rhyme and reason to the process and community discussion that led up to this. As you can see from the original numbers, a lot of keywords like "television" itself and "television-broadcast" and "television-presenter" and "television-set" were already a lot more popular than their "tv" counterparts. So the community had already spoken in custom and practice on some of these keywords.

Plus there is the distinction between actual "tv" content and other "television" words. The actual word is "television"; the word "tv" is just an abbreviated shorthand. IMDb uses "tv" when referring to "TV Shows" for example, so using the abbreviation makes the most sense when referring to content.

Although I agree that consistency across keywords is of course an important factor, it's not the only factor, and sometimes it can be outweighed by other factors. A similar example is with the "u.s." vs. "american" keywords (e.g., "american-politics" and "u.s.-government"). Sometimes the world is not simple enough to allow for a black-and-white rule.

(edited)

Champion

@keyword_expert​ 

Popularity should not be the deciding factor. A lot of popularity results from which keyword was added first and how keywords show up in the search. Consistency should be what we strive for, not some popular notion. Also, "american" keywords should be banned because they are U.S.-centric and ignore the rest of the Americas.

The world may not be simple, but rules for adding to a database should be, especially when we can control how they happen and what gets submitted. As someone who does a lot of verification for a living, complicating things for no real stated goal doesn't make sense.

keyword_expert

@adrian I completely agree with you: it would be dumb to make a rule that popularity should always be the "deciding" factor. As I have said many times, there are often multiple factors for any particular keyword merger question, and all relevant factors have to be taken into account. Sometimes several factors together can outweigh another. There are many potential factors besides prevalence on IMDb. Some of them include dictionary spellings and meanings of words, prevalence on the Internet as a whole, IMDb's preference for various things such as American English (there's that word "America" again), consistency across sets of keywords (apparently your favorite factor?), proper formatting, avoiding vague and subjective keywords, and just plain common sense.

As for popularity of a particular keyword, that can often be a relevant factor, and when there is overwhelming popularity of a keyword, that factor can be given even more weight (unless it's clear that one or two contributors have been "curating" in favor of that keyword over many years). I am usually reluctant to "undo" very popular keywords. Sometimes there is reasoning behind a popular keyword that might not be immediately clear to me, but is clear to others.

Much of the world refers to the U.S.A. as "America" and people who are citizens of that country as "Americans." And you make a good point that the Americas can be relevant, too. That is one reason why "american" keywords should sometimes be used rather than "u.s." keywords: because the "u.s." keywords are the ones that are "U.S.-centric," not the other way around.

A good example is the keyword american-history (1277 titles). This is preferred over "u.s.-history" because it also captures the period before 1776, before the United States of America was a country (and before the term "u.s." would have been used).

In most cases "u.s." keywords are preferred. But "u.s." often implies the government itself, so sometimes "america" and "american" keywords make more sense.

As in life, rules can have exceptions, even on IMDb. 

rootsmusic

237 Messages

 • 

5.2K Points

5 months ago

I'd like to see IMDb clean up keywords with the prefix based-on-, especially based-on-a-. These keywords should be merged with those with the prefixes adapted-from- and adaptation-of- (including variants like book-to-movie-adaptation and literary-adaptation). I also don't think that the source material needs to be so subdivided (e.g. novella, novelette, short novel, light novel).

(edited)

keyword_expert

@rootsmusic​ There is a lot to unpack there.

First, I don't see very many keywords left that literally begin with "based-on-a-" if that's what you meant to say.

Second, one of my rules of thumb is to hold off on proposing mass mergers until a keyword (or set of keywords) has at least 50 titles currently assigned to it. I'm not seeing that with very many of the "adapted" and "adaptation" keywords.

Third, the keyword "literary-adaptation" is difficult to address. It would be great if it could be merged into a "based-on-literature" keyword, but there is no such keyword. There is a keyword called "based-on-literary." I suppose both of those keywords could be merged into a new keyword "based-on-literature." I will add that to my list for future public proposals. But because "literature" is very broad and includes novels, poetry, and even things like essays, it would not make sense to merge these keywords into anything narrower than "literature" keywords.

I agree that perhaps some of the keywords you listed could be merged. But I would merge all of the various synonyms for short novels into one single preferred term for that type of literature. The word "novella" seems to be the most popular term in the existing keywords. I would not get rid of the "novella" keywords, since someone (probably multiple people) took the time to distinguish on that basis over the years.

keyword_expert

@rootsmusic

p.s. The keyword "based-on-light-novel" has a specific meaning. A light novel is defined on Wikipedia as "a style of Japanese young adult novel primarily targeting high school and middle school students." It may be possible to merge "based-on-light-novel" into "based-on-young-adult-novel," but I am not so sure that makes sense. In general I would tread lightly here.

based-on-light-novel (53 titles)

based-on-young-adult-novel (87 titles)

rootsmusic

@keyword_expert Your (p.s.) caution is warranted. But in contrast to your "rule of thumb", I'm complaining broadly about keyword variants that are too specific. Instead of uniquely describing rare titles (< 2 titles), a keyword variant would be too specific if used rarely (< 10 titles) by contributors because there are more popular keywords (> 50 titles) that can be substituted. In other words: I think there are too many keyword variants, which makes it more arduous to include every variant when searching by keyword.

keyword_expert

I generally agree with your thoughts on highly specific keywords. But specific keywords and rare keywords are not necessarily the same thing. A keyword has to start somewhere, and just because it is rare right now does not mean it will always be rare.

When we see duplicate keywords where one of them has fewer than 10 titles, we contributors can always manually merge such keywords ourselves. I do that kind of thing all the time. But when the numbers get too high, that's when it is time to ask staff for help.

 

One of my biggest pet peeves on IMDb is what I call "orphaned" keywords. I use that term for keywords that only have 1 title. Such keywords are virtually worthless because they can't be used in keyword searches to find patterns across titles.

The absolute worst are orphaned keywords that will always be orphaned, stuff like "middle-aged-woman-pours-a-cup-of-chamomile-tea-on-the-head-of-a-teenage-boy." Strings like this should never be keywords. Yet people persist in creating them.

When creating a brand new keyword, if the contributor can't think of a second title that the keyword applies to, they should probably think long and hard about creating it. 
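As a rough illustration of the numbers I have been throwing around, here is a small sketch that counts titles per keyword, lists the orphans, and applies my rules of thumb. The title IDs are made up, the function names are hypothetical, and the "judgment call" label for the 10 to 49 range is my own shorthand, since I only gave firm numbers for under 10 and for 50 and up.

```python
from collections import Counter

# Hypothetical sample data: (title_id, keyword) pairs. Title IDs are invented.
ASSIGNMENTS = [
    ("tt0000001", "tv-news"),
    ("tt0000002", "tv-news"),
    ("tt0000002", "tv-news"),  # duplicate assignment, counted once
    ("tt0000003", "reference-to-poe-dameron"),
]


def keyword_counts(assignments):
    """Count distinct titles per keyword."""
    return Counter(keyword for _title, keyword in set(assignments))


def orphaned(counts):
    """Keywords assigned to exactly one title."""
    return sorted(k for k, n in counts.items() if n == 1)


def triage(n_titles):
    """Apply the rules of thumb from this thread to a duplicate keyword."""
    if n_titles < 10:
        return "merge manually"
    if n_titles >= 50:
        return "ask staff for a mass merger"
    return "judgment call"  # assumed label for the in-between range
```

Running orphaned(keyword_counts(ASSIGNMENTS)) on the sample data picks out "reference-to-poe-dameron" as the only keyword with a single title.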
