keyword_expert

2.7K Messages • 47K Points

Saturday, March 19th, 2022 12:15 AM

Closed

Solved

Duplicate Keywords - List #20 (Proposals for Permanent Merger and Auto-Conversion) (geography keywords)

Here is the next installment of my lists of proposed keywords for permanent merger and auto-conversion. 

I am posting this for fellow contributors to review first and raise any objections or questions. I will wait at least seven days before changing this post to a "problem" post and asking IMDb staff to make the proposed changes.

The mergers and auto-conversions should be made in the direction of the arrows.

Duplicate Keywords Proposed for Permanent Merging and Auto-Conversion

adelaide (216 titles)  -->  adelaide-south-australia (891 titles) -->  adelaide-australia (193 titles)  

aussie (811 titles) -->  australian (2123 titles)

baltimore (113 titles)  -->  baltimore-maryland (249 titles)

brisbane-queensland-australia (3 titles)  -->  brisbane-queensland (69 titles)  -->  brisbane-australia (16 titles)

brooklyn (72 titles)  -->  brooklyn-new-york (40 titles)  -->  brooklyn-new-york-city (870 titles)

buenos-aires (51 titles)  -->  buenos-aires-argentina (217 titles)

deutschland (151 titles)  -->  germany (2871 titles)

down-under (972 titles)  -->  australia (3355 titles)

european-women (60 titles)  -->   european-woman (2 titles)

harlem (48 titles)  -->  harlem-new-york (8 titles)  -->  harlem-manhattan-new-york-city (272 titles)

istanbul (146 titles)  -->   istanbul-turkey (413 titles)

kiev-ukraine (60 titles) -->  kyiv-ukraine (95 titles)

las-vegas (276 titles)  -->  las-vegas-nevada (1088 titles)

manchester (131 titles)  -->   manchester-england (257 titles)

manhattan (88 titles)  -->  manhattan-new-york (22 titles)  -->  manhattan-new-york-city (2848 titles)

melbourne (173 titles)  -->  melbourne-victoria (51 titles)  -->  melbourne-victoria-australia (7 titles)  -->  melbourne-australia (245 titles)

mexico-city-mexico (56 titles)  -->  mexico-city (269 titles)

miami (71 titles)  -->  miami-florida (695 titles)

miami-beach (62 titles)  -->   miami-beach-florida (38 titles)

montreal (123 titles)  -->  montreal-quebec (146 titles)  -->  montreal-canada (50 titles)  -->  montreal-quebec-canada (230 titles)

napoli-italy (45 titles)  -->  napoli-italia (1 title)   -->  napoli (57 titles)  -->   naples-italy (216 titles)

nashville (89 titles)  -->  nashville-tennessee (189 titles)

new-orleans (104 titles)  -->  new-orleans-louisiana (713 titles)

new-south-wales (177 titles)  -->   new-south-wales-australia (21 titles)

normandy (216 titles)  -->   normandy-france (39 titles)

nyc (66 titles)  -->  new-york-city (9598 titles)

philadelphia (76 titles)  -->  philadelphia-pennsylvania (554 titles)

rio-de-janeiro (165 titles)  -->  rio-de-janeiro-brazil (280 titles)

rome (171 titles)  -->   rome-italy (1100 titles)

sao-paulo (76 titles)  -->  sao-paulo-brazil (135 titles)

san-francisco (82 titles)  -->  san-francisco-california (1764 titles)

scandinavia-eastern-europe (119 titles)  -->  scandinavia-northern-europe (4 titles)  -->   scandinavia (33 titles)

st-louis (3 titles)  -->   st.-louis (45 titles) -->  st.-louis-missouri (198 titles)

sweden-eastern-europe (119 titles)  -->  sweden-northern-europe (4 titles)  -->   sweden (842 titles)

sydney-new-south-wales (52 titles)  -->  sydney-new-south-wales-australia (3 titles)  -->  sydney-south-wales-australia (4 titles)  -->  sydney-australia (510 titles)

toronto (194 titles)  -->  toronto-ontario (90 titles)  -->  toronto-canada (53 titles)  -->   toronto-ontario-canada (246 titles)

uk (133 titles)  -->   united-kingdom (692 titles)

ukrainian-russian-war (33 titles)  -->  russo-ukrainian-war (82 titles)

ukranian-woman (161 titles)  -->  ukrainian-woman (6 titles)

usa (2379 titles)  -->  u.s.a. (15 titles)  -->  united-states (630 titles)  -->   america (543 titles) -->   united-states-of-america (1579 titles)

ussr (255 titles)  -->  u.s.s.r. (4 titles)  -->  the-soviet-union (4 titles)  -->   soviet-union (1625 titles)

vancouver-canada (25 titles)  -->  vancouver-british-columbia (47 titles) -->  vancouver-british-columbia-canada (133 titles) 

district-of-columbia (10 titles)  -->  washington-d.c. (2511 titles)

Accepted Solution

Champion • 14.4K Messages • 330K Points

3 years ago

If you are suggesting brisbane-australia, melbourne-australia, sydney-australia, shouldn't you also use adelaide-australia rather than adelaide-south-australia?

2.7K Messages • 47K Points

@Peter_pbn Good catch. I will make that change.

My original thinking was that the "south-australia" version is a lot more popular on IMDb, but I believe that popularity was skewed by a single contributor who has been adding keywords to Australian titles on a mass scale lately.

adelaide (216 titles)
adelaide-south-australia (891 titles)
adelaide-australia (193 titles)

Accepted Solution

Employee • 5.6K Messages • 58.7K Points

2 years ago

Hi @keyword_expert -

Merged and auto-converted!

Cheers! :)

2.7K Messages • 47K Points

@Bethanny Wow, this is awesome! I feel like I won the lottery today, with multiple lists being processed. Thank you.

2.7K Messages • 47K Points

@Bethanny

"aussie" is not yet auto-converting to "australian."

"aussie" had been a popular keyword, so it has already been recreated within the past day.

aussie

Employee • 5.6K Messages • 58.7K Points

Found the error! Fixed :)

2.7K Messages • 47K Points

@Bethanny Thank you!

1.3K Messages • 23.4K Points

3 years ago

"indian" keywords need to be audited. Does it refer to "american-indian," "asian-indian," "indian-canadian," "first-nations," "south-american-indian," or what?


2.7K Messages • 47K Points

3 years ago

@Michelle 

Now that the seven-day comment period has passed, I have converted this post to a "problem" post and it is ready for action by IMDb staff. The following keywords should be set up for merger and auto-conversion.

Duplicate Keywords Proposed for Permanent Merging and Auto-Conversion

adelaide  -->  adelaide-south-australia  -->  adelaide-australia 

aussie  -->  australian 

baltimore   -->  baltimore-maryland 

brisbane-queensland-australia   -->  brisbane-queensland   -->  brisbane-australia 

brooklyn   -->  brooklyn-new-york  -->  brooklyn-new-york-city 

buenos-aires  -->  buenos-aires-argentina 

deutschland   -->  germany 

down-under  -->  australia 

european-women   -->   european-woman 

harlem -->  harlem-new-york  -->  harlem-manhattan-new-york-city 

istanbul   -->   istanbul-turkey 

kiev-ukraine  -->  kyiv-ukraine 

las-vegas  -->  las-vegas-nevada 

manchester   -->   manchester-england 

manhattan   -->  manhattan-new-york   -->  manhattan-new-york-city 

melbourne   -->  melbourne-victoria  -->  melbourne-victoria-australia   -->  melbourne-australia 

mexico-city-mexico   -->  mexico-city 

miami  -->  miami-florida 

miami-beach   -->   miami-beach-florida 

montreal   -->  montreal-quebec   -->  montreal-canada  -->  montreal-quebec-canada 

napoli-italy   -->  napoli-italia   -->  napoli  -->   naples-italy 

nashville  -->  nashville-tennessee 

new-orleans  -->  new-orleans-louisiana 

new-south-wales  -->   new-south-wales-australia 

normandy   -->   normandy-france 

nyc   -->  new-york-city 

paris  -->   paris-france 

philadelphia  -->  philadelphia-pennsylvania 

rio-de-janeiro  -->  rio-de-janeiro-brazil 

rome   -->   rome-italy 

sao-paulo   -->  sao-paulo-brazil 

san-francisco   -->  san-francisco-california 

scandinavia-eastern-europe   -->  scandinavia-northern-europe  -->   scandinavia 

st-louis  -->   st.-louis  -->  st.-louis-missouri 

sweden-eastern-europe  -->  sweden-northern-europe  -->   sweden 

sydney-new-south-wales  -->  sydney-new-south-wales-australia  -->  sydney-south-wales-australia   -->  sydney-australia 

toronto   -->  toronto-ontario   -->  toronto-canada   -->   toronto-ontario-canada 

uk   -->   united-kingdom 

ukrainian-russian-war   -->  russo-ukrainian-war 

ukranian-woman  -->  ukrainian-woman 

usa   -->  u.s.a.  -->  united-states   -->   america  -->   united-states-of-america 

ussr  -->  u.s.s.r.   -->  the-soviet-union   -->   soviet-union 

vancouver-canada   -->  vancouver-british-columbia  -->  vancouver-british-columbia-canada 

district-of-columbia  -->  washington-d.c. 


1.3K Messages • 23.4K Points

3 years ago

Why are you acknowledging the provinces in Canada and NOT the states in Australia?  It seems that BOTH, logically and geographically,  should be noted in keywords.

Recently did an audit of the "bad" "paris" keyword, and it, once again, illustrated WHY keywords should be audited and not just merged.  Most, of course, are "paris-france," but there are also "paris-ottawa-canada" and, of course, "paris-texas" and "paris-character."

I even took the time to watch the "Wedding Belles: Part 2" episode of Walker, Texas Ranger (season 8, episode 25) to see whether it needed the "paris-france" keyword or the "reference-to-paris-france" keyword. At first, I thought it needed the "reference-to-paris-france" keyword, since that city is mentioned a few times in dialogue, but at the end there is some stock footage of Paris, as well as a reflection of the Eiffel Tower in a supposed Parisian hotel window, so it warranted the "paris-france" keyword.

I know that very few people will do such extensive audits, but they should be done to make the information on IMDb more factual and accurate.  I cringe at ALL mass mergers without audits because they may be destroying the truth.

P.S. There is also a "san-francisco-mexico."

The most difficult city when trying to ascertain its state is "kansas-city," since part of it is in Kansas and part of it is in Missouri. (Very time-consuming to discover the truth.)


2.7K Messages • 47K Points

@bradley_kent The naming conventions for these geographic keywords are largely based on user patterns over the years. Generally speaking, contributors have favored including the provinces in Canada, and the boroughs of New York City, for example, but not the states in places like the USA and Australia.

I do "audit" these keywords before proposing them in these lists. Before posting "paris," for example, I determined as best as I possibly could that all of the instances of "paris" involved Paris, France, and not any other place named Paris.

You imply that the keyword "paris-ottawa-canada" was intended for some of the "paris" keywords, but it does not appear that you actually added "paris-ottawa-canada" to any titles.

As we have discussed many times before, you and I have different understandings of what it means to "audit" a keyword. I use the word "audit" in its conventional sense -- here, double-checking the keyword to make sure it is ready for a mass change.

When you use the word "audit," you apparently mean manually editing every single instance of that keyword so that it is reduced down to zero. This is not an "audit," in my opinion, but rather a manual, mass-scale edit.

Manual, mass-scale edits involving dozens or even hundreds of instances of a keyword, performed by contributors rather than IMDb staff, can cause misleading patterns in keywords, making it appear that the patterns are coming from the community at large, when in fact they may be coming from the edits of one or two overly zealous contributors. 

Case in point: the overzealous and misleading mass changing of "cigarette" to "cigarette-smoking" over the years. See the thread: Someone seems to be removing the keyword "cigarette" from the database.

Also, as we have discussed, you and I have different understandings of when the "reference-to-" prefix should and should not be used. You describe your interpretations as absolute black and white rules or "truth" versus falsities, when in fact the guidelines are largely silent here, failing to define the limits of when the "reference-to-" prefix is proper, improper, required, not required, etc.

In my opinion, if Paris, France is an important plot point in a title, the "paris-france" keyword can and should be used, even if the title is not set or filmed in that city. But if Paris is mentioned by a character in passing, and the city is not an important plot point, it adds very little to include either a "character-says-" or "reference-to-" keyword for Paris, and if that were done for every little thing mentioned in the dialogue, it would quickly result in keyword spamming.

You are correct that very few people are willing to manually edit hundreds of instances of a keyword. And for good reason: it is extremely inefficient and can actually be counterproductive for mass edits to be done by contributors, instead of being fixed on a mass scale by IMDb staff. It also results in "bad" keywords being used all over again by other contributors after they are "audited" (read: edited) by a single contributor. It is putting a band-aid on a systemic problem rather than performing the necessary surgery to fix it. If your approach of contributors manually editing every single instance of every single bad keyword down to zero were followed, it would make it nearly impossible to achieve any real progress on a systemic level, and would depend on perpetual "auditing" (read: editing) by contributors in the future.


1.3K Messages • 23.4K Points

Yes, we have very different interpretations of what an "audit" means.  To me, it is reviewing every single item.  Almost always, there are exceptions.  As was once again revealed in the Supreme Court hearings for Judge Ketanji Brown Jackson, one must look to each single case if one wants to know the truth.  Generalizations are not a true reporting of the facts.

Keyword audits are something that I have undertaken in the belief that the staff (probably underpaid AND understaffed) does not have the time to do them. I just wish that more contributors were willing to undertake this "grunt" work and not just command and suggest from on high while the foot soldiers do the actual fighting.

Oh, and USA states have always been included with city names, just like you request with Las Vegas.


2.7K Messages • 47K Points

@bradley_kent There is a huge difference between checking every instance of a keyword and actually deleting every single instance of a keyword. That is the major difference between our two approaches.

You're right about USA states -- I was overgeneralizing and misspoke. What I should have said was that for USA, the country is not typically included, while for Australia, the state is not typically included (and both patterns have been established over time). 

1.3K Messages • 23.4K Points

When I audit a keyword, I not only delete some, but add some and correct some.  And, an audit also provides an opportunity to add, delete and/or correct OTHER keywords.

Also, when I finish auditing, I usually request that the staff block the "bad" keyword from further addition. I have a list of quite a few more such block requests, but I am waiting for the staff to act on my existing requests before submitting them.

The staff is much more responsive to your merges than it is to my requests.

Expediency, and a seemingly easy solution, seems to rule the day.


2.7K Messages • 47K Points

@bradley_kent I realize all of that. My point is that if, after an audit, it is determined that 95% of the instances of a particular keyword can be deleted, merged, changed, or otherwise handled on a mass level by staff, then I will only worry about the 5% of instances that require some other change (e.g., changing to a different keyword). But it is often unnecessary (and in fact it can be counterproductive) to reduce all instances of a bad keyword down to zero.

Here are some examples of what I mean by "auditing" a keyword, from the thread "A few vague keywords that are ready to be merged and/or blocked (trump, roosevelt, washington, vancouver)."

Note that for "trump" and "vancouver" in those examples, I didn't bother to reduce the numbers down to zero, which would have only been a complete waste of everybody's time. But for "roosevelt" and "washington," I did reduce those keywords down to zero, which was warranted under the circumstances.

The keyword "paris" is in the category of a keyword that makes absolutely no sense for contributors to reduce down to zero, because after I "audited" that keyword, I determined that 100.0% of the remaining instances involved Paris, France (and not any other place named Paris).

As for a manual "audit" of every single instance of a keyword turning up other edits that could be made, that would always be true of anything on the IMDb site. It is easy to get lost in the keywords by manually editing them on a mass scale. While a mass manual approach can be very appropriate for particular keywords (e.g., "roosevelt"), when that approach is applied to other keywords, it can be easy to lose sight of the forest for the trees.


2.7K Messages • 47K Points

@bradley_kent said:

Recently did an audit of the "bad" "paris" keyword, and it, once again, illustrated WHY keywords should be audited and not just merged.  Most, of course, are "paris-france," but there are also "paris-ottawa-canada" and, of course, "paris-texas" and "paris-character."

Are you claiming to have actually changed the keyword "paris" to "paris-ottawa-canada," "paris-texas," or "paris-character" for any title(s) during your "audit?" As far as I can tell, you did not do so. 

There were 0 titles with "paris-ottawa-canada" both before and after your "audit."

There were 4 titles with "paris-texas" both before and after your "audit."

There were 10 titles with "paris-character" both before and after your "audit."

Other than changing the keyword "paris" to "paris-france" for a bunch of titles (something that staff would have done anyway) and possibly changing "paris" to "reference-to-paris-france" for one or more titles, it is unclear what was achieved from your "audit," except to confirm my previous conclusion that 100% of the instances of "paris" were in fact intended to refer to "paris-france."

2.7K Messages • 47K Points

3 years ago

I have edited this list to remove the "washington-d.c." keywords, now that they have been taken care of in response to @adrian's post here


1.3K Messages • 23.4K Points

3 years ago

There are many, many other cities that can be listed for these kinds of mergers -- almost every city that is listed as a keyword. Most important is informing submitters of the correct way to do this. (tbilisi-georgia, helsinki-finland, anyone?)

WARNINGS: City names may often have a more common country listed, but there are many duplications of city names throughout the world, so one must be aware of the country/state/province/county, etc. (rome-georgia, paris-texas, manchester-new-hampshire, anyone?). There's a San Francisco in Mexico, a Vancouver in Washington state, a Naples in Florida. Kansas City is in two states: Kansas AND Missouri.

History and the passage of time also affect this. Belgrade may be in Serbia now, but it was once in Yugoslavia. And what about Constantinople/Istanbul, Bombay/Mumbai, Saint Petersburg/Petrograd/Leningrad, etc.? It must depend on HOW the city is referred to in each specific title.

Perhaps the most prolific city keyword that exists in MANY countries is "chinatown," so a very careful audit is needed here, as in many other instances.

Again, beware of faulty assumptions.  Beware of incorrect generalizations.  You can't just merge these city keywords and think the problem is solved. The specifics must be honored, or you are just further "messing up" the keyword database.


2.7K Messages • 47K Points

@bradley_kent

Well, this is interesting. Somebody (I'm assuming you) has made a bunch of manual edits to the "belgrade" keywords since I first posted this list:

belgrade (0 titles)  

belgrade-serbia (73 titles)

belgrade-yugoslavia (51 titles)

I am pretty sure the keyword "belgrade-yugoslavia" did not exist when I first posted this list, or else I would have taken it into account at that time.

Although you are correct that Belgrade was once part of Yugoslavia, it also remained a part of Serbia even during that same time, since Serbia was one of the six states or units of Yugoslavia. The idea behind the "belgrade" keywords on my list was to try to have one all-encompassing keyword that refers to Belgrade at all times in history. In theory, that keyword could be just "belgrade," but there would still be a bifurcation problem because some people would use "belgrade," while others would use "belgrade-serbia."

Another good example is cities in Ukraine. These cities have at various points in history been part of Kievan Rus', Russia (including the Tsardom of Russia and then the Russian Empire), the USSR, and Ukraine. Then it gets even more complicated considering the various romanized spellings in different languages for these cities. Should there be multiple sets of keywords with different spellings for each of these cities to signify different points in history (e.g., "kyiv-kievan-rus'," "kiev-tsardom-of-russia," "kiev-russian-empire," "kiev-ussr," and "kyiv-ukraine")?

(Then it gets even more complicated when talking about cities like Sevastopol that are currently part of the disputed Crimean peninsula, which is regarded by the international community as still part of Ukraine but which Russia has annexed/occupied.)

Why go to all that trouble to bifurcate all these keywords, when single keywords like "kyiv-ukraine" will suffice and are accurate, since the geographical concept of Ukraine has been part of the mix for a long time even when it was part of other countries? 

It's the same with Belgrade. Even when it was part of Yugoslavia, it was also at the same time still part of Serbia. 

Interestingly, following your edits, 12 titles currently have both of the keywords "belgrade-serbia" and "belgrade-yugoslavia." 

Meanwhile, the keyword "belgrade" now has 0 titles, but it is not blocked or merged into any other keyword, so unless something is done with this keyword, it will inevitably be started up again. This is what happens when large numbers of specific keywords are privately, manually (and temporarily) edited by a single contributor, rather than permanently modified by staff following a public discussion.

I do not know what the solution is, but in this particular area I would prefer consolidating into as few keywords as possible.

Perhaps for cities like this it makes the most sense to make an exception and not use the countries as part of the keywords, keeping it simple with single keywords like "belgrade," "kyiv," "chernobyl," "crimea," etc.

1.3K Messages • 23.4K Points

Your assumption is wrong.  It was not me, but I am curious about who it was -- and would like to commend them on the attempt.

Instead of having ONE generalized "city" keyword for ANY city, I, as stated above, go by how the city is acknowledged within a specific title. Through wars and other changes, I go by the specifics. "berlin-germany," "east-berlin-germany" and/or "west-berlin-germany," for example, can exist alone or in any combination, again, depending on how they are addressed within a specific title.

Many cities have changed names (and even spellings) throughout history, and some cities have even disappeared, so their name "as it is addressed in a specific title" should be the determinant. Anyone researching a city's evolution would need to search ALL its past names.

Look at the Wikipedia "list of city name changes."  To try to convert each to only one name is a disservice to historical truth.

I live in New York City, or is it New Amsterdam?  Maybe.  Perhaps. But... don't think I was alive back then.    

 


2.7K Messages • 47K Points

@bradley_kent

I think you mean "east-berlin-east-germany" and "west-berlin-west-germany." Obviously these are different cities than Berlin, and they each deserve their own keywords.

Same with cities that change names over time, like your New Amsterdam example.

But those are different from when a city retains the same name and geographical unit, but that geographical unit might temporarily become part of a different country and/or the romanized spelling might change. The examples I have given are Belgrade plus a number of Ukrainian places: Chornobyl/Chernobyl, Kyiv/Kiev, Sevastopol, and Crimea (not a city but a region).

Belgrade has been part of Serbia for hundreds of years, even while it was part of the Ottoman Empire, and more recently, part of Yugoslavia. 

The Ukrainian cities I mentioned were still part of Ukraine even while they were part of Russia and the USSR. 

For cities like this, perhaps it does make sense to allow standalone city keywords without a country, like "belgrade" and "kyiv." After all, there are already exceptions for other cities, like "new-york-city," which also does not have a country or state within the keyword.

The alternative is to include countries in all keywords. But perhaps both sets of keywords should remain. For example, "belgrade," "belgrade-serbia," and "belgrade-yugoslavia" could all coexist as keywords.

It starts to get more complicated though with many of the Ukrainian cities, which have different Romanized spellings under Ukrainian control versus Russian control. 

Again, I don't profess to know what the best answer is, but this should be resolved one way or the other. 

Similar to "belgrade," there is also the keyword "istanbul." In the past two months since I started this post, somebody took a lot of time to manually edit "belgrade" down from 72 titles to zero titles and "istanbul" down from 146 titles to zero titles, and then you just mentioned both those cities today. 

2.7K Messages • 47K Points

@bradley_kent P.S. I am taking "belgrade" and "kyiv" off the list for now, and perhaps a new post should eventually be started to discuss and resolve how to handle the unique category of cities I have discussed above like Belgrade, Chornobyl, Sevastopol, Kyiv, etc.

__

Edit: to clarify, I removed "kyiv" from the list, but kept the proposed merger of "kiev-ukraine" into "kyiv-ukraine."


1.3K Messages • 23.4K Points

Any work that I have done on city keywords was determined by the specific content of the title in question. If I couldn't figure that out, I just left it as a city, sans country. Again, I really think that the determining criteria should be how the city is addressed in a specific title, and that, of course, could be more than one way.