bulmapunkrocker's profile

188 Messages • 7K Points

Thursday, September 10th, 2020

Closed

Still Images wrongly tagged with language when there is no text

I have encountered a new problem with the new galleries in the desktop version:

 

I have added 2 new still images to this Australian TV Show https://www.imdb.com/title/tt1988386

Episode: https://www.imdb.com/title/tt2209111

 

Image rm1939710209 was tagged with the language English. There is no way to untag it, because no language is shown when I try to edit the image to correct this error:

 

Here is an image of the submission, no language was added:

 

 

 
As you can see, 4 items were approved: the image, the two title links (the TV show and the episode) and the name link (actress). No language at all; it is a Still Image.
 
The same happened with this other image from the same episode. The situation is even weirder, because there aren't even any numbers in it and the language chosen is Italian:
rm1771938049
 
Again, despite being a Still Image and having no text at all, it was labelled as "Language: Italian" and there is no Edit option to remove the error by myself. I didn't submit any language request:
 

Again, 5 items approved: the image itself, 2 name links (both actresses) and 2 title links (TV show and episode), as seen in the screenshot. No language at all, because still images don't allow language tags and there is absolutely no language in the image.

 

I re-submitted the images to see if this was a one-time mistake, but it happened again:

Check it out please:

First Image:

#200908-081450-083102

#200908-161023-045802

Duplicate deletion:

#200910-090400-022802

The "Language: English" tag appeared on both contributions, so I had to delete one of the duplicates. I was trying to resolve the issue by myself before bringing it here.

 

Second Image:

#200908-080152-157902

#200908-160952-030902

 

 

 

Duplicate deletion:

#200910-090502-836802

 

The "Language: Italian" tag appeared on both contributions again, so I had to delete one of the duplicates. I was trying to resolve the issue by myself before bringing it here; I didn't want to cause any more trouble.

 

Can you please check and tell me why this is happening? Is this a bug, or am I doing something wrong regarding these 2 specific images? I have verified that there is no metadata interfering.

I am looking forward to your response, thank you very much in advance.

 


Accepted Solution

Employee • 578 Messages • 11.2K Points

5 years ago

Hey - language detection for images is done automatically by an API, supported by the language additions that users make to images. Unfortunately sometimes the API will incorrectly tag a language where it is not needed.

 

I've removed the language tags and I can see that your delete submission has been approved. If there's anything left to do on this one let me know and I'll take care of it.

188 Messages • 7K Points

Thank you very much Grayson!

 

So sorry for the late reply. Is there a way to avoid this incorrect tag by the API? Please excuse my ignorance and thanks again!

28 Messages • 1.3K Points

Hi Grayson.

I hope this new API is under regular review for accuracy and relevance.

As a regular updater of image tags, I too have noticed a steep increase in the application of dubious language tags.

I have narrowed the problem down to three categories. Two are definitely API specific; the last, which possibly exacerbates the API problem, is an IMDb issue/policy...

1) Language spotted but detected incorrectly. As can be seen from the following images:

There are four words visible; one a proper name, one an English word and the remainder (both the same) Estonian. No Spanish at all.

As with the preceding image, this has an Estonian 'Rescue' or 'Salvation' vehicle. The word Pääste does not appear in the German lexicon.

2) API seeing words that are not there. For instance:

No Italian here... no anything here.

Nothing English in this image. What partial letters there are hardly qualify as words.

Really? Exp?

3) This one is more important, harder to quantify and more impactful, but is also the one IMDb will defend: irrelevant text. Take a look at the following:

Good shout. 'Edge' is definitely English...

...as is the phrase 'Life Jacket'.

'EST', however, is - in this case at least - an abbreviation for the country code 'Estonia' rather than the Latin 'it is' or 'it exists'. Finally...

'BA' may be a half decent qualification to have, but the abbreviation (wherever it may be found) is not a word. 'Latte' is a word but first, it is French and second, it is on a 'watermarked' Instagram tag.

The point is, the API is finding words everywhere.

I assume that the new language tag feature is a prelude to the implementation, once enough images have been tagged, of another filter, such as the ones that currently exist to narrow images down to just those that are red carpet events, those that show a specific actress or just the theatrical posters.

Such a feature would certainly be useful to those users looking for DVD boxart or posters in their native language and, I reluctantly concede, to those who may even be interested in movie screenshots that feature specific languages.

I would dare to suggest, however, that the latter filter use would be carried out infrequently and with a view to 'significance' rather than 'existence'. For instance, referring to ALL 9 images above, I cannot conceive of a useful purpose for the words seen, especially when their usage is incidental, the language has been mis-identified, they turn out to be abbreviations or acronyms or are just plain non-existent. (And all those examples were taken from the images of just one movie.)

I would propose that, as with so many things in life, size matters. The bigger the visible words, the more relevance or usefulness they are likely to have. The following examples make my point:

I suggest that if a couple of modifications were made to the API code, you could avoid >75% of language tag errors:

1) If an image is tagged as 'Poster' or 'Product', allow full detection and language tagging. Most posters have clear, recognisable text and will likely be the most cross-referenced for language.

2) If an image is tagged 'Event', disable language detection altogether. Most event shots have actors stood in front of 'Comic Con', 'Vogue', 'Hyatt Regency' or similar backdrops. I imagine this would be pointless tagging and the least cross-referenced for language.

3) If an image is tagged 'Still Frame' or 'Behind the Scenes', apply a modified detection algorithm that looks for clear, identifiable (say, with >85% certainty) whole words that are, at the very least, 32 pixels high. At an average 96dpi, this would be approximately 8mm. At higher resolutions, this would be smaller, but higher resolution images also tend to be larger, which would compensate for the reduced clarity. This size check would mitigate the identification of most, if not all, of the first nine images' useless or outright incorrectly tagged words.
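To make the three rules above concrete, here is a rough Python sketch of how they might look. Everything in it is hypothetical: the DetectedWord record, the languages_to_tag function and the sample words are my own illustration and say nothing about how IMDb's actual API is written. The 85% and 32 pixel thresholds are the ones suggested above, and the fallback for image types I did not list is an assumption.

```python
from dataclasses import dataclass
from typing import List

MIN_CONFIDENCE = 0.85  # the ">85% certainty" threshold suggested above
MIN_HEIGHT_PX = 32     # the minimum word height suggested above
MM_PER_INCH = 25.4


@dataclass
class DetectedWord:
    """Hypothetical record for one word found by a text detector."""
    text: str
    language: str
    confidence: float  # 0.0 to 1.0
    height_px: int


def px_to_mm(pixels: float, dpi: float = 96.0) -> float:
    """Convert a height in pixels to millimetres at a given dpi."""
    return pixels / dpi * MM_PER_INCH


def languages_to_tag(image_type: str, words: List[DetectedWord]) -> List[str]:
    """Return the languages to tag, following the three proposed rules."""
    if image_type in ("Poster", "Product"):
        kept = words  # rule 1: full detection and tagging
    elif image_type == "Event":
        kept = []     # rule 2: no language detection at all
    elif image_type in ("Still Frame", "Behind the Scenes"):
        # rule 3: only clear, confidently identified, large enough words
        kept = [
            w for w in words
            if w.confidence > MIN_CONFIDENCE and w.height_px >= MIN_HEIGHT_PX
        ]
    else:
        kept = words  # assumption: unlisted types keep the current behaviour
    return sorted({w.language for w in kept})


# Made-up sample data, loosely based on the examples earlier in this post.
words = [
    DetectedWord("Pääste", "Estonian", 0.95, 40),
    DetectedWord("EST", "Latin", 0.60, 12),
]
print(languages_to_tag("Still Frame", words))
print(languages_to_tag("Event", words))
print(round(px_to_mm(32), 2))
```

With the made-up data, the small, low-confidence 'EST' would be dropped from a still frame, while a poster would keep both words. The px_to_mm helper is just the size arithmetic: 32 / 96 × 25.4 comes to about 8.5mm.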

I suppose this would depend on IMDb's tagging intent. Do they just want to identify every incidence of every word in every image, regardless of the utility of the resultant (and no doubt humongous) database? Bear in mind that if so, there is a resultant storage and processing cost associated with the identification and, later, when implemented, the filtering of such an index.

Or, as I fervently hope, do they want to identify useful incidences of useful words that might appeal to fans, researchers etc, with a much reduced overhead? Your move IMDb.

(edited)