taewong

6 Messages • 370 Points

Monday, March 25th, 2013 12:37 PM

Support for Unicode.

Unicode is not fully supported on IMDb. For example, in Polish, you could find all references by searching for “milosc” and then change them to “miłość”. Jiří Hnídek is written without the r-háček (ř) in the first name. The same goes for Coşku Özdemır of ILM, a Turkish person listed on Cinefex.

1 Message • 100 Points

12 years ago

I agree. This affects titles, names, characters, discussion, and probably more.

Where I run into the problem most is in the discussion forums. If you paste non-ASCII characters copied from somewhere else (for example, to show a symbol that was in the film, or indeed to show the native-language title of the film), they just get turned into what appears to be the HTML text code for those characters, instead of the symbol itself.

It's 2013. This shouldn't be happening.

Champion • 19.4K Messages • 477.1K Points

Since this is the International Movie Data Base, it is truly surprising that they do not support unicode.

Champion • 4.6K Messages • 236.3K Points

Except that it's "Internet Movie Database," not "International." ;)

Champion • 19.4K Messages • 477.1K Points

This must be a Freudian slip. [wink]
Reminder to self: Don't post when tired.

Champion • 4.6K Messages • 236.3K Points

LOL. To be honest, though, it's almost as if they want it to be known that way. Most mentions of the spelled-out name are gone, kind of like how Kentucky Fried Chicken is only KFC now. New visitors seem to have a hard time figuring out what the site is... video streaming? File sharing?

Champion • 1.9K Messages • 146.1K Points

Or after taking some random prescription medication you found lying in the meep.

6 Messages • 370 Points

Yeah. Do not post nonsense. You have accidentally removed a comment (you need to dispute this removal).

6 Messages • 370 Points

12 years ago

Since MobyGames supports Unicode, macrons for long vowels in romanized Japanese are OK. Note that the title ends with a punctuation mark (full stop). Hungarian, Czech, Polish, Romanian, Slovak, etc. require a bunch of accented letters.

Champion • 19.4K Messages • 477.1K Points

12 years ago

It is almost like Randall Munroe has been reading this forum.
http://xkcd.com/1209/

6 Messages • 370 Points

12 years ago

To quote the comic: “The Skywriter we hired has terrible Unicode support.”

After correcting Miroslav Kure's surname to Miroslav Kuře (to match Czech spelling; the Danish/Faroese/Norwegian ø corresponds to ř, r-caron) in the Battle for Wesnoth 1.11.1 contributor community, you run into many problems with the Internet Archive Wayback Machine this time. First, the connection is too slow to load, and you get the error message “The machine that serves this file is down. We're working on it.” twice. Unicode problems in their own forum affect subjects (titles) and more. Note that the thread has nonsense!
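The ø/ř correspondence is what a code-page mismatch typically produces: ř is byte 0xF8 in Windows-1250 (and ISO-8859-2), while ISO-8859-1 uses 0xF8 for ø. Assuming that is the cause here, which the post doesn't confirm, this sketch reproduces it; `mis_decode` is a hypothetical helper name.

```python
def mis_decode(text: str, written_as: str, read_as: str) -> str:
    # Encode with one 8-bit code page, then decode the same bytes with
    # another, as software does when it assumes the wrong code page.
    return text.encode(written_as).decode(read_as)


# Czech text saved as Windows-1250 but read back as ISO-8859-1 (Latin-1):
print("Kuře".encode("cp1250"))                  # b'Ku\xf8e'
print(mis_decode("Kuře", "cp1250", "latin-1"))  # Kuøe
print(mis_decode("Jiří", "cp1250", "latin-1"))  # Jiøí
```

Because both are single-byte code pages, no error is raised; the text is just silently wrong.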

Champion • 1.9K Messages • 92.6K Points

12 years ago

This has been mentioned many times over the past few years. A bit of history may help here.

When IMDb first started, it was updated by an automated email system. This was at a time when some email routers still handled only 7-bit ASCII, and special encoding was needed to ensure that 8-bit codes would not be trashed. Moreover, some characters (e.g. '|', the 'pipe') were used internally (and in the email) as controls/delimiters. This is why you may sometimes see older contributors indicate a credit update as:

John Doe | 2nd Pirate | 22
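To illustrate why such a format constrains the character set, here is a minimal sketch that parses a line in the layout shown above (field meanings assumed: name, character, billing position) and checks whether text is 7-bit safe. The post doesn't say which escaping scheme IMDb used for 8-bit text; MIME quoted-printable is shown only as one standard option, and the function names are hypothetical.

```python
import quopri


def parse_credit(line: str) -> tuple[str, str, int]:
    # Assumed layout: "name | character | billing position".
    # "|" is the delimiter, so it can never appear inside a field;
    # an extra pipe makes the unpacking below raise ValueError.
    name, character, billing = (part.strip() for part in line.split("|"))
    return name, character, int(billing)


def is_7bit(text: str) -> bool:
    # True when every character is plain 7-bit ASCII (code points 0-127).
    return all(ord(ch) < 128 for ch in text)


print(parse_credit("John Doe | 2nd Pirate | 22"))  # ('John Doe', '2nd Pirate', 22)
print(is_7bit("Hnídek"))                          # False

# Quoted-printable turns each 8-bit byte into "=XX" so the message stays 7-bit.
encoded = quopri.encodestring("Hnídek".encode("latin-1"))
print(encoded)
```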

By the time Unicode became standard, the system had grown quite complex. Before Unicode can be implemented, every part of the system needs to be checked and potentially modified to ensure that it will not be broken by any Unicode character.

IMDb is currently in the process of moving the various lists (sections) to new internal systems. I hope and expect that they are designing these systems so that they will be able to support Unicode.

Once the moves have been completed, we may see support for Unicode, but don't expect it any time soon.

6 Messages • 370 Points

11 years ago

You will need an answer. You have removed the first reply by accident. Where a name includes a suffix, we use a comma to separate it from the name. On game credits and indexes it is not treated as an integral part of the surname. Examples are:

Hernandez, Jonathan, Jr
Rowe, William A., Jr.
Tibbetts, Richard S., III

It seems that the Get Satisfaction software uses Unicode: it supports the various accented characters of Eastern European languages.

Champion • 4.6K Messages • 236.3K Points

The change log says you removed it...??? What the..??

6 Messages • 370 Points

This reply was removed on 2013-03-25.

Champion • 4.6K Messages • 236.3K Points

Yep. And:


3 months ago
taewong, the poster:
Removed a reply in this topic
Reason: removed by the poster

1 Message • 60 Points

11 years ago

Actually, it seems that after the message-board makeover, Unicode support is even worse! With the old boards you could at least enter most extended-ASCII glyphs (assuming the proper code page was set), but now nothing above character code 127 works.

1 Message • 82 Points

10 years ago

It's 2014, and some Czech characters are still not supported.

8 Messages • 160 Points

10 years ago

It's almost 2015 and Greek characters aren't supported AT ALL.

5 Messages • 116 Points

10 years ago

This reply was created from a merged topic originally titled “How many years will it take you to understand UNICODE?”


In 2009, in Contact #3034383 (http://www.imdb.com/helpdesk/thread?tid=3034383), you, the owner of IMDb, promised professional usage of UNICODE "in a little while". It is now five and a half years later, and your website is still crippled with no UNICODE implementation. Five and a half years??? Don't you fill embarrassed with your "professionalism"? Shall we wait another five years for IMDb to understand the word "international"?

(This post is addressed solely and specifically to IMDb staff.)

5 Messages • 116 Points

Correction: "don't you FEEL"

Employee • 11 Messages • 2.9K Points

10 years ago

We are making slow and steady progress on Unicode support.  Note that until every single part of a system supports Unicode, none of it works.  We have a lot of critical backend systems that need to be migrated.  Unfortunately, we don't have a timetable that we can share, but please be aware that we are working on it.

Note that in the last few weeks we've enabled full Unicode support in the message boards:

http://www.imdb.com/board/bd0000043/nest/235469052

We had a number of encoding issues that I believe we have fixed.

Employee • 11 Messages • 2.9K Points

10 years ago

Note that user reviews:

http://www.imdb.com/user/ur2278015/

...and lists:

http://www.imdb.com/list/ls001825868/

...also support Unicode.

8 Messages • 160 Points

10 years ago

Yes, but no movie display titles...

Employee • 11 Messages • 2.9K Points

There is already limited support for this; see the Greek title here:

http://www.imdb.com/title/tt0015648/releaseinfo#akas

Our systems currently use a mixture of ISO-8859-1, UTF-8, and KOI8-R.  Untangling this mess while keeping things running is like changing the fan belt on an engine without switching it off.
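Part of what makes a mixture of ISO-8859-1, UTF-8, and KOI8-R hard to untangle is that both single-byte encodings accept any byte sequence, so decoding with the wrong one fails silently. A minimal migration sketch, assuming (hypothetically) that each record's original legacy encoding is known: try strict UTF-8 first, since arbitrary legacy bytes rarely form valid UTF-8, and fall back to the declared encoding otherwise. The helper name `to_unicode` is illustrative, not IMDb's code.

```python
def to_unicode(raw: bytes, legacy_encoding: str) -> str:
    # Strict UTF-8 first: it is self-validating, so most legacy bytes fail it.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 and KOI8-R assign a character to every byte value, so
        # this never raises; correctness depends on legacy_encoding being right.
        return raw.decode(legacy_encoding)


russian = "Кино".encode("koi8_r")
french = "Amélie".encode("latin-1")
greek = "Ελληνικά".encode("utf-8")

print(to_unicode(russian, "koi8_r"))  # Кино
print(to_unicode(french, "latin-1"))  # Amélie
print(to_unicode(greek, "latin-1"))   # Ελληνικά
print(russian.decode("latin-1"))      # no error, but the text is garbage
```

The fallback can still misfire on a legacy string that happens to be valid UTF-8 (for example, Latin-1 text containing "Ã©" would come out as "é"), which is one reason this kind of migration goes slowly.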

8 Messages • 160 Points

I tried to add a title to a movie, but the system wouldn't let me. It flagged an error on every letter I entered.

Employee • 11 Messages • 2.9K Points

Yup.  The submissions pipeline doesn't yet handle Unicode.

8 Messages • 160 Points

So the movie titles written in Greek characters were entered by IMDb staff?

Is there a timeline when I will be able to contribute Greek titles?

Employee • 11 Messages • 2.9K Points

Yes, there were some cases added manually years ago.

We don't have a timeline yet, but we know people really want it.

1 Message • 64 Points

8 years ago

Three years have passed, and IMDb is still mentally in the pre-Unicode 1990s.

If you don't want to fix your database for Unicode support, then just write parsers that translate user input into HTML character codes.
Moreover, some HTML codes are not supported, e.g. ń.
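The workaround suggested above, storing everything as ASCII with HTML character codes, can be sketched as follows. This assumes the database and forms only ever see 7-bit ASCII; the function names are hypothetical. Escaping "&" first keeps a literal code typed by a user, such as "&#324;", from being turned into a character on display.

```python
import html


def to_ascii_entities(text: str) -> str:
    # Escape "&", "<" and ">" first so text that already looks like a
    # character reference survives the round trip unchanged.
    escaped = html.escape(text, quote=False)
    # Replace every non-ASCII character with a decimal reference,
    # e.g. "ń" (U+0144) becomes "&#324;".
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def from_ascii_entities(stored: str) -> str:
    # Convert the stored references back to characters for display.
    return html.unescape(stored)


stored = to_ascii_entities("Gdańsk")
print(stored)                       # Gda&#324;sk
print(from_ascii_entities(stored))  # Gdańsk
```

One drawback: search and sorting then see the codes rather than the letters, so this only postpones real Unicode support.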

NB: It is not possible to have a title with a character outside Basic Latin. Even if I edit a movie and enter an HTML code, the form converts it to Unicode on the fly and reports a problem (!!)

2.7K Messages • 83K Points

8 years ago

The last update on this (at least in this thread) was two years ago, so can a staffer tell us what has happened regarding this issue over the past two years?
(I note that on the IMDb message boards one could see exactly when a post was made; here I can only see that Murray responded "two years ago", which is not very specific.)

Champion • 7.4K Messages • 276K Points

Marco: In response to your latter comment, you can see the exact time of a post here, at least on the desktop version of GetSatisfaction. To do that, hover your mouse over the time designation of the post (such as "2 years ago"). So, for example, Murray's post that begins "Yes, there were some cases added manually years ago" was posted October 9, 2014 at 10:46:58 PM UTC.

I don't know whether or how it is possible to see the exact date and time on the mobile version of GetSatisfaction.

2.7K Messages • 83K Points

Thanks, Gromit!
Is there also a way, which I haven't found, to reply to your post directly instead of to my own?