Re: Non-ascii translatable strings



On Thu, 2004-05-20 at 02:42, Gareth Owen wrote:
> According to the L10N Guidelines for Developers[1] strings marked for
> translation should always be in 7-bit ASCII. However, the latest muine
> code contains two non-ascii strings that are marked for translation
> (src/About.cs lines 40 and 58).
> 
> Maybe, someone more knowledgeable than me can explain why this is a
> requirement

Gettext requires that both the msgids (translatable source strings) and
the msgstrs (translated messages) use the same character set in order to
work. The L10N Guidelines explain why this is so.
Traditionally this has meant that source strings have had to be in
ASCII, since the ASCII range is the only common subset of almost all
character sets that are in common use and used by translations.

Gettext simply didn't work and gave lots of errors when source strings
contained non-ASCII characters, so the translations of any such
messages would never be used.
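
To illustrate why the character sets have to match, here is a minimal
Python sketch. It is a simplified model and not gettext's actual
implementation: it assumes a catalog behaves like a table keyed by the
raw bytes of the msgid. The catalog contents and the lookup() helper are
made up for the example.

```python
# Simplified model of a message catalog: keys and values are raw bytes.
# This is an illustration only, not gettext's real data structure.
def lookup(catalog, msgid):
    """Return the translation for msgid, or msgid itself if none is found."""
    return catalog.get(msgid, msgid)


# Hypothetical Swedish catalog whose entries are all encoded in UTF-8.
catalog_utf8 = {
    "Open".encode("utf-8"): "Öppna".encode("utf-8"),
    "Café".encode("utf-8"): "Kafé".encode("utf-8"),
}

# Pure ASCII msgid: the bytes are identical in ASCII, Latin-1 and UTF-8,
# so the lookup succeeds whatever encoding the source file used.
ascii_hit = lookup(catalog_utf8, "Open".encode("latin-1"))

# Non-ASCII msgid taken from a Latin-1 source file: its bytes differ from
# the UTF-8 key, so the lookup falls back to the untranslated string.
latin1_miss = lookup(catalog_utf8, "Café".encode("latin-1"))

# The same msgid taken from a UTF-8 source file matches the catalog key.
utf8_hit = lookup(catalog_utf8, "Café".encode("utf-8"))
```

Only the last two lookups differ in outcome, which is why ASCII-only
source strings were the safe choice as long as translations came in many
different character sets.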

However, with a mandatory common character set for the source messages
and all translations, as GNOME now has with UTF-8, things look a bit
different. In theory this fulfills the common character set requirement
very well, and GNU gettext also accepts UTF-8 in translatable source
messages as of version 0.12.

So, if it's acceptable for a particular application to introduce a
dependency on GNU gettext >= 0.12, using UTF-8 in translatable messages
is perfectly fine. Otherwise I strongly suggest not using non-ASCII
characters in messages, as the translations of those messages may not
work.
If the dependency is acceptable, it's best to make it explicit in the
application, as users who build the application may otherwise wonder
why some messages don't appear translated even though translations
exist.
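
As a rough idea of what an explicit check could look like, here is a
small Python sketch that compares a reported gettext version against the
0.12 minimum. The helper names and the sample version lines are
assumptions made for the example; a real application would more likely
do this check in its build system, and the exact format of the version
line may differ between gettext releases.

```python
import re

# Minimum GNU gettext version that accepts UTF-8 source messages,
# as stated above.
MINIMUM_VERSION = (0, 12)


def parse_version(version_line):
    """Extract a trailing dotted version number as a tuple of ints.

    Returns None if the line does not end in a version number.
    """
    match = re.search(r"(\d+(?:\.\d+)*)\s*$", version_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_new_enough(version_line, minimum=MINIMUM_VERSION):
    """Return True if the reported version is at least `minimum`."""
    version = parse_version(version_line)
    return version is not None and version >= minimum


# Hypothetical sample lines, in the style a version banner might use.
new_line = "xgettext (GNU gettext) 0.12.1"
old_line = "xgettext (GNU gettext) 0.11.5"
```

Here is_new_enough(new_line) is true and is_new_enough(old_line) is
false, so a build could stop early with a clear message instead of
silently producing untranslated strings.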

GNU gettext >= 0.12 is currently only found in the most recent
distributions, though, so requiring it may not be acceptable yet for
many applications. I'm also not sure whether e.g. Solaris' gettext
implementation supports UTF-8 in source messages, so support on other
platforms and operating systems may be a valid concern as well.


Christian



