Re: NLS question

From: Pablo Saratxaga <pablo mandrakesoft com>
To: gnome-list gnome org
Cc: gnome-i18n nuclecu unam mx
Subject: Re: NLS question
Date: Mon, 10 Jan 2000 06:43:22 +0100
Kaixo!

On Sun, Jan 09, 2000 at 10:23:31PM -0600, Eric Gillespie, Jr. wrote:

> I've just finished adding NLS support to my app, and now I'm asking
> users to provide translations. I'm wondering if there is a list of the
> codes for languages and dialects. In other words, how do I know what to
> call a .po file after someone sends it to me?

[PS: gnome-i18n@nuclecu.unam.mx would be a better place for this; I'm Cc'ing it.]

Here is the information I have collected so far,
with the charset used on Linux in parentheses:

af:	Afrikaans (iso-8859-1) 		ar: Arabic (iso-8859-6) [1]
bg:	Bulgarian (microsoft-cp1251)	br: Breton (iso-8859-1)
ca:	Catalan (iso-8859-1)		cs: Czech (iso-8859-2)
cy:	Cymraeg (Welsh) (iso-8859-14)	da: Danish (iso-8859-1)
de:	German (iso-8859-1)		el: Greek (iso-8859-7)
en:	English (iso-8859-1)		eo: Esperanto (iso-8859-3)
es:	Spanish (iso-8859-1)		et: Estonian (iso-8859-15)
eu:	Euskara (Basque) (iso-8859-1)	fa: Farsi (Iranian) (isiri-3342) [1]
fi:	Finnish (iso-8859-1)		fo: Faroese (iso-8859-1)
fr:	French (iso-8859-1)		ga: Gaeilge (Irish) (iso-8859-14)
gl:	Galego (iso-8859-1)		he: Hebrew (iso-8859-8)
hr:	Croatian (iso-8859-2)		hu: Hungarian (iso-8859-2)
hy:	Armenian (armscii-8)		id: Indonesian (iso-8859-1)
is:	Icelandic (iso-8859-1)		it: Italian (iso-8859-1)
ja:	Japanese (euc-jp)		ka: Georgian (???) [2]
kl:	Greenlandic (iso-8859-1)	ko: Korean (euc-kr)
lo:	Laotian (mulelao-1) [3]		lt: Lithuanian (iso-8859-13)
lv:	Latvian (iso-8859-13)		ms: Malay (iso-8859-1)
nl:	Dutch (iso-8859-1)		no: Norwegian (iso-8859-1) [4]
oc:	Occitan (iso-8859-1)		pl: Polish (iso-8859-2)
pt:	Portuguese (iso-8859-1) [5]	ro: Romanian (iso-8859-2)
ru:	Russian (koi8-r)		sk: Slovakian (iso-8859-2)
sl:	Slovenian (iso-8859-2)		sp: Serbian (Cyrillic) (iso-8859-5)
sr:	Serbian (latin) (iso-8859-2)	sv: Swedish (iso-8859-1)
th:	Thai (tis-620)			tr: Turkish (iso-8859-9)
uk:	Ukrainian (koi8-u)		vi: Vietnamese (tcvn-5712) [6]
wa:	Walon (iso-8859-1)		
zh_CN.GB2312: Chinese (simplified) (gb2312)
zh_TW.Big5: Chinese (traditional) (Big5)
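
To tie the codes above back to the file-naming question, here is a minimal Python sketch. It assumes the usual gettext convention of naming each catalog "<code>.po" and declaring the charset in the catalog header; the helper names (CHARSETS, po_filename, charset_header) are hypothetical, and the dictionary holds only a few entries copied from the table.

```python
# Excerpt of the table above: language code -> charset used on Linux.
# Only a handful of entries are copied here for illustration.
CHARSETS = {
    "de": "iso-8859-1",
    "cs": "iso-8859-2",
    "ru": "koi8-r",
    "ja": "euc-jp",
    "zh_CN.GB2312": "gb2312",
}


def po_filename(code):
    """Return the catalog file name for a code from the table.

    Assumption: catalogs are simply named "<code>.po".
    """
    if code not in CHARSETS:
        raise KeyError(f"unknown language code: {code}")
    return code + ".po"


def charset_header(code):
    """Return the Content-Type line expected in that catalog's header."""
    return f"Content-Type: text/plain; charset={CHARSETS[code]}"


if __name__ == "__main__":
    for code in CHARSETS:
        print(po_filename(code), "|", charset_header(code))
```

So a Russian translation would be stored as ru.po and declare koi8-r in its header.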

Notes:
[1]: Due to the lack of right-to-left support, almost no programs use
     them, so the encoding is rather theoretical. The only one I know of
     is 'acon', which provides Arabic on the Linux console; it uses
     iso-8859-6. Programs displaying Hebrew cheat: they use strings in
     visual order (that is, written in reverse order in the file); they
     use iso-8859-8.
[2]: There are two charset encodings for Georgian (other than Unicode):
     georgian-academy: used by the academic world (universities in Georgia)
     georgian-ps: used by the web site of the Georgian parliament and by
 		  the "Soros Foundation" (which created the mentioned site, I think)
     Neither is really "the" standard :-(
[3]: No programs exist in Laotian (yet), so the charset to use isn't
     really known; but the mulelao-1 encoding is more logical than
     ibm-cp1133 IMHO, as it follows the same logic as the Thai encoding.
[4]: 'no' is used for the 'Bokmaal' variant of Norwegian; 'no@nynorsk' for
     the 'Nynorsk' variant.
[5]: The Brazilian and Portuguese variants of the Portuguese language are
     often split (into 'pt_BR' and 'pt'), as they differ a lot in
     computer-related terms, which happen to appear a lot in computer
     programs :)
[6]: There are a lot of encodings for Vietnamese, but two can be
     considered standard: viscii 1.1 and tcvn-5712. As tcvn-5712 is used by
     official Vietnamese bodies, it should be considered the default standard.
     viscii 1.1 is used a lot among emigrants. There can be emotional reasons
     behind the choice of one or the other. On the other hand, converting
     between them is computationally trivial, so it is easy to provide both
     to end users if someone sends a translation.
     Microsoft is also trying to impose its own Vietnamese standard,
     but without success so far.
[7]: All uses of iso-8859-1 can be replaced with iso-8859-15; the differences
     don't affect letters, only symbols. The reverse is not true, though:
     Estonian uses iso-8859-15 and not iso-8859-1, as it needs at
     least the scaron letter. French should use iso-8859-15 too, to properly
     display the OE letter; but people are now accustomed to writing it as
     two separate letters: O E.
     Importantly, the iso-8859-1 acute accent char shouldn't be
     used as an apostrophe (German speakers do that a lot, I don't
     know why), as it will be seen as a Zcaron (capital Z with caron) letter
     when switching to iso-8859-15! That switch should be made at some
     point because of the euro symbol provided in iso-8859-15.
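
The iso-8859-1 versus iso-8859-15 point in note [7] can be checked directly. The following is a small Python sketch, not part of any gettext tooling, that decodes every high byte with both codecs from the Python standard library and collects the positions where they disagree; the function name differing_bytes is made up for this example.

```python
def differing_bytes():
    """Map each byte value that decodes differently under the two
    charsets to a pair (iso-8859-1 character, iso-8859-15 character)."""
    diffs = {}
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        old = raw.decode("iso-8859-1")
        new = raw.decode("iso-8859-15")
        if old != new:
            diffs[value] = (old, new)
    return diffs


if __name__ == "__main__":
    # Print one line per position that changes between the two charsets.
    for value, (old, new) in sorted(differing_bytes().items()):
        print(f"0x{value:02X}: {old!r} -> {new!r}")
```

Only eight positions differ. Among them, byte 0xB4 is the acute accent in iso-8859-1 and a capital Z with caron in iso-8859-15, and byte 0xA4 becomes the euro sign.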

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.ping.be/~pin19314/		PGP Key available, key ID: 0x8F0E4975