Re: Locale tags in translated XML



On Tue, 2011-05-03 at 10:43 -0400, Shaun McCance wrote:
> Hi all,
> 
> This is long and somewhat technical. Sorry. If your eyes
> start to glaze over, feel free to ignore it.
> 
> I've been trying to figure out how to make our POSIX-style
> locale tags cooperate with the BCP47 tags that are commonly
> used on the Internet and in XML. It's bothered me for a long
> time that we put non-BCP47 tags in our xml:lang attributes.
> 
> Besides being against recommended practice, it also means
> that the XPath lang() function doesn't work as well as it
> ought to. This could come into play with the new Mallard
> conditional processing work.
> 
> Also, with my recent work on itstool, I don't really want
> to have it output non-standard XML.
> 
> What I'd like to do is have us always use BCP47 locale tags
> in the xml:lang attribute in our XML documents. They would
> still be managed by PO files named according to our locales,
> and we would still install them according to our locales.
> For example, we'd still have sr@latin.po, and the XML would
> still go to /usr/share/gnome/help/.../sr@latin/, but the
> xml:lang attribute on the actual xml files would be
> 
>   xml:lang="sr-Latn"
> 
> I'll convert these automatically when generating the XML.
> I've detailed the conversion below.
> 
> This also means the XSLT internationalization in Yelp has
> to change. Right now, it expects POSIX-style locales, and
> it falls back to base languages by trying all combinations
> of the parts of the tag in a specified order. (I think I
> copied that order from GLib.)
> 
> So, I would have to make the i18n machinery in yelp-xsl
> expect BCP47 locales, because it matches on the document's
> language, not your locale. What I would do is treat any
> locale as a list of tokens separated by non-alphanumeric
> characters, then simply chain up by removing trailing
> parts.
> 
> There's a slight loss of functionality here. Currently,
> for a document declared as sr_RS@latin, Yelp can use the
> localizations for sr@latin (as well as sr_RS or just sr).
> Under the new scheme, it can still find translations in
> sr_RS and sr, but not sr@latin.
> 
> Practically speaking, we don't have translations that use
> sr_RS@latin. Sure, it might be your actual LANG on your
> computer, but we don't have PO files for it. So I don't
> think we're actually fully exercising Yelp's capabilities
> with fallback languages.
> 
> Translators would still manage the yelp-xsl translations
> as they do now. I'd handle the locale tag conversion at
> build time when generating the XML string catalog.
> 
> 
> Converting Tags
> ===============
> 
> Our locale tags take the form:
> 
>   ll_RR@variant.charset
> 
> Where ll is the primary language, RR is a region (country),
> variant is some sort of variant, and charset is the character
> encoding to use.
> 
> BCP47 locale tags look generally like:
> 
>   language-script-region-variant
> 
> I will convert from POSIX-style locales to BCP47 as follows:
> 
> The charset will be dropped. It's not relevant for what I'm
> doing here. The language and region will be copied into their
> correct places. Note that in BCP47, region comes after script.
> The variant will be converted on a case-by-case basis:
> 
> @cyrillic will be changed to Cyrl and used as script.
> @devanagari will be changed to Deva and used as script.
> @euro will be dropped.
> @ije will be used as variant.
> @latin will be changed to Latn and used as script.
> @shaw will be changed to Shaw and used as script.
> @valencia will be used as variant.
> 
> Looking through locale -a, that leaves @abegede, @iqtelif,
> and @saaho. I haven't looked into what these are yet. Any
> variants I don't special-case will be used as the variant
> in the BCP47 tag.
> 
> Examples:
>   ca@valencia       ->  ca-valencia
>   en@shaw           ->  en-Shaw
>   ks_IN@devanagari  ->  ks-Deva-IN
>   sr@ije            ->  sr-ije
>   sr_RS@latin       ->  sr-Latn-RS
>   uz_UZ@cyrillic    ->  uz-Cyrl-UZ
> 
> Converting back from BCP47 to POSIX is harder. Luckily,
> I don't think we need to do it.
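
To check that I'm reading the conversion rules correctly, here is a rough
Python sketch of them. It is only my interpretation of the description
quoted above, not Shaun's code: the function name posix_to_bcp47 and the
two lookup tables are made up for illustration, and I've assumed the
charset may appear either before or after the @variant part.

```python
import re

# Variants that the quoted message maps to a BCP47 script subtag.
SCRIPT_VARIANTS = {
    "cyrillic": "Cyrl",
    "devanagari": "Deva",
    "latin": "Latn",
    "shaw": "Shaw",
}

# Variants that the quoted message says are simply dropped.
DROPPED_VARIANTS = {"euro"}


def posix_to_bcp47(locale):
    """Convert a POSIX-style locale to a BCP47-style tag (hypothetical helper)."""
    # Drop the charset. Assumption: accept both ll_RR@variant.charset
    # and ll_RR.charset@variant.
    stripped = re.sub(r"\.[^@]*", "", locale)
    base, _, variant = stripped.partition("@")
    language, _, region = base.partition("_")
    variant = variant.lower()

    parts = [language.lower()]
    # In BCP47 the script comes before the region.
    if variant in SCRIPT_VARIANTS:
        parts.append(SCRIPT_VARIANTS[variant])
    if region:
        parts.append(region.upper())
    # Anything not special-cased is passed through as the variant.
    if variant and variant not in SCRIPT_VARIANTS and variant not in DROPPED_VARIANTS:
        parts.append(variant)
    return "-".join(parts)


print(posix_to_bcp47("sr_RS@latin"))  # sr-Latn-RS
print(posix_to_bcp47("ca@valencia"))  # ca-valencia
```

Note that this does not check the result against the IANA subtag
registry, so a passed-through variant is not guaranteed to be a valid
BCP47 variant subtag.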

Hi Shaun

It looks great. I've only done some very basic work in this area, and
would like to check that our tools handle it as well as what you
describe. Do you have any existing code? I'd like to borrow some of your
ideas for our code if possible. I think what you have is more thorough.
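
For the fallback lookup described in the quoted text (split the tag on
non-alphanumeric characters, then chain up by removing trailing parts),
this is roughly how I picture it in Python. Again this is just a sketch
under my own assumptions: fallback_chain is a made-up name, and joining
the tokens back together with "-" is my choice, not something the
proposal specifies.

```python
import re


def fallback_chain(tag):
    """Return the tag followed by its fallbacks, most specific first."""
    # Treat the tag as a list of tokens separated by any
    # non-alphanumeric characters ("-", "_", "@", ".").
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", tag) if t]
    # Chain up by removing one trailing token at a time.
    return ["-".join(tokens[:n]) for n in range(len(tokens), 0, -1)]


print(fallback_chain("sr-Latn-RS"))  # ['sr-Latn-RS', 'sr-Latn', 'sr']
```

With a POSIX-style input such as sr_RS@latin this yields sr-RS-latin,
sr-RS and sr, which matches the loss of functionality mentioned in the
quoted text: the sr@latin combination is never tried.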

Friedel




--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/better-lies-about-gnome-localisation


