Re: Fw: Language tags

I appreciate you taking the time to send your thoughts on this
issue. It's always good to get input from a real expert in the area.

I'm encouraged by the fact that, although I only spent a few hours
cataloging existing practice, I managed to locate most of the
examples you mentioned - ISO 639 and RFC 3066 of course, but also
ISO 15924, OpenType script/language tags, Microsoft LangIDs and so forth.

It's certainly true that the purpose of language tags in Pango does
not exactly conform to the intent of RFC 3066; I'd describe the
purposes of the Pango tagging as:

 Providing any information about the language of the source text that
 would be useful for the processes of displaying and editing the text
 in written form.  These processes include hyphenation, boundary
 resolution, and font and glyph selection.

There are certainly variants of practice in these areas that have no
correspondance to spoken language. But I believe that RFC 3066
language tags are close enough in intent and form to be quite useful
for this purpose - certainly closer than ISO 15924 script tags.

And practically speaking, language information from higher-level
protocols (HTTP, mail, etc.) will most frequently be in RFC 3066 form,
so anything else would be quite inconvenient for applications.

Since the form I'm proposing for Pango script tags is to use the RFC
3066 form, with arbitrary numbers of subcomponents, and no
interpretation of the subcomponents, I believe it should be no problem
to accommodate future extensions. If the use of multiple different tags
for the same language becomes frequent, then an aliasing mechanism
might be necessary, but that should be easy to add.
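
A rough sketch of this idea in Python, for illustration only. The
function names and the alias table are hypothetical and are not part of
any Pango API; the prefix-matching rule is an assumption borrowed from
the language-range matching described in RFC 3066, not something
proposed above.

```python
# Hypothetical sketch: tags are kept in RFC 3066 form and split into
# subcomponents without interpreting what any subcomponent means.

# Illustrative alias table (an assumption): "iw" is the withdrawn
# ISO 639 code for Hebrew, now "he".
ALIASES = {"iw": "he"}


def split_tag(tag):
    """Split a tag into lowercase subcomponents (tags are case-insensitive)."""
    return tag.strip().lower().split("-")


def canonical(tag, aliases=ALIASES):
    """Return the tag with its first subcomponent replaced by its alias, if any."""
    parts = split_tag(tag)
    parts[0] = aliases.get(parts[0], parts[0])
    return "-".join(parts)


def matches(tag, tag_range):
    """True if tag_range equals tag or is a prefix of it at a subcomponent boundary."""
    tag_parts = split_tag(canonical(tag))
    range_parts = split_tag(canonical(tag_range))
    return tag_parts[:len(range_parts)] == range_parts
```

Because nothing here depends on the number or meaning of the
subcomponents, a tag with extra subcomponents added by a later
extension would pass through unchanged, and aliasing stays a separate
table lookup.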

