[Peter_Constable sil org] Fw: Language tags



--- Begin Message ---
Owen:

I'm not part of this list, but this message was just forwarded to me. There
are probably a number of issues I might be able to comment on here, but I
do not have time to give this a careful read as I have a project to finish
before I leave on Saturday for three weeks. Let me make a couple of quick
comments. Please feel free to forward these to the list, and please also
bear in mind that I'm commenting without the benefit of having digested the
full context of the discussion.

First, "language" tags should identify languages. In some systems of
language identification, the tags actually identify locales. In most
text-processing implementations, what is really needed is a writing-system
identifier, but I know of no system that currently defines these.

Secondly, ISO 639-x has a number of issues, as do all systems of language
identification currently in use in IT. If anyone is interested, I invite
them to read a paper that I co-authored which was presented at the 17th
International Unicode Conference: http://www.sil.org/silewp/2000/001/

Thirdly, there is an article that I wrote discussing existing systems of
language identification that was published in Multilingual Computing &
Technology #40. (Contact Multilingual for more info:
http://www.multilingual.com. I don't know if they will be publishing that
article on-line.)

Fourthly, there is growing momentum in the industry for further work on the
problems of language identification. Growing
out of the paper I presented at IUC17, the Unicode Technical Committee
voted to propose to IETF an idea that was put forth in our paper that would
involve an extension to RFC 3066. A proposal has not yet been written,
though, in part because TC 37 (owners of ISO 639 part 1) are convinced that
they need to do some new work on this matter. Both parties have interacted
with us (SIL) because they would like to see a solution to these problems
include comprehensive coverage of languages, and are looking to somehow
build off the SIL Ethnologue as a way to achieve this (cf.
http://www.sil.org/ethnologue/). TC 37 is holding meetings in August, and a
meeting is being scheduled for TC 37 SC 2 WG 1 specifically to discuss new
work projects. There are a number of stakeholders in this issue from
various sectors within the industry: software system architects (that's
where I'd include Pango, but this is a large and sometimes disparate
crowd); the localisation and translation sector; W3C and parties interested
in markup protocols or similar standards; people working on information
metadata issues; and major clients (e.g. US Gov't). If ISO does something,
it will very likely have implications for several things including RFC
3066.

Fifthly, people working in somewhat self-contained areas of technology
sometimes see a problem in their area, but do not always see how that
problem also affects others. I have no idea if this is happening as far as
language tagging in relation to Pango is concerned. I know, for example,
that the OpenType spec includes a number of language and script tags, and
these appear to have been created without consideration for other existing
systems of identifiers (e.g. ISO DIS 15924), and without consideration for
how those tags are supposed to interrelate with the internationalisation
infrastructure of the systems within which they'll be used (notably
Windows). So for example, there is no mapping defined by MS between OT
language tags and Win32 LANGIDs, and it isn't even clear to me that the two
can really be aligned. (In spite of the name, LANGIDs are really more like
locale IDs, though even there they don't fit the language x country
prototype.)
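
To illustrate the LANGID point, here is a minimal sketch in Python of how a
Win32 LANGID decomposes. It assumes the bit layout used by the Windows SDK
macros PRIMARYLANGID and SUBLANGID (low 10 bits for the primary language,
high 6 bits for the sublanguage). The function name split_langid is
hypothetical and is not part of any Windows or OpenType API.

```python
def split_langid(langid):
    """Split a 16-bit LANGID into (primary language ID, sublanguage ID).

    Assumption: low 10 bits hold the primary language and the high 6 bits
    hold the sublanguage, mirroring PRIMARYLANGID / SUBLANGID.
    """
    if not 0 <= langid <= 0xFFFF:
        raise ValueError("LANGID must fit in 16 bits")
    primary = langid & 0x3FF  # low 10 bits
    sub = langid >> 10        # high 6 bits
    return primary, sub


# 0x0409 and 0x0809 are the values commonly listed for English (United
# States) and English (United Kingdom): same primary ID, different sub ID.
print(split_langid(0x0409))  # (9, 1)
print(split_langid(0x0809))  # (9, 2)
```

The sublanguage field here is a small per-language ordinal rather than a
country code, which is one reason a LANGID does not reduce cleanly to a
language x country pair.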

Finally, please do not even consider using the Unicode Plane 14 language
tag characters for anything. They are A Bad Idea (tm).


Hope this is of some use.


- Peter

[...]




--- End Message ---

