Re: Language tags

Steve Underwood <steveu coppice org> writes:

> > Are used for http, html, xml, mail, and are the form of language tag
> > currently supported by the Unicode plane 14 language tag
> > characters. So, using these tags will give wide application
> > compatibility. They also are compatible with the rough idea of 'en_US'
> > being a language tag - if the first component of a language tag is 2
> > characters, it must be an ISO 639-1 language code; if the second
> > component of the tag is 2 characters, it must be an ISO 3166 country
> > code.
> These two sections don't seem to tie together too well. zh-hakka and
> zh-min-nan are purely issues of spoken Chinese, and have no bearing on
> how it is written. The information you said you would need at rendering
> time - simplified versus traditional - is another issue entirely. Using
> these language tags does not, therefore, seem to provide the information
> you need.

Use of zh-tw and zh-cn language tags and locales for this purpose is a
long-standing and universal abuse of the concept, and I don't see any
point in trying to deviate from it for Pango.
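
For concreteness, the tag structure described in the quoted text can be
sketched roughly as follows. This is Python, not anything in Pango; the
function name split_language_tag is made up for illustration, and it
only checks the shape of the subtags, not whether they are registered
ISO 639-1 or ISO 3166 codes.

```python
import re

_TWO_LETTERS = re.compile(r"[A-Za-z]{2}")

def split_language_tag(tag):
    """Split a tag such as 'zh-tw' or a locale name such as 'en_US'.

    Hypothetical helper: a 2-letter first component is treated as a
    language code and a 2-letter second component as a country code.
    """
    parts = re.split(r"[-_]", tag.strip())
    language = parts[0].lower() if _TWO_LETTERS.fullmatch(parts[0]) else None
    country = None
    if len(parts) > 1 and _TWO_LETTERS.fullmatch(parts[1]):
        country = parts[1].upper()
    return {
        "language": language,
        "country": country,
        "subtags": [p.lower() for p in parts],
    }
```

With this, 'zh-tw' comes out as language 'zh' with country 'TW', while
'zh-min-nan' has language 'zh' and no country component at all.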

A lame justification could be made that written Chinese is really a
different language than spoken Chinese (well, written Chinese is a
different language than the spoken Chinese _languages_), and the
simplified and traditional forms are in some sense "dialects" of this
written form.

> Having said this, I'm not sure it matters for Unihan. I am not aware of
> any instance where the look of a Unihan character is dependent on
> whether the text is simplified or traditional Chinese. This was an issue
> in early Unicode, where there was some merging. In Unicode 3 all
> traditional and simplified characters co-exist, with separate code
> points. Text may freely mix the two, which it often does in real life
> (e.g. a simplified Chinese text containing a Hong Kong company name
> would normally have that name rendered in its traditional form - at
> least in typesetting, where such free intermixing has worked well for
> years). If you want to try mixing existing fonts in the rendered output,
> there may be some issues, but these have nothing to do with the language
> itself.

Considering that you don't generally have fonts that unify traditional
and simplified forms, even if the encoding is Unicode, knowing
the intent of the text in terms of simplified vs traditional forms
is pretty important for font choice.
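
A rough sketch of what that could look like, in Python. Nothing here is
Pango API: han_variant, pick_font and the country table are assumptions
made up for illustration, and the font names in the usage note are
placeholders.

```python
import re

# Assumed mapping from country subtag to the Han form it conventionally
# implies; illustrative only, not an exhaustive or authoritative list.
VARIANT_BY_COUNTRY = {
    "CN": "simplified",
    "SG": "simplified",
    "TW": "traditional",
    "HK": "traditional",
}

def han_variant(tag):
    """Return 'simplified', 'traditional', or None if the tag gives no hint."""
    parts = re.split(r"[-_]", tag.strip())
    if parts[0].lower() != "zh" or len(parts) < 2:
        return None
    return VARIANT_BY_COUNTRY.get(parts[1].upper())

def pick_font(tag, fonts_by_variant, default_font):
    """Choose a font for the tag, falling back when the tag gives no hint."""
    return fonts_by_variant.get(han_variant(tag), default_font)
```

Called as pick_font("zh-tw", {"traditional": "TradFont", "simplified":
"SimpFont"}, "Fallback"), this selects the traditional font; a tag such
as zh-min-nan carries no hint and falls through to the default.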

