Re: Language tags



Owen Taylor wrote:
> 
> Steve Underwood <steveu coppice org> writes:
> 
> > > Are used for http, html, xml, mail, and are the form of language tag
> > > currently supported by the Unicode plane 14 language tag
> > > characters. So, using these tags will give wide application
> > > compatibility. They also are compatible with the rough idea of 'en_US'
> > > being a language tag - if the first component of a language tag is 2
> > > characters, it must be an ISO 639-1 language code; if the second
> > > component of the tag is 2 characters, it must be an ISO 3166 country
> > > code.
> >
> > These two sections don't seem to tie together too well. zh-hakka and
> > zh-min-nan are purely issues of spoken Chinese, and have no bearing on
> > how it is written. The information you said you would need at rendering
> > time - simplified versus traditional - is another issue entirely. Using
> > these language tags does not, therefore, seem to provide the information
> > you need.
> 
> Use of zh-tw and zh-cn language tags and locales for this purpose is a
> long-standing and universal abuse of the concept, and I don't see
> trying to deviate for Pango.
> 
> A lame justification could be made that written chinese is really a
> different language than spoken chinese (well, written chinese is a
> different language than the spoken chinese _languages_), and the
> simplified and traditional forms are in some sense "dialects" of this
> written form.
> 
> > Having said this, I'm not sure it matters for Unihan. I am not aware of
> > any instance where the look of an Unihan character is dependent on
> > whether the text is simplified or traditional Chinese. This was an issue
> > in early Unicode, where there was some merging. In Unicode 3 all
> > traditional and simplified characters co-exist, with separate code
> > points. Text may freely mix the two, which it often does in real life
> > (e.g. a simplified Chinese text containing a Hong Kong company name
> > would normally have that name rendered in its traditional form - at
> > least in typesetting, where such free intermixing has worked well for
> > years). If you want to try mixing existing fonts in the rendered output,
> > there may be some issues, but these have nothing to do with the language
> > itself.
> 
> Considering that you don't generally have fonts that unify traditional
> and simplified forms, even if the encoding is Unicode, knowing
> the intent of the text in terms of simplified vs traditional forms
> is pretty important for font choice.

This must change if there is to be any real support for Unicode 3.1. If
you build a design around the way things are today, will it cope well
in the future?

Unicode 3 expanded the Unicode set well beyond GB2312, Big5, Big5-HKSCS
and the other commonly used sets. Unicode 3.1 basically dumps the whole
Library of Congress into Unicode (I'm not sure if that is exactly true,
or just a close approximation). If fonts are going to be made available
which support even a subset of this expansion (a subset seems highly
likely, the full set highly unlikely), I assume we will see Unicode
encoded fonts appearing in some quantity in the not too distant future -
what other encoding could they use? I believe W2K encourages this path
(though I have never used it).

As for current Chinese fonts, most of these seem to be developed as a
comprehensive traditional + simplified set. The simplified characters
are then made into a GB2312 font, and the traditional into a Big5 (or
maybe Big5-HKSCS) font. The large overlapping group obviously goes into
both as bit-exact replicas. If the pair of fonts is installed on a
machine, their contents can be freely intermixed today, and there is no
need to differentiate. Guaranteeing at rendering time that both fonts
are present is a pain, though. I'm not sure what I would recommend as a
best compromise solution - any solution is certainly a compromise.
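
To make the intermixing concrete, here is a rough sketch in Python. It
is not what Pango or any real renderer does. It assumes that Python's
standard gb2312 and big5 codecs are a fair stand-in for the coverage of
a GB2312-encoded and a Big5-encoded font, and the names covers and
split_into_font_runs are made up for illustration. Each character stays
with the font used for the previous character when that font covers it,
so the overlapping group never forces a font switch.

```python
def covers(charset, ch):
    """Return True if the legacy charset can encode this character.

    Assumption: codec coverage approximates the glyph coverage of a
    font built for that charset.
    """
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def split_into_font_runs(text, charsets=("gb2312", "big5")):
    """Split text into (charset, substring) runs.

    A character keeps the charset of the previous run if that charset
    covers it; otherwise the first charset that covers it is chosen.
    Characters covered by no charset get None.
    """
    runs = []
    current = None
    for ch in text:
        if current is not None and covers(current, ch):
            chosen = current
        else:
            chosen = next((cs for cs in charsets if covers(cs, ch)), None)
        if runs and runs[-1][0] == chosen:
            runs[-1][1] += ch
        else:
            runs.append([chosen, ch])
        current = chosen
    return [(cs, s) for cs, s in runs]


if __name__ == "__main__":
    # A simplified character, a shared character, then a traditional one.
    for charset, run in split_into_font_runs("汉中語"):
        print(charset, run)
```

With both fonts present the text renders without needing to know whether
it was meant as simplified or traditional. With only one of the pair
installed, some runs have nothing to draw with, which is the part that
is hard to guarantee.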

Regards,
Steve



