Re: [gtk-i18n-list] Unicode PUA supporting issue in gtk+/pango

On Tuesday 20 December 2005 11:19, mpsuzuki hiroshima-u ac jp wrote:
> On Tue, 20 Dec 2005 10:56:26 +0800
> Zhe Su <james su gmail com> wrote:
> >   Because you know, Chinese standards of HongKong and Taiwan use
> > such area to store extra characters that aren't in Unicode yet. For
> > example Taiwan has more than 30,000 Traditional Chinese chars in
> > Unicode plane 15, HongKong has more than 5000 chars in
> > U+E000-U+F8EF area. So it's mandatory for them to support these PUA
> > areas.
> "yet" - Hmm, are the PRC/Taiwanese governments going to push these
> characters into the Unicode standard, and stop using the PUA in future?
> In other words, will the mapping between PUA and the PRC/Taiwanese
> national codes be kept and maintained in future? Or is the
> PUA just meant for temporary use?

The HKSCS (Hong Kong) characters are already mapped to Unicode 4.1 
(HKSCS 2004). The ones from Taiwan will hopefully come with CJK 
Extension C1 / C2. However, there are a lot of older HKSCS documents 
out there which use BMP PUA codes instead of the actual Unicode ones, 
and it will still be a long time until CJK Ext. C comes out. Even 
after that, the PUA and Plane 15/16 areas will still be used for 
temporary storage of characters which are not yet, or never will be, 
in Unicode... my fonts (CJK Unifonts) currently use Plane 15 for that 
purpose. (I will change that once I can get my hands on a table which 
shows the glyphs used by the Taiwanese government in that area. James, 
do you have one of those?)
So, support for PUA, Plane 15 and 16 *is needed*.
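
To make the three areas concrete, here is a minimal Python sketch that 
reports which private-use area a character falls into. The range 
boundaries are the ones defined by the Unicode standard; the names 
PUA_RANGES, pua_area and find_pua are made up for this example and do 
not come from gtk+ or pango.

```python
# The three private-use ranges defined by the Unicode standard.
PUA_RANGES = {
    "BMP PUA": (0xE000, 0xF8FF),
    "Plane 15": (0xF0000, 0xFFFFD),
    "Plane 16": (0x100000, 0x10FFFD),
}


def pua_area(ch):
    """Return the name of the private-use area containing ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in PUA_RANGES.items():
        if lo <= cp <= hi:
            return name
    return None


def find_pua(text):
    """List (index, codepoint label, area) for every PUA character in text."""
    hits = []
    for i, ch in enumerate(text):
        area = pua_area(ch)
        if area is not None:
            hits.append((i, "U+%04X" % ord(ch), area))
    return hits
```

Something like find_pua(document_text) would show whether an old 
document still relies on PUA codes before deciding how to handle it.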

> If the answer is "YES" (the use of PUA will be stopped
> in future), the best way to a proper Unicode implementation is
> just to wait for the day the Unicode standard includes these characters.
> My worry is this: if we write a document containing PUA
> codepoints today, and read it after the official inclusion of the
> characters... we cannot search for a string without an extra mapping
> table between PUA codes and Unicode codepoints. And we need a switch
> to enable/disable the extra mapping, because there are
> people using the PUA codepoints for different purposes. Is that
> the role of iconv?

No. It would be up to the fonts to supply alias codepoints, so 
that both old documents and new documents can be displayed. 
However, if someone wants to convert Unicode documents from PUA 
codepoints to the new official codepoints, there should be a script 
provided to do that manually (for example a plug-in in 
This is not necessarily iconv, although it would be nice if it knew 
such an encoding (e.g. UTF8-HKSCS and UTF8-CNS11643 or so...). That 
would make things easier... 
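
A rough Python sketch of what such a conversion script could look 
like, including the enable/disable switch asked about above. The 
table entries below are placeholders invented for illustration, NOT 
real HKSCS or CNS 11643 mappings; a real script would load the 
official tables instead. The names EXAMPLE_PUA_TO_OFFICIAL and 
remap_pua are likewise hypothetical.

```python
# HYPOTHETICAL table: placeholder pairs only, not real HKSCS/CNS 11643 data.
# Keys are PUA codepoints, values are the official codepoints replacing them.
EXAMPLE_PUA_TO_OFFICIAL = {
    0xE000: 0x4E00,    # made-up BMP PUA entry
    0xF0000: 0x20000,  # made-up Plane 15 entry
}


def remap_pua(text, table, enabled=True):
    """Replace PUA codepoints listed in table with their official ones.

    With enabled=False the text is returned unchanged, for people who
    use the same PUA codepoints for a different purpose.
    """
    if not enabled:
        return text
    # str.translate accepts a dict mapping codepoints to codepoints;
    # characters that are not in the table pass through untouched.
    return text.translate(table)
```

Keeping the table as plain data means one script can serve several 
standards: only the table changes, not the code.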


Arne Götje (高盛華) <arne linux org tw>
PGP/GnuPG key: 1024D/685D1E8C
Fingerprint: 2056 F6B7 DEA8 B478 311F  1C34 6E9F D06E 685D 1E8C
Key available at   Encrypted e-mail preferred.

