On Tuesday 20 December 2005 11:19, mpsuzuki hiroshima-u ac jp wrote:
> On Tue, 20 Dec 2005 10:56:26 +0800
> Zhe Su <james su gmail com> wrote:
> > Because you know, Chinese standards of HongKong and Taiwan use
> > such area to store extra characters that aren't in Unicode yet. For
> > example Taiwan has more than 30,000 Traditional Chinese chars in
> > Unicode plane 15, HongKong has more than 5000 chars in
> > U+E000-U+F8EF area. So it's mandatory for them to support these PUA
> > areas.
>
> "yet" - Hmm, PRC/Taiwanese government is going to push these
> characters to Unicode standard, and stop to use PUA in future?
> In the other words, the mapping between PUA and PRC/Taiwanese
> national code should be kept/maintained/kept in future? Or,
> PUA is just used for temporal use?

The HKSCS (Hong Kong) characters are already mapped to Unicode 4.1
(HKSCS 2004). The ones from Taiwan will hopefully come with CJK
Extension C1 / C2.

However, there are a lot of older HKSCS documents out there which use
BMP PUA codes instead of the actual Unicode ones. And it will still be
a long time until CJK Ext. C comes out. Even after that, the PUA and
Plane 15/16 areas will still be used for temporary storage of
characters which are not yet, or never will be, in Unicode. My fonts
(CJK Unifonts) currently use Plane 15 for that purpose. (I will change
that once I can get my hands on a table which displays the glyphs used
by the Taiwanese government in that area. James, do you have one of
those?)

So, support for PUA, Plane 15 and 16 *is needed*.

> If the answer is "YES" (the utilization of PUA will be stopped
> in future), the best way for proper Unicode implementation is
> just waiting for the day Unicode standards include these characters.
>
> My anxiety is that: if we write a documentation including PUA
> charcode today, and read it after the official inclusion of the
> characters... we cannot search a string without the extra mapping
> table of PUA code and Unicode codepoint. And, we need a switch
> to enable/disable to use the extra mapping, because there are
> people using the PUA codepoints for different purpose. Is it
> the role of iconv?

No, that would be a matter for the fonts: they should supply alias
codepoints, so that both old and new documents can be displayed.

However, if someone wants to convert Unicode documents from PUA
codepoints to the new official codepoints, there should be a script
provided to do that manually (for example a plug-in in OpenOffice.org).
This is not necessarily iconv, although it would be nice if it knew
such an encoding (i.e. UTF8-HKSCS and UTF8-CNS11643 or so...). That
would make things easier...

Cheers
Arne
-- 
Arne Götje (高盛華) <arne linux org tw>
PGP/GnuPG key: 1024D/685D1E8C
Fingerprint: 2056 F6B7 DEA8 B478 311F 1C34 6E9F D06E 685D 1E8C
Key available at wwwkeys.pgp.net. Encrypted e-mail preferred.
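
For illustration only, the kind of manual conversion script mentioned in the reply could look roughly like the Python sketch below. It is not taken from any existing tool. The names `EXAMPLE_PUA_MAP`, `is_private_use`, `convert_pua` and `unmapped_pua` are made up for this sketch, and the two table entries are placeholders, not real HKSCS or CNS 11643 assignments; a real script would load the published mapping tables instead. The `enabled` flag stands in for the on/off switch asked about in the quoted question.

```python
# Hypothetical PUA -> official code point table. Both entries are
# placeholders for illustration, NOT real HKSCS or CNS 11643 assignments.
EXAMPLE_PUA_MAP = {
    0xE000: 0x4E00,    # a BMP PUA code point
    0xF0000: 0x20000,  # a Plane 15 code point
}


def is_private_use(cp):
    """Return True if cp lies in the BMP PUA or the Plane 15/16 private use areas."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)


def convert_pua(text, mapping=EXAMPLE_PUA_MAP, enabled=True):
    """Replace mapped PUA code points; leave all other characters untouched.

    With enabled=False the text is returned unchanged, for users who
    assign their own meaning to the same PUA code points.
    """
    if not enabled:
        return text
    return text.translate(mapping)


def unmapped_pua(text, mapping=EXAMPLE_PUA_MAP):
    """List the private use code points in text that the table does not cover."""
    return sorted({ord(ch) for ch in text
                   if is_private_use(ord(ch)) and ord(ch) not in mapping})
```

Running `unmapped_pua` on a document first would show which private use characters the table cannot convert, so they can be left alone or handled by hand.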