Re: [gtk-i18n-list] Unicode PUA supporting issue in gtk+/pango



Hi,

On Fri, 30 Dec 2005 14:29:09 +0600
Christopher Fynn <cfynn gmx net> wrote:
>PRC are also now using PUA as a "standard" encoding (in GB1300 (? & 
>GB18030 ?))  for pre-composed Tibetan "characters".
>
>F300-F8FF   Tibetan set A
>F0000-F1624 Tibetan set B

Oh, great. Thank you for pointing this out.

I heard that the uncomposed Tibetan characters in the
standard area 0x0F00-0x0FFF (first defined by GB16959:1997,
then carried over into GB18030) are defined as the "small
charset", and that the PRC has been preparing a "middle
charset" and a "large charset". I have been looking for the
publication of these charsets for a few years, but the only
one I could obtain is the small charset.

I have not heard of GB1300. Zhe Su, do you know?

Unfortunately, for the non-Hanzi scripts (Mongolian,
Tibetan, Uyghur, and Yi), the GB18030:2000 publication
defines only the codepoint mapping from GB18030 to Unicode;
it contains no glyph shape data for these scripts. I'm not
sure whether the extra Tibetan charsets (middle, large) had
already been defined at that time.

>These characters can all be represented by combinations of characters in 
>the existing Tibetan block (U+0F00 - U+0FD1) - but I suppose they use 
>these PUA characters so as to avoid having to do complex script rendering.

I heard so too (the extra charsets are precomposed glyph
sets for simple rendering systems that cannot do complex
script shaping). I checked tibetbt.ttf, a TrueType font
(not OpenType) distributed by a PRC Tibetan site for
Microsoft Windows, and found that it includes 5604 glyphs,
of which ca 4594 are Tibetan (the rest are alphabets,
Greek, Cyrillic, punctuation, bopomofo, hiragana and
katakana). Needless to say, most of them are precomposed,
fixed-width glyphs. The included cmap subtables are Apple
Roman and Microsoft UCS-2, and it seems that the codepoints
4E00-9251 are used to access the Tibetan glyphs (of course,
such usage is theoretically wrong). The codepoint area is
larger than the Tibetan set A area (ca 1530), but a bit
smaller than the set B area (ca 5668). I'm not sure whether
the charset of the font is a subset of set B. Does anybody
know?
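
As a rough cross-check of the ranges mentioned in this
thread, here is a minimal Python sketch. It only counts the
inclusive span of each range and asks Python's unicodedata
module whether the endpoints are Private Use (general
category "Co"). The names RANGES, span_size and
is_private_use are made up for this illustration, and the
span of a range says nothing about how many codepoints in
it a font actually maps.

```python
import unicodedata

# Ranges quoted in this thread (inclusive first/last codepoints).
RANGES = {
    "Tibetan set A": (0xF300, 0xF8FF),
    "Tibetan set B": (0xF0000, 0xF1624),
    "tibetbt.ttf cmap range": (0x4E00, 0x9251),
}


def span_size(first, last):
    """Number of codepoints in the inclusive range first..last."""
    return last - first + 1


def is_private_use(cp):
    """True if the codepoint has general category Co (Private Use)."""
    return unicodedata.category(chr(cp)) == "Co"


for name, (first, last) in RANGES.items():
    print(
        f"{name}: U+{first:04X}-U+{last:04X}, "
        f"span {span_size(first, last)}, "
        f"endpoints private use: "
        f"{is_private_use(first) and is_private_use(last)}"
    )
```

The two PUA ranges report Private Use endpoints, while
4E00-9251 lies inside the CJK Unified Ideographs block,
which is why using it for Tibetan glyphs conflicts with the
standard.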

Regards,
mpsuzuki


