Re: [gtk-i18n-list] Unicode PUA supporting issue in gtk+/pango

Dear Zhe Su, Arne Götje, and Chia-I Wu,

Regarding the TTF and codepoint issue, I think we
now share the following points.
If I have misunderstood, please correct me.

P1: We can build a TTF that returns, for a PUA
    codepoint, the glyph shape of the right codepoint.

P2: There is no way to build a TTF that converts
    a PUA codepoint to the right codepoint.

P3: The identification of a character is left
    to the human eye; it is done by glyph shape.


On Fri, 23 Dec 2005 10:22:20 +0800
Chia-I Wu <b90201047 ntu edu tw> wrote:
>On Thu, Dec 22, 2005 at 12:44:53AM +0900, mpsuzuki hiroshima-u ac jp wrote:
>> >By mapping U+E78D and U+FE10 to the same glyph?
>With this mapping, fonts work identically on systems using unicode 4.1
>or pre-4.1.  That is, they provide backward compatibility.

Oh no, that is not what I was asking about... What you
described is "compatibility" in the GUI, but I think
identifying characters by the human eye is not a
good idea. My original question was:

On Tue, 20 Dec 2005 23:17:20 +0900
mpsuzuki hiroshima-u ac jp wrote:
>On Tue, 20 Dec 2005 17:29:57 +0800
>>They are the same characters. Ergo this issue should be handled by the
>>font. For example, in the font the character moves to its final
>>position in Unicode (U+FE10), but the former (unofficial) position
>>(U+E78D) contains a reference to U+FE10.
>Excuse me, please describe in detail how to pickup
>this information ("U+E78D" should be displayed by
>"U+FE10") from font file.

Ah, sorry, that question was addressed to Arne, not to you.
In that question, what I care about is the codepoint, not the
graphical shape of the character. I received a similar reply
from Zhe Su.

On Wed, 21 Dec 2005 00:14:32 +0800
Zhe Su <james su gmail com> wrote:
>  Just add a reference in that font to point U+E78D to U+FE10. However
>AFAIK, GB18030-2000 standard won't be changed anymore, U+E78D<->GBA6D9
>is still there. And U+FE10 is mapped to GB84318236.

I (mis)understood this as if you had an idea for writing
"U+FE10 is the right codepoint, and U+E78D is not right
but is aliased to the right codepoint" into the font file.

I think what you are saying is:
	if we have a carefully constructed TTF,
	we can load the glyph shape of U+FE10 from the codepoint U+E78D,
	without any external database for code conversion.
From this I concluded P2. I think Zhe Su's idea is the same: just
"we can load the same glyph shape".

And I think there is no automatic method to check
whether a TTF includes such a cmap (does it have the PUA
codepoint only? both the PUA and the right codepoint? do
they share the same glyph?). So I concluded P3.
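
To make the three cases in the parentheses concrete, here is a minimal
sketch in Python. It assumes the cmap has already been read into a plain
dict from codepoint to glyph name (how it is read is left open), and that
the candidate pair (U+E78D, U+FE10) is supplied from outside the font.
`classify_pua_alias` and the glyph names are hypothetical. Even in the
"shared-glyph" case the font does not say which of the two codepoints is
the right one, so this does not get around P2.

```python
from typing import Dict


def classify_pua_alias(cmap: Dict[int, str], pua: int, standard: int) -> str:
    """Classify how a cmap (codepoint -> glyph name) treats a known pair.

    The pair itself must come from outside the font; the cmap alone
    cannot tell us that `pua` is an alias of `standard`.
    """
    pua_glyph = cmap.get(pua)
    std_glyph = cmap.get(standard)
    if pua_glyph is None and std_glyph is None:
        return "neither"
    if std_glyph is None:
        return "pua-only"
    if pua_glyph is None:
        return "standard-only"
    if pua_glyph == std_glyph:
        return "shared-glyph"
    return "different-glyphs"


# Hypothetical cmap fragments; the glyph names are made up for illustration.
old_font = {0xE78D: "glyph01234"}
new_font = {0xE78D: "uniFE10", 0xFE10: "uniFE10"}

print(classify_pua_alias(old_font, 0xE78D, 0xFE10))
print(classify_pua_alias(new_font, 0xE78D, 0xFE10))
```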

Arne once said "automation is not important in
this issue".

On Tue, 20 Dec 2005 14:07:34 +0800
"Arne Götje" <arne linux org tw> wrote:
>> font-config to manage the fonts anymore, because font-config
>> cannot detect suitable font automatically. Is it small problem?
>If a user has multiple fonts installed which all use the PUA area for 
>different purposes, he/she will have to pick the font manually. 
>fontconfig _cannot_ know which of the fonts the user wants to use in 
>which situation.

On the other hand, Zhe Su once wrote:

On Tue, 20 Dec 2005 23:54:36 +0800
Zhe Su <james su gmail com> wrote:
>I think fontconfig can handle that kind of fonts correctly.
>Otherwise we need to fix fontconfig as well.
>And most users are smart enough to choose correct font here.
>So IMO we don't need to care about the font issue here.

But now I think there is no way to check whether
a TTF returns the glyph shape of the right codepoint
for a PUA codepoint (see P3). So I'm afraid Zhe Su was
talking about something different (e.g. whether fontconfig
can handle PUA codepoints transparently...?)

Finally, back to the code-conversion issue.

Chia-I Wu, you wrote:

On Fri, 23 Dec 2005 10:22:20 +0800
Chia-I Wu <b90201047 ntu edu tw> wrote:
>> when we are asked to display U+E78D and search for
>> "better" codepoint, how do we know U+FE10 is candidate?

>This is another and big problem, not related to fonts.  You can imagine,
>after a system switches to unicode 4.1, you will hear complaints such as
>"I can see the characters (in a document or of a filename).  But when I
>search, it just don't match!"

>IMHO, we can have a script to convert all filenames, at the post-install
>time on package upgrade.  As for the documents, users should manually
>convert them, if he/she needs searchability.

I think it is still a font-related issue. The use of
PUA codepoints is induced by the font. Which PUA codepoints
are "displayable" is determined by the font (or is it determined
by some national standard? if so, please let me know),
and, as I concluded in P3, "undisplayable" codepoints are out
of scope; such codepoints should not be touched.
Therefore, I think, the code-conversion script would have to be
maintained and updated in sync with the font it refers to.
That is a bad idea (if we cannot generate the script from the
font file automatically).
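
As an illustration of what such a conversion script amounts to, here is
a minimal Python sketch. The table holds only the single pair discussed
in this thread (U+E78D -> U+FE10); a real table would have to list every
PUA codepoint that the font in question makes displayable.
`PUA_TO_STANDARD` and `convert_pua` are hypothetical names. Codepoints
that are not in the table are left untouched.

```python
# Hypothetical table: PUA codepoint -> standard codepoint.
# Only the pair from this thread is listed; it would have to be kept
# in sync with the font by hand.
PUA_TO_STANDARD = {0xE78D: 0xFE10}


def convert_pua(text: str, table=PUA_TO_STANDARD) -> str:
    """Replace listed PUA codepoints; leave everything else as it is."""
    return text.translate(table)


# U+E78D is in the table, U+E000 is not and must survive unchanged.
sample = "a\uE78Db\uE000"
converted = convert_pua(sample)
print(["U+%04X" % ord(c) for c in converted])
```

The same function could be applied to filenames or document text; the
hard part is not the replacement but knowing what belongs in the table.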


In closing, I have a new question. At present, Pango scans
multiple fonts to load a glyph for a given Unicode codepoint,
and the scan order is determined by a few coarse-grained
environment variables, LANG/LC_CTYPE and CHARSET.

What should the scan order for PUA codepoints be?

O1: the same as the scan order for public codepoints.

O2: may differ from the scan order for public codepoints,
    but is common to all codepoints in all PUA planes.

O3: may be specified independently for each PUA plane.

O4: both the codepoint range and the scan order for that range
    can be defined manually.
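
To show what O4 could look like as data, here is a small Python sketch.
It is not how Pango works today; the font names, `DEFAULT_ORDER`,
`PUA_RULES` and `scan_order` are all hypothetical, and the sub-range
U+E700..U+E7FF is arbitrary. The three wide ranges are the Unicode
private use areas (BMP, plane 15, plane 16). O1 corresponds to an empty
rule list, O2 to the three wide ranges sharing one font list, and O3 to
one font list per wide range.

```python
from typing import List, Tuple

# Hypothetical scan order for public (non-PUA) codepoints.
DEFAULT_ORDER = ["FontA", "FontB"]

# (first, last, font order). The first matching rule wins, so a
# manually defined sub-range must come before the range containing it.
PUA_RULES: List[Tuple[int, int, List[str]]] = [
    (0xE700, 0xE7FF, ["LegacyFont", "FontA"]),  # arbitrary manual sub-range
    (0xE000, 0xF8FF, ["FontA"]),                # BMP private use area
    (0xF0000, 0xFFFFD, ["Plane15Font"]),        # plane 15 private use
    (0x100000, 0x10FFFD, ["Plane16Font"]),      # plane 16 private use
]


def scan_order(codepoint: int,
               rules: List[Tuple[int, int, List[str]]] = PUA_RULES,
               default: List[str] = DEFAULT_ORDER) -> List[str]:
    """Return the font scan order to use for one codepoint."""
    for first, last, fonts in rules:
        if first <= codepoint <= last:
            return fonts
    return default


print(scan_order(0xE78D))  # falls in the manual sub-range
print(scan_order(0x4E00))  # public codepoint, default order
```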

