Re: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])



Hi,

My reply follows below, inline...


On Dec 17, 2007 7:22 AM, Behdad Esfahbod <behdad behdad org> wrote:
[..........tons of quasi-maths ...........]
>
> > Secondly, you said that "contextual font selection" is a "cool"
> > feature, I am wondering what languages are beneficial from this feature?
> > (I believe there are, but just want to know).
>
> Pretty much every non-Latin script.  In some situations even the Latin
> script.
>
> Take the Unicode character U+002E FULL STOP, aka ASCII period.  It is
> used in more than just Latin, in Arabic for example, in Hebrew, possibly
> in Indic and many other scripts.  If it was not grouped with neighboring
> characters for font selection purposes all those people would have got
> their Arabic/Hebrew/... text assigned an Arabic/Hebrew/... font while
> the periods in at the end of sentences assigned a different (default
> Latin for example) font.
>
> The same happens for Latin under a document tagged as non-Latin.  It's
> not a luxury thing.  It's just how things are supposed to work.

So changing fonts depending on context is actually preferred for some
fonts or some languages, is that right? If so, this would be a
per-language preference: some want it, some don't.

So does Pango support toggling this behavior yet? (I guess not?) And if
not, is it planned for a future release?
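
To check that I read the quoted explanation correctly, here is a toy
Python sketch of grouping "neutral" characters such as U+002E with their
neighbours. The script ranges and the names script_of/resolve_scripts are
my own simplifications for illustration; this is not Pango's actual code.

```python
def script_of(ch):
    """Very rough script guess; real code would use Unicode script data."""
    cp = ord(ch)
    if 0x0590 <= cp <= 0x05FF:
        return "Hebrew"
    if 0x0600 <= cp <= 0x06FF:
        return "Arabic"
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return "Common"  # digits, punctuation, spaces, everything else


def resolve_scripts(text):
    """Give each Common character the script of its neighbours."""
    scripts = [script_of(ch) for ch in text]

    # Forward pass: a Common character inherits the script before it.
    last = None
    for i, s in enumerate(scripts):
        if s == "Common":
            if last is not None:
                scripts[i] = last
        else:
            last = s

    # Backward pass: leading Common characters inherit the script after them.
    following = None
    for i in range(len(scripts) - 1, -1, -1):
        if scripts[i] == "Common":
            if following is not None:
                scripts[i] = following
        else:
            following = scripts[i]

    return scripts
```

With this, the full stop at the end of a Hebrew sentence lands in the
same run as the Hebrew letters, so it would get the same font.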



> > > The main font issue though, is that Chinese (Simplified, Traditional),
> > > Korean, and Japanese share some Unicode code points, but they require
> > > slightly different renderings.  Now if you don't tell Pango which
> > > version is preferred, how can it know which font to choose?  It
> > > explicitly doesn't prefer any one over the others to avoid cultural
> > > problems.
> > >
> > > The symptoms of this problem are "multiple fonts used in the same line".
> > > Solution is: Either run under a CJK locale, or give hints to Pango about
> > > your preferred CJK locale using the env var PANGO_LANGUAGE.
> > >
> > > Note that theoretically Pango can do text analysis to come up with a
> > > best guess, but doing that would then introduce another bug with
> > > symptoms "changes font when typing a few characters on the same line".

Let me set the record straight here. Most people who see this problem are not
exactly complaining about the font changing, but about the font changing TO
SOME BAD LATIN GLYPH THEY DON'T LIKE. It is understood that font changes are
almost unavoidable, since typing just a few characters may not provide enough
information about which font should be picked, and typing more gives more info.
So far, is the font determined per sentence, or per what?

If this behavior could be toggled on a per-language basis, CJK users would not
complain _THAT_ much. And note that this problem is about mixing CJK text
with Latin text. Moving on.

The second one: "multiple fonts used in the same line". This happens exactly
because Pango doesn't determine the font heuristically, so everything is handed
off to fontconfig. Fontconfig sets up a ladder, and for each glyph the
highest-placed font that covers it is picked. This is my understanding so far;
please correct me if I'm wrong.

Sadly this approach cannot satisfy everybody, only one party. In
particular, the font is picked per glyph, so a sentence ends up
intermixed with multiple CJK fonts as described.
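
To illustrate my understanding of the ladder, a toy Python sketch. The
font names and coverage sets are made up, and real fontconfig matching
is much more involved than a plain list lookup.

```python
# Hypothetical ladder: highest-priority font first, each with the set of
# characters it is assumed to cover.
LADDER = [
    ("FontJA", set("日本語字の")),
    ("FontSC", set("日本语字体的简")),
]


def pick_per_glyph(text, ladder=LADDER):
    """For each character, take the first font on the ladder covering it."""
    result = []
    for ch in text:
        font = next((name for name, cov in ladder if ch in cov), None)
        result.append((ch, font))
    return result


def fonts_used(text, ladder=LADDER):
    """Just the font chosen for each character, in order."""
    return [font for _, font in pick_per_glyph(text, ladder)]
```

Here a Simplified Chinese string such as "日本语" gets the Japanese font
for the two shared characters and the Chinese font only for the third,
which is the intermixing people complain about.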

What if font selection were not chopped up glyph by glyph, but also determined
heuristically from context? If my guess is correct, this would work in most
cases, even among language variants (think zh_CN and zh_TW). The one exception
is when text from multiple languages is used within a single sentence (such as
when introducing foreign-language text), in which case one font is chosen for
the largest chunk of text fitting a single language, and another font for the
other language.
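
A rough Python sketch of the kind of heuristic I have in mind, greatly
simplified: pick one font for the whole run by coverage, and only fall
back per glyph for what that font lacks. The ladder data and the name
pick_per_run are invented for illustration; this is not an existing
Pango or fontconfig API.

```python
# Hypothetical ladder: highest-priority font first, with assumed coverage.
RUN_LADDER = [
    ("FontJA", set("日本語字の")),
    ("FontSC", set("日本语字体的简")),
]


def pick_per_run(text, ladder=RUN_LADDER):
    """Choose the font covering the most characters of the run, then use
    it for every character it covers. Ties go to the higher ladder entry."""
    if not text:
        return []

    # max() returns the first maximal item, so ladder order breaks ties.
    best_name, best_cov = max(
        ladder, key=lambda entry: sum(ch in entry[1] for ch in text)
    )

    fonts = []
    for ch in text:
        if ch in best_cov:
            fonts.append(best_name)
        else:
            # Per-glyph fallback only for characters the run font lacks.
            fonts.append(
                next((name for name, cov in ladder if ch in cov), None)
            )
    return fonts
```

With the same made-up data, "日本语" now comes out entirely in the
Chinese font and "日本語" entirely in the Japanese one, regardless of
which font sits higher on the ladder.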

The two solutions don't necessarily contradict each other: one is about
mixing CJK text with Latin text, and the other is about intermixing
CJK text of different languages.


> > > Another symptom, "digits change font after typing character" is in fact
> > > a very cool Pango feature, just badmouthed by the above problem.  Fix
> > > the problem.

When a solution is not universal enough to be accepted by everybody,
and causes more trouble than it's worth for specific people, it will be
badmouthed no matter what. Or not? I don't know the rule here.

Abel


> > >
> > >
> > >
> > >> As you see from the bug lists, this problem has existed for many
> > >> years, and I am pretty sure that it will come back again and again, as
> > >> long as the expected rendering is not achieved. If the current pango
> > >> formatting logic is not sufficient to handle the CJK preferences as
> > >> said above, I think to refine the logic to take it into consideration
> > >> is better than stick with a fixed but incomplete logic.
> > >>
> > >
> > > I consider patches improving Pango's font selection algorithm, but none
> > > that I've seen so far had been an improvement (from my point of view).
> > > If it has words like CJK or "special case", I'm most probably not
> > > interested.  Of the bugs you listed, only the one I opened myself is
> > > valid IMO.  The rest is just left open because no matter how many times
> > > I close them, they will be reopened... Oh well.
> > >
> > >
> > >
> > >> please let me know your thoughts and reasoning on whether this is
> > >> feasible or not, if yes, where to get start.
> > >>
> > >
> > > Does the above make sense?  I understand that it's easier to apply a two
> > > line patch to Pango instead of doing what of the things I listed above,
> > > but that just doesn't fit in the design, and it introduces other
> > > problems you don't see right now.
> > >
> > >
> > >
> > >> thank you for paying attention to this issue.
> > >>
> > >> Qianqian
> > >>
> > >
> > > Regards,
> > >
> > > behdad
> > >
> > >
> > >
> > >> ===============================================================
> > >> Bug 321113 - Wrong glyph subsituation algorithm for digital characters
> > >> and punctuations
> > >> http://bugzilla.gnome.org/show_bug.cgi?id=321113
> > >>
> > >>
> > >> Bug 345072 - changes font when typing different scripts on the same
> > >> line
> > >> http://bugzilla.gnome.org/show_bug.cgi?id=345072
> > >>
> > >>
> > >> Bug 345386 - Language and direction propagation in and between
> > >> PangoLayouts
> > >> http://bugzilla.gnome.org/show_bug.cgi?id=345386  (opened by yourself)
> > >> https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=103679
> > >>
> > >>
> > >> Bug 481210 - [All lang] [firefox] - Face of the number is changing
> > >> when enter number + Char, in any Locale
> > >> http://bugzilla.gnome.org/show_bug.cgi?id=481210
> > >>
> > >>
> > >> Bug 481188 - ascii text space too narrow for Chinese encodings
> > >> http://bugzilla.gnome.org/show_bug.cgi?id=481188
> > >>
> > >>
> > >> Bugzilla Bug 129541: changes font when typing different scripts on the
> > >> same line
> > >> https://bugzilla.redhat.com/show_bug.cgi?id=129541
> > >>
> > >>
> > >> Bugzilla Bug 131218: [RHEL4] Characters get truncated in new pango
> > >> https://bugzilla.redhat.com/show_bug.cgi?id=131218
> > >>
> > >>
> > >> Bugzilla Bug 149991: [CJK pango] digits and punctuation in textbox
> > >> give bad eol rendering and cursor placement
> > >> https://bugzilla.redhat.com/show_bug.cgi?id=149991 (filed by Jens
> > >> Petersen)
> > >>
> > >>
> > >> https://bugzilla.redhat.com/show_bug.cgi?id=220885 (broken link)
> > >>
> > >>
> > >> Bugzilla Bug 228804: [All lang] [firefox] - Face of the number is
> > >> changing when enter number + Char, in any Locale
> > >> https://bugzilla.redhat.com/show_bug.cgi?id=228804
> > >>
> > >>
> > >> Bugzilla Bug 221361: [pango] ascii text space and punctuation is
> > >> narrow for CJK
> > >> https://bugzilla.redhat.com/show_bug.cgi?id=221361
> > >>
> > >>
> > >> Bug 379125 - chinese punctuations after english letters are wrongly
> > >> displayed
> > >> https://bugzilla.mozilla.org/show_bug.cgi?id=379125
> > >> https://bugzilla.mozilla.org/attachment.cgi?id=263185
> > >> ===============================================================
> > >>
> > >
> > >
> >
> --
> behdad
> http://behdad.org/
>
> ...very few phenomena can pull someone out of Deep Hack Mode, with two
> noted exceptions: being struck by lightning, or worse, your *computer*
> being struck by lightning.  -- Matt Welsh
>



-- 
Abel Cheung   (GPG Key: 0xC67186FF)
Key fingerprint: 671C C7AE EFB5 110C D6D1  41EE 4152 E1F1 C671 86FF
--------------------------------------------------------------------
* My own cave: http://me.abelcheung.org/
* Opensource Application Knowledge Assoc. - http://oaka.org/

