Re: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])

Hi Behdad

I don't think the tone of your reply is going to help in any way toward
solving this problem.

First, I hope you understand that people raise these issues for the good of Pango. They want to make it more powerful and logical under all possible circumstances.
Second, there must be a reason this issue has been raised again and again
over the past several years. I think insufficient explanation and poor guidance toward a good solution play a role here (I am sorry to say that the "proof" in your last email still did not help, because it was not what I was asking for).

Reading the replies over the past few days, I came to realize that the key
to the problem is setting up a "correct" fall-back path for untagged
(or COMMON) text. Obviously, you are reluctant to explicitly
tag it as LATIN in Pango. You may be right if differentiating
COMMON from LATIN is practically necessary (I mean "practically", not
semantically as in the Unicode standard). You have your reasons.

Unfortunately, the current fall-back mechanism eventually assigns
the current locale's language to this untagged text. And it turns out
that for some users (if not all), particularly CJK users (for whom the practical difference
between Latin and Common is not significant), this creates unpleasant
formatting results due to the mixing of fonts.

So it seems clear that additional information is needed to steer the fall-back
of this untagged text toward the preferred settings. This information could come from the patched fontconfig, using a per-block font preference list, or from the current
keyboard layout, as suggested by Sergey and Chris. Maybe a third
way is to create an LC variable, say LC_COMMON, independent
of LC_ALL/LANG, that takes care of formatting untagged text. I actually
feel this is probably more suitable than the other two approaches,
because this is a locale-based preference, not a font or keyboard preference
(this is just my first thought; I may be wrong).
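To make the LC_COMMON idea concrete, here is a minimal Python sketch of the lookup order it would imply. LC_COMMON is purely hypothetical (no such locale category exists in POSIX, glibc or Pango), the function name untagged_text_language is made up for illustration, and the LC_ALL, then LC_CTYPE, then LANG fallback is the standard POSIX precedence for the LC_CTYPE category.

```python
from typing import Mapping, Optional

# Hypothetical: LC_COMMON is only the idea floated above; it is not a
# real POSIX/glibc locale category and Pango does not read it.
_PRECEDENCE = ("LC_COMMON", "LC_ALL", "LC_CTYPE", "LANG")


def untagged_text_language(env: Mapping[str, str]) -> Optional[str]:
    """Return the language to use for untagged (COMMON) text.

    LC_COMMON, if set, wins; otherwise fall back to the usual POSIX
    precedence for LC_CTYPE (LC_ALL, then LC_CTYPE, then LANG).
    """
    for var in _PRECEDENCE:
        value = env.get(var)
        if value:
            # Drop codeset and modifier: "zh_CN.UTF-8@foo" -> "zh_CN".
            return value.split(".")[0].split("@")[0]
    return None
```

Called as untagged_text_language(os.environ), a user running under LANG=en_US.UTF-8 could set LC_COMMON=zh_CN to send digits and punctuation down the Chinese fall-back path without changing anything else.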

In any case, I "think" I understand your argument, although there are still
details that need to be verified. But I think it would be more useful to focus on
clarifying a solution rather than arguing about who is right and who is wrong.


Behdad Esfahbod wrote:
On Thu, 2007-12-20 at 04:22 +0800, Abel Cheung wrote:


My reply follows below, inline...

So is mine.

On Dec 17, 2007 7:22 AM, Behdad Esfahbod <behdad behdad org> wrote:
[..........tons of quasi-maths ...........]
Secondly, you said that "contextual font selection" is a "cool"
feature. I am wondering which languages benefit from this feature
(I believe there are some, I just want to know which).
Pretty much every non-Latin script.  In some situations even the Latin script.

Take the Unicode character U+002E FULL STOP, aka ASCII period.  It is
used in more than just Latin: in Arabic for example, in Hebrew, possibly
in Indic and many other scripts.  If it were not grouped with neighboring
characters for font selection purposes, all those people would get
their Arabic/Hebrew/... text assigned an Arabic/Hebrew/... font while
the periods at the end of sentences were assigned a different (default
Latin, for example) font.

The same happens for Latin in a document tagged as non-Latin.  It's
not a luxury thing.  It's just how things are supposed to work.
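A toy Python model of the grouping described above: characters in the Common script (digits, spaces, U+002E) inherit the script of the neighboring text, so a sentence-final period ends up in the same run, and therefore the same font, as the Hebrew or Arabic before it. The four script ranges are a deliberately tiny assumption, the function names are hypothetical, and Pango's real script iterator also matches paired brackets, which this sketch ignores.

```python
from typing import List, Tuple


def char_script(ch: str) -> str:
    """Rough script classifier covering only a few ranges (assumption)."""
    cp = ord(ch)
    if ch.isascii() and ch.isalpha():
        return "Latin"
    if 0x0590 <= cp <= 0x05FF:
        return "Hebrew"
    if 0x0600 <= cp <= 0x06FF:
        return "Arabic"
    if 0x4E00 <= cp <= 0x9FFF:
        return "Han"
    return "Common"  # digits, spaces, ASCII punctuation such as U+002E


def script_runs(text: str) -> List[Tuple[str, str]]:
    """Split text into (script, substring) runs; Common characters join
    the preceding run, or the following one at the start of the text."""
    scripts = [char_script(c) for c in text]
    # Start with the first non-Common script so leading digits/punctuation
    # are resolved forward; stays "Common" if the text has no real script.
    current = next((s for s in scripts if s != "Common"), "Common")
    runs: List[Tuple[str, str]] = []
    for ch, s in zip(text, scripts):
        if s != "Common":
            current = s
        if runs and runs[-1][0] == current:
            runs[-1] = (current, runs[-1][1] + ch)
        else:
            runs.append((current, ch))
    return runs
```

With this model, "123" on its own stays a Common run (which then gets the default language), while "123中" becomes a single Han run: the digits switch font as soon as the ideograph is typed. That is the "digits change font after typing a character" effect discussed later in the thread.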
That means font changes depending on context are actually preferred for
some fonts or some languages, right? If that's true, then this would be
a per-language preference: some want it, some don't.

So does Pango support toggling this behavior yet? (I guess not?)

What do you exactly mean by "this behavior"?  Which behavior?  Show me
the source code line.  I'm getting tired of all the hand waving.

The main font issue though, is that Chinese (Simplified, Traditional),
Korean, and Japanese share some Unicode code points, but they require
slightly different renderings.  Now if you don't tell Pango which
version is preferred, how can it know which font to choose?  It
explicitly doesn't prefer any one over the others to avoid cultural

The symptom of this problem is "multiple fonts used in the same line".
The solution is: either run under a CJK locale, or give Pango a hint about
your preferred CJK locale using the environment variable PANGO_LANGUAGE.
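A simplified Python model of how a language could be picked for a run, following the order described above: locale first, then PANGO_LANGUAGE, then a per-script sample language, with none for Han so that no CJK variant is favored. The language/script table, the colon-separated PANGO_LANGUAGE format and all names here are assumptions for illustration, not Pango's actual code.

```python
from typing import Dict, FrozenSet, Optional

# Toy table (assumption): scripts each language is written in.
LANG_SCRIPTS: Dict[str, FrozenSet[str]] = {
    "en": frozenset({"Latin"}),
    "zh_CN": frozenset({"Han"}),
    "zh_TW": frozenset({"Han", "Bopomofo"}),
    "ja": frozenset({"Han", "Hiragana", "Katakana"}),
    "ko": frozenset({"Han", "Hangul"}),
}

# Fallback "sample" language per script. Han deliberately has none,
# mirroring the "no preferred CJK variant" policy described above.
SAMPLE_LANGUAGE: Dict[str, Optional[str]] = {"Latin": "en", "Han": None}


def includes_script(lang: str, script: str) -> bool:
    return script in LANG_SCRIPTS.get(lang, frozenset())


def run_language(script: str, locale_lang: str,
                 pango_language: str = "") -> Optional[str]:
    """Pick a language for a run of `script`:
    1. the locale's language, if it is written in that script;
    2. else the first PANGO_LANGUAGE entry written in it
       (a colon-separated list is assumed here);
    3. else the script's sample language, which is None for Han.
    """
    if includes_script(locale_lang, script):
        return locale_lang
    for lang in filter(None, pango_language.split(":")):
        if includes_script(lang, script):
            return lang
    return SAMPLE_LANGUAGE.get(script)
```

Under an en locale with PANGO_LANGUAGE unset, run_language("Han", "en") returns None, i.e. the Han run carries no language hint for fontconfig; with PANGO_LANGUAGE=zh_TW the same run resolves to zh_TW.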

Note that theoretically Pango can do text analysis to come up with a
best guess, but doing that would then introduce another bug with
symptoms "changes font when typing a few characters on the same line".
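To see why guessing from the text itself trades one symptom for another, consider this hypothetical heuristic (not something Pango implements): call a line Japanese if it contains kana, Korean if it contains Hangul syllables, and Chinese otherwise. The Unicode ranges are standard, but the rule and the function names are made up for illustration.

```python
from typing import List


def guess_cjk_language(text: str) -> str:
    """Hypothetical content-based guess; NOT something Pango does."""
    if any(0x3040 <= ord(c) <= 0x30FF for c in text):  # Hiragana/Katakana
        return "ja"
    if any(0xAC00 <= ord(c) <= 0xD7A3 for c in text):  # Hangul syllables
        return "ko"
    return "zh"  # arbitrary default when only ideographs are present


def guesses_while_typing(text: str) -> List[str]:
    """Language guessed for the whole line after each keystroke."""
    return [guess_cjk_language(text[:i]) for i in range(1, len(text) + 1)]
```

guesses_while_typing("漢字の") yields zh, zh, ja: the moment の is typed, the guess for the whole line flips, so the two ideographs already on screen would be redrawn with a different font.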
Let me set the record straight here. Most people seeing this problem are not
exactly complaining about the font changing, but about the font changing TO
SOME BAD LATIN GLYPH THEY DON'T LIKE. It is understood that font changing is
almost unavoidable, since typing just a few characters may not provide enough
information about which font should be picked, and typing more
gives more info.
So currently, is it determined per sentence, or per what?

Believe me, I know that.  And I understand it if you don't WRITE IN CAPS
too.  Does it help if I say THEN GO REMOVE THE CRAPPY FONT?

Sadly, this approach absolutely won't satisfy everybody, only one party. In
particular, the font is picked per glyph, causing a sentence to be
rendered in a mix of multiple CJK fonts, as described.

This is totally wrong.  Pango first tags each piece of text with a
language, then asks fontconfig to sort fonts for that language, then
uses the sorted list to assign a font to each character.  That is, if you
mark your text zh_CN (by running under that locale, by setting
PANGO_LANGUAGE to it, or by otherwise marking it), have a suitable
font for that language, and, if you also have crappy fonts for it, configure
fontconfig to prefer the good one, then Pango chooses the
right font.  Now, all the "bugs" you show me are in the steps
mentioned other than the one Pango itself performs.

What if the font is not determined glyph by glyph, but
heuristically, with context?

Pango already does that.  That's exactly what you call "contextual"
something above and condemn.

If my guess is correct, this would work in most
cases, even among language variants (think zh_CN and zh_TW).

No.   You need to go back and read and understand my "tons of quasi-maths".

Another symptom, "digits change font after typing a character", is in fact
a very cool Pango feature, just badmouthed because of the above problem.  Fix
the problem.
When a solution is not universal enough to be accepted by everybody,
and causes more trouble than it's worth for specific people, it will be
badmouthed no matter what. Or not? I don't know the rules here.

You officially don't know what you are talking about.



As you can see from the bug list, this problem has existed for many
years, and I am pretty sure that it will keep coming back as
long as the expected rendering is not achieved. If the current Pango
formatting logic is not sufficient to handle the CJK preferences
described above, I think refining the logic to take them into account
is better than sticking with a fixed but incomplete logic.

I consider patches improving Pango's font selection algorithm, but none
that I've seen so far has been an improvement (from my point of view).
If a patch has words like CJK or "special case" in it, I'm most probably not
interested.  Of the bugs you listed, only the one I opened myself is
valid IMO.  The rest are just left open because no matter how many times
I close them, they get reopened... Oh well.

Please let me know your thoughts and reasoning on whether this is
feasible or not, and if so, where to start.

Does the above make sense?  I understand that it's easier to apply a two-line
patch to Pango than to do the things I listed above,
but that just doesn't fit the design, and it introduces other
problems you don't see right now.

Thank you for paying attention to this issue.




Bug 321113 - Wrong glyph subsituation algorithm for digital characters
and punctuations

Bug 345072 - changes font when typing different scripts on the same line

Bug 345386 - Language and direction propagation in and between
PangoLayouts  (opened by yourself)

Bug 481210 - [All lang] [firefox] - Face of the number is changing
when enter number + Char, in any Locale

Bug 481188 - ascii text space too narrow for Chinese encodings

Bugzilla Bug 129541: changes font when typing different scripts on the
same line

Bugzilla Bug 131218: [RHEL4] Characters get truncated in new pango

Bugzilla Bug 149991: [CJK pango] digits and punctuation in textbox
give bad eol rendering and cursor placement (filed by Jens
Petersen) (broken link)

Bugzilla Bug 228804: [All lang] [firefox] - Face of the number is
changing when enter number + Char, in any Locale

Bugzilla Bug 221361: [pango] ascii text space and punctuation is
narrow for CJK

Bug 379125 - chinese punctuations after english letters are wrongly


...very few phenomena can pull someone out of Deep Hack Mode, with two
noted exceptions: being struck by lightning, or worse, your *computer*
being struck by lightning.  -- Matt Welsh

gtk-i18n-list mailing list
gtk-i18n-list gnome org
