Re: On CJK font selection (was Re: [Fwd: Re: Request for review and advice on wqy-bitmap-fonts fontconfig settings])



Hi Behdad,

I would agree with you if you told me clearly why this change SHOULD
be made in the fonts, or in the font selection, and not in the layout engine.
Your previous replies, whether to the bug reports or to my email, simply
refused to make this change by calling it "technically impossible", but you
did not tell me what model that statement is based on. If you can
give me a diagram or document showing that this is not the business of the
layout engine, I will not insist on continuing this discussion.

Secondly, you said that "contextual font selection" is a "cool"
feature. I am wondering which languages benefit from this feature. (I
believe there are some, but I just want to know.) As I said in the previous
email, for CJK languages it creates more trouble than benefit. In particular,
it ruins the text alignment in a monospace environment (see attachment). I
doubt anyone who sees it would say "cool"; rather, they would feel annoyed.

In addition, you seem to underestimate the difficulty of ripping out part of
a CJK font. This is not possible for commercial fonts. Even if it is doable
for open fonts (very few choices though), the incompatibility of the resulting
fonts will make them totally unusable on most platforms.

I want to add that on Windows, CJK users have never had such a problem.
All known CJK fonts have their own Latin glyphs (some are crappy), but the text
rendering is "normal" (nothing like in the attachment). How does Windows
structure the style propagation for COMMON characters?


Qianqian



Behdad Esfahbod wrote:
Hi Qianqian,

[CC'ing to gtk-i18n-list, so hopefully this is the last time I have to
repeat this.]

On Mon, 2007-12-10 at 18:01 -0500, Qianqian Fang wrote:
Going back to the digit font change issue we discussed earlier, I
spent some time in the past few days trying to get a clearer
picture of it. I dug out some bug reports from various bugzillas
(Mozilla, Red Hat, Gnome) and gathered a list of similar reports (see
the bottom of the email). These reports were filed by simplified and
traditional Chinese users and Japanese users (I believe Korean users
experience the same problem).  So, one thing that can be said from
this list is that the "contextual font selection" does seem to be
bothering CJK users in text formatting.

Yes, you have identified the problem very accurately.


I understand that "contextual shaping" is one of the techniques for
rendering complex scripts. I am not sure how tight the connection is
between "contextual shaping" and the "contextual format propagation",
but one thing that I think may shed some light on the complaints of the
CJK users is that Chinese (maybe Japanese as well) scripts are not
context-sensitive. Chinese characters are relatively independent
and self-consistent in shape (this statement is not true for
Chinese calligraphy, where strokes may connect between characters
depending on layout direction, but the current OSs and font
technologies are not ready to handle this IMO). The only complexity
may come from the fact that Hanzi for printing are mostly equal-width,
and the punctuation marks among the Hanzi are expected to match the width
of the surrounding Hanzi. As the full-width punctuation marks are encoded
separately by Unicode, and the input methods support contextual
punctuation, this seems to be handled very well. So,
in short, for Chinese text layout, users generally do not expect to
see context-based changes, either in encoding/glyph or in font face
(this may not include some extreme cases).
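The equal-width expectation described above can be sketched in a few lines. This is only an illustration, assuming a monospace grid in which wide and full-width characters take two columns and everything else takes one; the helper name column_width is made up for this example, and combining characters are ignored.

```python
import unicodedata


def column_width(text):
    """Count monospace columns for text.

    Assumption: characters whose East Asian Width is W (wide) or
    F (fullwidth) occupy two columns; all others occupy one.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


# Hanzi and the full-width comma (U+FF0C) each take two columns,
# while the ASCII comma and ASCII digits take one.
print(column_width("中文，"))
print(column_width("abc,"))
```

Alignment in a monospace environment only holds if the glyphs actually drawn keep to these widths, which is why a font switch for digits or punctuation in the middle of a line is so visible.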

And Pango supports those all perfectly fine.  Even vertical writing
using the correct substituted punctuation glyphs.  See:

  http://www.pango.org/ScriptGallery


The main font issue though, is that Chinese (Simplified, Traditional),
Korean, and Japanese share some Unicode code points, but they require
slightly different renderings.  Now if you don't tell Pango which
version is preferred, how can it know which font to choose?  It
explicitly doesn't prefer any one over the others to avoid cultural
problems.

The symptoms of this problem are "multiple fonts used in the same line".
Solution is: Either run under a CJK locale, or give hints to Pango about
your preferred CJK locale using the env var PANGO_LANGUAGE.
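As a rough sketch of the second option, a launcher script could set PANGO_LANGUAGE for a single program without changing the locale. The helper names below are made up for this example; the language tag "zh-cn" and the colon-separated list format are assumptions to check against the Pango documentation for the installed version.

```python
import os
import subprocess


def env_with_pango_language(languages, base_env=None):
    """Return a copy of the environment with PANGO_LANGUAGE set.

    Assumption: Pango accepts a colon-separated list of language tags,
    most preferred first.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PANGO_LANGUAGE"] = ":".join(languages)
    return env


def launch(command, languages):
    """Start command (an argument list) with the language hint applied."""
    return subprocess.Popen(command, env=env_with_pango_language(languages))
```

For example, launch(["gedit"], ["zh-cn"]) would start a Pango-based editor (gedit is only a placeholder here) with Simplified Chinese as the preferred language for font selection.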

Note that theoretically Pango can do text analysis to come up with a
best guess, but doing that would then introduce another bug with
symptoms "changes font when typing a few characters on the same line".


Now going back to Pango: from what I read in the bug reports, Pango
uses PANGO_SCRIPT_COMMON to represent language-independent symbols. I
have no complaint about that. It is a good classification based on the
semantics of the symbols.

Good.  Let me also note that there's no way to change that.  It's
hardcoded in the Unicode standard.


What I, and most CJK users, are not satisfied with is the
context sensitivity of those common scripts when formatting text
under CJK locales. I know that you have advocated sticking with the
"face" meaning of SCRIPT_COMMON, which is supposed to be rendered by
local languages. But IMO, the face meaning is misleading here. From a
Chinese user's perspective, the difference between SCRIPT_COMMON and
Latin is negligible,

Lemme correct you here, "From a Chinese user perspective, the ASCII
digits are considered Latin".  There's sure a lot more than ASCII digits
to SCRIPT_COMMON.  Helps to be precise.


compared with its difference from CJK characters. Therefore, using CJK
fonts to render SCRIPT_COMMON is quite odd. Using Latin fonts for
COMMON is most preferred; even specifying no face (i.e. using the system
fall-back) is better than assigning Chinese fonts to these scripts,
because most Chinese fonts have low-quality Latin/common glyphs, even
the commercial ones.

And this problem has a name: "crappy glyphs and multiple scripts in a
font".  Tell me about it...

I already pointed out a few solutions to it previously:

  - Rip the crap out and everyone will feel better.

  - Use TrueType containers (even for bitmap-only fonts) and put each
script's glyphs into its own face, with all faces having the same name
and put into the same TrueType Collection file.

  - Finish patch for fontconfig to allow configuration to disable
certain Unicode codepoints per font.  Then write such configuration for
the crappy glyphs.

Pick whichever you prefer and just do it.


Another symptom, "digits change font after typing character" is in fact
a very cool Pango feature, just badmouthed by the above problem.  Fix
the problem.
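The symptom can be reproduced with a toy model of script itemization. This is a simplified sketch and not Pango's actual code: the classifier only knows one Han block and ASCII letters, and the rule "COMMON characters join the neighbouring script run" is an assumption made for illustration. All names are made up for this example.

```python
def simple_script(ch):
    """Very rough script lookup (illustrative, not the Unicode tables)."""
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:           # CJK Unified Ideographs block only
        return "HAN"
    if ch.isascii() and ch.isalpha():
        return "LATIN"
    return "COMMON"                       # digits, spaces, punctuation, ...


def itemize(text):
    """Split text into (script, substring) runs.

    COMMON characters are attached to the preceding run; leading COMMON
    characters are attached to the first run that has a real script.
    """
    runs = []
    pending = ""                          # leading COMMON text, no script yet
    for ch in text:
        script = simple_script(ch)
        if script == "COMMON":
            if runs:
                runs[-1][1] += ch
            else:
                pending += ch
        elif runs and runs[-1][0] == script:
            runs[-1][1] += ch
        else:
            runs.append([script, pending + ch])
            pending = ""
    if pending:
        runs.append(["COMMON", pending])
    return [tuple(run) for run in runs]


print(itemize("123"))     # digits alone stay COMMON
print(itemize("123中"))   # the same digits now belong to a HAN run
```

In this model the digits get whatever font is picked for their run, so typing one Hanzi after "123" can switch them to the CJK font's Latin glyphs.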


As you can see from the bug lists, this problem has existed for many
years, and I am pretty sure that it will come back again and again, as
long as the expected rendering is not achieved. If the current Pango
formatting logic is not sufficient to handle the CJK preferences
described above, I think refining the logic to take them into consideration
is better than sticking with a fixed but incomplete logic.

I consider patches improving Pango's font selection algorithm, but none
that I've seen so far has been an improvement (from my point of view).
If it has words like CJK or "special case", I'm most probably not
interested.  Of the bugs you listed, only the one I opened myself is
valid IMO.  The rest is just left open because no matter how many times
I close them, they will be reopened... Oh well.


Please let me know your thoughts and reasoning on whether this is
feasible or not and, if yes, where to get started.

Does the above make sense?  I understand that it's easier to apply a two
line patch to Pango instead of doing one of the things I listed above,
but that just doesn't fit in the design, and it introduces other
problems you don't see right now.


Thank you for paying attention to this issue.

Qianqian

Regards,

behdad


===============================================================
Bug 321113 - Wrong glyph subsituation algorithm for digital characters
and punctuations
http://bugzilla.gnome.org/show_bug.cgi?id=321113


Bug 345072 - changes font when typing different scripts on the same
line
http://bugzilla.gnome.org/show_bug.cgi?id=345072


Bug 345386 - Language and direction propagation in and between
PangoLayouts
http://bugzilla.gnome.org/show_bug.cgi?id=345386  (opened by yourself)
https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=103679


Bug 481210 - [All lang] [firefox] - Face of the number is changing
when enter number + Char, in any Locale
http://bugzilla.gnome.org/show_bug.cgi?id=481210


Bug 481188 - ascii text space too narrow for Chinese encodings
http://bugzilla.gnome.org/show_bug.cgi?id=481188


Bugzilla Bug 129541: changes font when typing different scripts on the
same line
https://bugzilla.redhat.com/show_bug.cgi?id=129541


Bugzilla Bug 131218: [RHEL4] Characters get truncated in new pango
https://bugzilla.redhat.com/show_bug.cgi?id=131218


Bugzilla Bug 149991: [CJK pango] digits and punctuation in textbox
give bad eol rendering and cursor placement
https://bugzilla.redhat.com/show_bug.cgi?id=149991 (filed by Jens
Petersen)


https://bugzilla.redhat.com/show_bug.cgi?id=220885 (broken link)


Bugzilla Bug 228804: [All lang] [firefox] - Face of the number is
changing when enter number + Char, in any Locale
https://bugzilla.redhat.com/show_bug.cgi?id=228804


Bugzilla Bug 221361: [pango] ascii text space and punctuation is
narrow for CJK
https://bugzilla.redhat.com/show_bug.cgi?id=221361


Bug 379125 - chinese punctuations after english letters are wrongly
displayed
https://bugzilla.mozilla.org/show_bug.cgi?id=379125
https://bugzilla.mozilla.org/attachment.cgi?id=263185
===============================================================


PNG image


