Re: Chinese Traditional appearance -- mixed weights?



On Sat, 2006-09-30 at 13:35 +0800, Greg Aumann wrote:
> Owen Taylor wrote:
> > Do you think that Pango should do frequency analysis of incoming text
> > to guess the particular Han-using language it is in (allowing for the
> > possibility of multiple Han-using languages in a single Paragraph, 
> > also allowing for paragraphs that are too short to do any such
> > analysis)?
> > 
> Of course not. As you obviously expect.
> 
> > If not, then you have two possibilities:
> >
> > A) Provide Pango with the information about the language you are
> >    displaying (via the locale environment variables, or via markup...
> 
> I tried setting the locale to different values to see what the effect
> would be. If I use zh_CN.UTF-8 then it uses the same Chinese font for
> all the text regardless of being mono, sans or serif. I don't get the
> mixing of two fonts.
>
>  If I use zh_TW.UTF-8 I get mixing of the same two
> fonts as the English locale but I get a lot more of the "italic" looking
> one and much less of the "sans serif" looking one. A Japanese locale
> gives me the same results as an English one. So obviously setting the
> language makes a difference. That was something that I didn't realise.
> But setting the language to a Han locale doesn't necessarily solve the
> mixing fonts problem. In fact if I use a Taiwanese locale the text is
> more simplified characters than if I use an English locale!

I don't really have enough familiarity with Chinese to sort out
"simplified only" from "traditional only" characters visually. But from
what you describe:

 zh-cn: consistent font
 ja-jp: mixed fonts
 en-us: mixed fonts
 zh-tw: mixed fonts

It really sounds like you have a simplified Chinese document. The
difference between ja-jp/en-us and zh-tw is likely because the first
Han font in your fontconfig configuration is a Japanese font.

> How does pango determine which fonts to use for different languages? I
> can't see anything language specific in fonts.conf.

The supported languages of a font are determined automatically by
fontconfig based on the codepoints included in the font; that is
then one factor that is taken into account by fontconfig's font
selection algorithm.
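
As a rough illustration of coverage-based language detection, here is a
minimal Python sketch. It is not fontconfig's code: the REQUIRED table,
the supported_languages() helper and the max_missing tolerance are all
made up for this example, and the real orthography data is far larger.

```python
# Hypothetical sketch: a font "supports" a language when it covers
# (nearly) every codepoint in that language's reference set.
# These tiny reference sets are invented for illustration only.
REQUIRED = {
    "en": set(map(ord, "abcdefghijklmnopqrstuvwxyz")),
    "el": set(map(ord, "αβγδεζηθικλμνξοπρστυφχψω")),
    "ru": set(map(ord, "абвгдежзийклмнопрстуфхцчшщъыьэюя")),
}


def supported_languages(font_codepoints, max_missing=0):
    """Return the sorted language tags whose reference set is covered.

    max_missing is an assumed tolerance: how many reference codepoints
    may be absent from the font before the language is rejected.
    """
    langs = []
    for lang, required in REQUIRED.items():
        missing = required - font_codepoints
        if len(missing) <= max_missing:
            langs.append(lang)
    return sorted(langs)


# A font covering basic Latin and Greek, but no Cyrillic.
latin_greek_font = REQUIRED["en"] | REQUIRED["el"]
print(supported_languages(latin_greek_font))
```

The point is only that nothing language-specific needs to appear in
fonts.conf for this step; the language list falls out of the font's
character coverage.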

The determination of the language tag, on the other hand, is done
by Pango. Pango first determines a script for each character in
the text it is rendering, and also determines a first-pass language
tag by looking at what was specified by the application, and if
nothing was specified, by the locale. (*)

It then knows what languages are compatible with what scripts (that is,
it knows that text in the Arabic script can't be English) and uses
that information to "refine" the language tags. If it finds a language
tag that isn't plausible it replaces it with:

 - A default language tag for the script when there is something
   obvious and more-or-less uncontroversial. (Arabic for the Arabic
   script, Greek for the Greek script)
 - No language tag if an appropriate language tag can't be guessed.
   (Han script)

(This language tag refinement process is mainly meant to deal with
pathologies of the fontconfig font selection scheme that have been
solved in the most recent versions of fontconfig, so it may be
dropped in future versions of Pango.)
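
The refinement step can be sketched roughly as follows. This is a toy
Python model, not Pango's implementation: the SCRIPT_LANGS and
DEFAULT_LANG tables and the refine_language() name are assumptions made
for the example, and the real tables cover many more scripts.

```python
# Hypothetical tables: which language tags are plausible for a script,
# and which scripts have an obvious default language.
SCRIPT_LANGS = {
    "Latin": {"en", "fr", "de"},
    "Arabic": {"ar", "fa", "ur"},
    "Greek": {"el"},
    "Han": {"zh-cn", "zh-tw", "ja", "ko"},
}
DEFAULT_LANG = {"Arabic": "ar", "Greek": "el"}  # deliberately no entry for Han


def refine_language(script, first_pass_lang):
    """Return a plausible language tag for text in `script`.

    The first-pass tag (from the application or the locale) is kept if
    it is compatible with the script. Otherwise the script's default is
    used, or None ("no language tag") when nothing can be guessed.
    """
    if first_pass_lang in SCRIPT_LANGS.get(script, set()):
        return first_pass_lang
    return DEFAULT_LANG.get(script)


for script, lang in [("Han", "en"), ("Han", "zh-cn"), ("Arabic", "en")]:
    print(script, lang, "->", refine_language(script, lang))
```

In this model, Han text under an English locale ends up with no
language tag at all, while the same text under zh-cn or ja keeps the
locale's tag.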

> Also how can I untangle the effects of the font sets? I want to figure
> out which glyphs are coming from which fonts so I can adjust fonts.conf
> properly i.e. also understand what is going on. When I change the font
> in gedit or gcharmap to a specific one it still is bringing in glyphs
> from other fonts. Is going through all my font files in fontforge and
> comparing appearances the only way?

The font selection algorithm for older versions of fontconfig is:

 1) Explicitly specified non-generic fonts, in the order specified
   (generic fonts being things like "sans-serif", "serif", "monospace")

 2) Fonts from the generic in use matching the language tag of the text,
    sorted by the order that those fonts appear in the generic.
    The generic in use is determined in priority order:

     - If one was given explicitly, that generic
     - If an explicitly specified font is recognized as belonging to
       a generic class, that generic. (So Arial pulls in sans-serif;
       Courier, serif.)
     - Otherwise, sans-serif
           
 3) Other fonts matching the language tag, in random order.

 4) Other fonts from the generic in use.

 5) Other fonts in random order.

If the language tag refinement scheme ended up with no language tag,
then 2) and 3) are skipped... this is what is going on when your
language is en-US.
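
To make the ordering concrete, here is a small Python sketch of steps
1) through 5). It is only a model of the description above, not
fontconfig's code; the Font class, old_order() and the font names are
invented, and "random order" is replaced by input order so the result
is repeatable.

```python
from dataclasses import dataclass, field


@dataclass
class Font:
    name: str
    langs: set = field(default_factory=set)


def old_order(fonts, explicit, generic_members, lang):
    """Return font names in the older fallback order (steps 1 to 5)."""
    by_name = {f.name: f for f in fonts}
    result = []

    def add(names):
        # Append installed fonts that are not already in the list.
        for n in names:
            if n in by_name and n not in result:
                result.append(n)

    add(explicit)                                    # 1) explicit fonts
    if lang is not None:                             # 2) and 3) need a tag
        add(n for n in generic_members
            if n in by_name and lang in by_name[n].langs)
        add(f.name for f in fonts if lang in f.langs)
    add(generic_members)                             # 4) rest of the generic
    add(f.name for f in fonts)                       # 5) everything else
    return result


# Hypothetical installed fonts and a sans-serif list that happens to
# put a Japanese font before the Chinese ones.
FONTS = [
    Font("Latin Sans", {"en"}),
    Font("JP Gothic", {"ja"}),
    Font("SC Song", {"zh-cn"}),
    Font("TC Ming", {"zh-tw"}),
]
SANS = ["Latin Sans", "JP Gothic", "SC Song", "TC Ming"]

for tag in (None, "ja", "zh-cn"):
    print(tag, old_order(FONTS, [], SANS, tag))
```

With no language tag the Japanese font is the first Han font tried, so
only the characters it lacks fall through to a Chinese font; with zh-cn
the simplified Chinese font moves to the front.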

The fix in the most recent versions of fontconfig referred to above
is that 2) and 3) are replaced with:

 2') The best font that matches the language tag of the text, chosen as
     the first language-tag matching font of:

       - Fonts from the generic in use, in the order specified
       - Other fonts in random order

The advantage of that is that once we know what font we prefer for,
say, Greek, a language tag of Greek won't then reorder Russian
fonts to prefer fonts that also happen to cover Greek.
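
A toy Python model of the 2') variant shows the difference. As before,
this is a sketch under assumptions rather than fontconfig's actual
code: new_order() and the font names are hypothetical, and "random
order" is replaced by dictionary order.

```python
def new_order(fonts, explicit, generic_members, lang):
    """Fallback order with step 2') in place of steps 2) and 3).

    fonts maps each installed font name to the set of language tags
    the font supports.
    """
    result = []

    def add(names):
        # Append installed fonts that are not already in the list.
        for n in names:
            if n in fonts and n not in result:
                result.append(n)

    add(explicit)                            # 1) explicit fonts
    if lang is not None:
        # 2') promote only the single best match for the language:
        # generic members first, then any other font.
        candidates = list(generic_members) + list(fonts)
        best = next((n for n in candidates
                     if n in fonts and lang in fonts[n]), None)
        if best is not None:
            add([best])
    add(generic_members)                     # 4) rest of the generic
    add(fonts)                               # 5) everything else
    return result


# Hypothetical fonts: "Cyrillic B" also happens to cover Greek.
FONTS_2 = {
    "Latin Sans": {"en"},
    "Greek Sans": {"el"},
    "Cyrillic A": {"ru"},
    "Cyrillic B": {"ru", "el"},
}
SANS_2 = ["Latin Sans", "Greek Sans", "Cyrillic A", "Cyrillic B"]

print(new_order(FONTS_2, [], SANS_2, "el"))
```

Here a Greek tag moves only "Greek Sans" to the front, and "Cyrillic A"
stays ahead of "Cyrillic B". Under the older steps 2) and 3), every
Greek-covering font would be promoted, which would pull "Cyrillic B"
ahead of "Cyrillic A".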

Sorry for the excessive level of detail here; I couldn't figure out
a good place to stop and simplify. Though, trust me, there's still
a fair bit of complexity I *didn't* get to above. :-)

						Owen

(*) Not quite true, but GTK+ always specifies the locale as a fallback
    language tag, so effectively that's what you'll see when using
    Pango through GTK+.



