Re: Font lookup ranges [was Re: Notes on Pango Xft backend]



Around 16 o'clock on May 29, Owen Taylor wrote:

>  a) Call FcFontSetSort() once, get a list, and then when finding
>     a (language-tag, codepoint) pair, look first for a font with
>     the language tag and the codepoint, then if that fails, 
>     look for a font without the language tag with the codepoint.

Aside from taking two passes, this will elide fonts that share codepoints 
with an earlier font but have different language tags, unless you explicitly 
include all fonts in the return from FcFontSetSort (by passing FcFalse as its 
trim argument).  That seems bad.
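A simplified model makes the elision concrete.  This is not fontconfig's 
code: the Font struct, trim_sorted and lookup are hypothetical, and 
trim_sorted only approximates trimming by keeping a font when it covers 
some codepoint that no earlier kept font covers.  lookup is the two-pass 
search from (a).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a fontconfig pattern: a family name, at most
 * one language tag, and a short list of the codepoints the font covers. */
typedef struct {
    const char     *family;
    const char     *lang;   /* NULL if the font carries no tag */
    const uint32_t *cps;
    size_t          ncps;
} Font;

static int covers(const Font *f, uint32_t cp)
{
    for (size_t i = 0; i < f->ncps; i++)
        if (f->cps[i] == cp)
            return 1;
    return 0;
}

/* Model of trimming a sorted list: keep a font only if it covers at least
 * one codepoint that no previously kept font covers.  Pointers to the kept
 * fonts go into 'out' (room for n entries); the count is returned. */
static size_t trim_sorted(const Font *sorted, size_t n, const Font **out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int adds_coverage = 0;
        for (size_t c = 0; c < sorted[i].ncps && !adds_coverage; c++) {
            int already = 0;
            for (size_t k = 0; k < kept && !already; k++)
                already = covers(out[k], sorted[i].cps[c]);
            adds_coverage = !already;
        }
        if (adds_coverage)
            out[kept++] = &sorted[i];
    }
    return kept;
}

/* Approach (a): first a font with both the language tag and the codepoint,
 * then any font with the codepoint.  Returns NULL if nothing covers cp. */
static const Font *lookup(const Font **fonts, size_t n,
                          const char *lang, uint32_t cp)
{
    for (size_t i = 0; i < n; i++)
        if (fonts[i]->lang && strcmp(fonts[i]->lang, lang) == 0 &&
            covers(fonts[i], cp))
            return fonts[i];
    for (size_t i = 0; i < n; i++)
        if (covers(fonts[i], cp))
            return fonts[i];
    return NULL;
}
```

In this model, with a Traditional Chinese font sorted ahead of a Japanese 
font whose coverage is a subset of it, trimming drops the Japanese font, so 
a (ja, U+4E00) lookup on the trimmed list falls through to the Chinese 
font; on the untrimmed list it finds the Japanese one.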

>  b) Call FcFontSetSort() separately for each language, and somehow
>     influence the sort order; what we'd like to do is make including
>     the specified language tag have a weight:

Language tags are currently given greater weight than family names, but 
given two families which both support the requested language tag, they 
will be ordered by how well their family names match.

This algorithm is used by Mozilla and works reasonably well, although, as 
you say, there are still some significant issues.  One important detail -- language 
tags are only used when attempting to disambiguate fonts with Han glyphs; 
in all other cases, language tags are not passed to the matching routine
at all.

I think all you need to do is pass the language tag when it indicates a 
preference for a particular Han genre and let fontconfig sort things out.

That will also continue to work (perhaps a bit better) when fontconfig 
gets CSS2 matching grafted in.
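A minimal sketch of that gating, assuming the caller knows the codepoint 
being looked up and passes lowercase tags.  The function names and the 
short tag list are hypothetical, and only the BMP ideograph blocks are 
treated as Han.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical list of tags that select a particular Han "genre";
 * tags are assumed to be lowercase already. */
static const char *const han_langs[] = { "ja", "ko", "zh-cn", "zh-tw", "zh-hk" };

static int is_han_codepoint(uint32_t cp)
{
    return (cp >= 0x3400 && cp <= 0x4DBF)   /* CJK Unified Ideographs Ext. A */
        || (cp >= 0x4E00 && cp <= 0x9FFF)   /* CJK Unified Ideographs */
        || (cp >= 0xF900 && cp <= 0xFAFF);  /* CJK Compatibility Ideographs */
}

static int is_han_lang(const char *tag)
{
    for (size_t i = 0; i < sizeof han_langs / sizeof han_langs[0]; i++)
        if (strcmp(tag, han_langs[i]) == 0)
            return 1;
    return 0;
}

/* Return the tag to add to the pattern, or NULL when the tag cannot help
 * disambiguate Han glyphs and should be left out of matching entirely. */
static const char *lang_for_match(const char *tag, uint32_t cp)
{
    return (tag && is_han_codepoint(cp) && is_han_lang(tag)) ? tag : NULL;
}
```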

>  - Type1 fonts don't have OS/2 tables, and thus don't have FC_LANG
>     entries; I think some TrueType fonts might miss them as well.

Yes, this is a significant problem -- you end up using only TrueType fonts 
that bothered to set the language tags for all Han output.  I'm not quite 
sure what we can do about that; perhaps the font configuration mechanism 
should provide a way of adding properties to fonts at load time; the 
config file could then synthetically add language tags.
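A sketch of what adding tags at load time might look like.  FontInfo, 
LangRule and add_synthetic_langs are hypothetical names that do not 
correspond to any fontconfig interface; each rule stands for an entry a 
config file would supply, and a font that already carries a tag keeps it.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *family;
    const char *lang;   /* NULL when the font file supplied no tag */
} FontInfo;

/* One family -> language rule, as a config file might state it
 * (hypothetical format). */
typedef struct {
    const char *family;
    const char *lang;
} LangRule;

/* Give each untagged font the tag named by the first rule matching its
 * family.  Tags read from the font itself are never overwritten.
 * Returns the number of fonts that received a synthetic tag. */
static size_t add_synthetic_langs(FontInfo *fonts, size_t nfonts,
                                  const LangRule *rules, size_t nrules)
{
    size_t tagged = 0;
    for (size_t i = 0; i < nfonts; i++) {
        if (fonts[i].lang != NULL)
            continue;
        for (size_t r = 0; r < nrules; r++) {
            if (strcmp(fonts[i].family, rules[r].family) == 0) {
                fonts[i].lang = rules[r].lang;
                tagged++;
                break;
            }
        }
    }
    return tagged;
}
```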

>   c) Pango adds the language tag to the pattern it feeds to 
>      FcConfigSubstitute, and fonts.conf does pattern matching magic
>      to provide a different "Sans-serif" alias for every language.
> 
> Can't say I like this too much:
> 
>  - Requires lots of careful configuration (more than just
>    slapping extra fonts into "Sans-serif".) Configuration is bad.

I don't think it requires careful configuration; the config file just 
needs to list all of the fonts that are 'sans-serif' in the definition of 
the 'sans-serif' alias.  Once placed in the pattern, the language tag will 
force the appropriate one to be selected in preference to the others, and 
the preference order in the sans-serif alias definition will refine the 
selection.

Do you think this would work:

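	if (is_han_language (language_tag))	/* hypothetical helper */
		FcPatternAddString (pattern, FC_LANG, (const FcChar8 *) language_tag);
	FcConfigSubstitute (config, pattern, FcMatchPattern);
	FcDefaultSubstitute (pattern);
	fonts = FcFontSetSort (config, sets, nsets, pattern, FcTrue, NULL, &result);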

with the config file containing:

	<alias>
		<family>sans-serif</family>
		<prefer>
			<family>MS Gothic</family>
			<family>AR PL KaitiM Big5</family>
			<family>Norasi</family>
			<family>Garuda</family>
			<family>Arial Unicode MS</family>
		</prefer>
	</alias>

Ask for "sans-serif:lang=traditionalchinese" and this will pick 
"AR PL KaitiM Big5" instead of "Arial Unicode MS".

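A rough model of why the tag wins, assuming a two-level sort key: tag 
match first, then position in the <prefer> list.  Real matching weighs 
many more pattern elements; Face, Ranked and rank_faces are hypothetical 
names used only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *family;
    const char *lang;      /* tag reported by the font, or NULL */
} Face;

typedef struct {
    const Face *face;
    int         lang_miss;    /* 0 if the face has the requested tag */
    size_t      family_rank;  /* position in the alias's <prefer> list */
} Ranked;

static int compare_ranked(const void *a, const void *b)
{
    const Ranked *x = a, *y = b;
    if (x->lang_miss != y->lang_miss)
        return x->lang_miss - y->lang_miss;   /* language outweighs family */
    return (x->family_rank > y->family_rank) - (x->family_rank < y->family_rank);
}

/* Rank faces for one request; 'out' must have room for nfaces entries. */
static void rank_faces(const Face *faces, size_t nfaces,
                       const char *const *prefer, size_t nprefer,
                       const char *lang, Ranked *out)
{
    for (size_t i = 0; i < nfaces; i++) {
        out[i].face = &faces[i];
        out[i].lang_miss = !(lang && faces[i].lang &&
                             strcmp(lang, faces[i].lang) == 0);
        out[i].family_rank = nprefer;   /* families not listed sort last */
        for (size_t p = 0; p < nprefer; p++) {
            if (strcmp(faces[i].family, prefer[p]) == 0) {
                out[i].family_rank = p;
                break;
            }
        }
    }
    qsort(out, nfaces, sizeof *out, compare_ranked);
}
```

In this model the face carrying the requested tag sorts first even though 
MS Gothic precedes it in <prefer>; among faces that all miss the tag, 
<prefer> order decides.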
As for fonts without an OS/2 table, perhaps we could develop a heuristic 
to guess the tag.  I suspect we could probe the font's Unicode coverage and 
infer which languages it was designed to support, either within the Han 
range or outside it: find Kana and guess Japanese, find Hangul and guess 
Korean.  Differentiating between Traditional and Simplified Chinese might 
be possible with some help from someone more familiar with the 
differences.  Back that up with config file changes that can override the 
heuristic, and we might have a workable solution.
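A sketch of the coverage probe, assuming the font's coverage is available 
as a list of codepoints.  The thresholds MIN_KANA and MIN_HANGUL are 
hypothetical, a font covering both scripts (a pan-CJK font) gets no guess, 
and distinguishing Traditional from Simplified Chinese is left open.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thresholds: how much of a script counts as "designed in". */
#define MIN_KANA    64
#define MIN_HANGUL  256

/* Count how many of the font's codepoints fall in [lo, hi]. */
static size_t count_in_range(const uint32_t *cps, size_t n,
                             uint32_t lo, uint32_t hi)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (cps[i] >= lo && cps[i] <= hi)
            count++;
    return count;
}

/* Guess a language tag from coverage alone, or return NULL. */
static const char *guess_lang_from_coverage(const uint32_t *cps, size_t n)
{
    size_t kana   = count_in_range(cps, n, 0x3040, 0x30FF); /* Hiragana + Katakana */
    size_t hangul = count_in_range(cps, n, 0xAC00, 0xD7AF); /* Hangul Syllables */
    int ja = kana >= MIN_KANA;
    int ko = hangul >= MIN_HANGUL;

    if (ja && !ko)
        return "ja";
    if (ko && !ja)
        return "ko";
    /* Both scripts or neither: no guess. */
    return NULL;
}
```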

I don't want to ignore the information in the fonts; that seems like 
throwing away valuable information.

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab




