Re: GTK internationalization, right-to-left languages




Nimrod Zimerman <zimerman@earthling.net> writes:

> 
> > > Text widgets should, in general, support *several* fonts, one for each
> > > language. I'm not certain how this can be implemented without requiring
> > > huge storage if many languages are added to gtk, however.
> > 
> > Font and language are different things. In theory, using Unicode
> > you could have a single font for all languages. On the other
> > hand, a language like Arabic may be displayed using several
> > 8-bit fonts. 
> 
> What happens to the current font families? A modest Windows machine probably
> has about 30 font families, not including localized fonts.
> I assume this won't change in the future. In theory, a font can contain
> Unicode as a whole, but in practice, how many fonts like that would we have?
> Probably just a few (because there is no real use for this kind of thing).

The font family (face) is a separate axis from the
selection of the multiple fonts that may be necessary to
display a single string.

X uses the concept of a "Fontset", which is a group of fonts (in
different encodings) that are used to render a string. Usually
these would be related - so a fontset might include:

 Helvetica-Roman (iso-8859-1)
 Helvetica-Greek (iso-8859-?)
 A "gothic" (sans-serif) Japanese font that matches Helvetica visually.
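
A rough sketch of the fontset idea in Python, not Xlib code: each
entry pairs a font with an encoding, and each character goes to the
first font whose encoding can represent it. The font names are
hypothetical, and the codecs iso8859_7 and euc_jp are assumptions
chosen only so the example runs.

```python
# Hypothetical fontset: (font name, Python codec standing in for the
# font's encoding). Names and the iso8859_7 / euc_jp choices are
# assumptions for illustration, not anything X itself defines.
FONTSET = [
    ("Helvetica-Roman", "iso8859_1"),
    ("Helvetica-Greek", "iso8859_7"),
    ("Gothic-Japanese", "euc_jp"),
]


def pick_font(ch):
    """Return the first font in the fontset whose encoding covers ch."""
    for name, codec in FONTSET:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            continue
        return name
    return None  # no font in the set can draw this character


def split_runs(text):
    """Split text into (font, substring) runs of consecutive characters."""
    runs = []
    for ch in text:
        font = pick_font(ch)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs


if __name__ == "__main__":
    for font, chunk in split_runs("abc \u03b1\u03b2\u03b3 \u65e5\u672c"):
        print(font, repr(chunk))
```

The order of the entries matters here: a character that several
encodings cover goes to whichever font is listed first.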
 
> Regarding Arabic - isn't it enough to assume one Unicode font could cover
> all required characters? As you probably know, exceptions are the one thing a
> programmer hates more than a memory corruption bug... (but, handled
> correctly, this probably shouldn't be treated as an exception at all. Huge
> tables covering the whole character set are probably common enough).

In principle, Unicode is supposed to encode characters, not glyphs.
Let's see if I can explain the difference:

 A character represents a linguistically distinct symbol.

 A glyph is a visual symbol. A font may include several glyphs
 corresponding to the same character(s).

Examples:

  - The ligature combining f and l commonly found in Roman fonts
    is a distinct glyph, but not a distinct character.

  - The Arabic alphabet has 28 distinct characters, but to display
    Arabic properly requires (almost) four times as many glyphs,
    because of the variants for initial/middle/end/independent
    forms.
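
To make the Arabic example concrete, here is a much simplified
sketch of how a renderer might decide which of the four forms each
character needs. The list of letters that join only to the preceding
letter is a deliberate simplification and an assumption on my part;
real shaping uses fuller joining data and handles ligatures, marks
and non-Arabic characters, none of which this does.

```python
# Letters that connect only to the letter before them, never to the
# one after: alef, dal, thal, reh, zain, waw. Simplified assumption;
# the input is expected to contain only basic Arabic letters.
RIGHT_JOINING_ONLY = set("\u0627\u062f\u0630\u0631\u0632\u0648")


def positional_forms(word):
    """Return the form name needed for each character of word."""
    forms = []
    for i, ch in enumerate(word):
        # Joins backwards if there is a previous letter that can
        # connect forwards.
        joins_prev = i > 0 and word[i - 1] not in RIGHT_JOINING_ONLY
        # Joins forwards if there is a next letter and this letter
        # can connect forwards.
        joins_next = i < len(word) - 1 and ch not in RIGHT_JOINING_ONLY
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


if __name__ == "__main__":
    # kaf, teh, beh: three characters, three different forms
    print(positional_forms("\u0643\u062a\u0628"))
    # beh, alef, beh: the same character beh needs two different glyphs
    print(positional_forms("\u0628\u0627\u0628"))
```

The stored text holds only the characters; the form is worked out at
display time from the neighbours.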

Because Unicode combines existing character sets, it includes
some glyphs as distinct characters (for instance, the
f-l ligature was part of earlier character sets such as Mac OS
Roman, though not iso-8859-1, so it is part of Unicode).

I don't know if Unicode encodes all the necessary glyphs for
Arabic, but in theory, it shouldn't. It should just include
the characters of iso-8859-? (which is character-based,
not glyph-based).
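
One way to check this is with Python's standard unicodedata module.
Unicode does carry code points for the f-l ligature and for Arabic
positional forms, but marks them as compatibility characters that
fold back to the plain characters under NFKC normalization. The
helper name compat_base is mine; the code points shown are the
ligature and the four forms of the letter beh.

```python
import unicodedata


def compat_base(ch):
    """Fold a compatibility character to its plain character(s) via NFKC."""
    return unicodedata.normalize("NFKC", ch)


FL_LIGATURE = "\ufb02"
# Isolated, final, initial and medial presentation forms of beh.
BEH_FORMS = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]

if __name__ == "__main__":
    for ch in [FL_LIGATURE] + BEH_FORMS:
        base = " ".join("U+%04X" % ord(c) for c in compat_base(ch))
        print("U+%04X %s -> %s" % (ord(ch), unicodedata.name(ch), base))
```

So the glyph-like code points exist for round-tripping with older
encodings, while ordinary text is expected to use the base
characters and leave the choice of form to the renderer.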

> > To the Japanese eye, one of these characters written in the Chinese
> > fashion, even though understandable, is incorrect. So
> > to correctly display a Unicode 6f22, it is not sufficient to
> > just know if it is Unicode 6f22, you also have to know whether
> > it is the Japanese 6f22 or the Chinese 6f22.
> 
> Doesn't that somewhat defeat the point of Unicode? Oh, well.
> 
> Does it really matter, or can it be ignored? (Differently put - what's the
> chance an angry Japanese would decide to bomb gtk's headquarters after using
> a utility that uses the Chinese version of the letter?).

Well, not likely. But if we didn't allow them to select the Japanese
versions, they might well decide against using GTK+.
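
A minimal sketch of what "allowing them to select" could mean:
carry a language tag next to the text and let it, rather than the
code point alone, pick the font for unified ideographs. Everything
here (the font names, the tags, the single code point range) is a
hypothetical illustration, not a GTK+ interface.

```python
# Hypothetical per-language font choice for unified CJK ideographs.
CJK_FONT_BY_LANG = {"ja": "Gothic-Japanese", "zh": "Song-Chinese"}
DEFAULT_CJK_FONT = "Song-Chinese"   # assumed fallback
LATIN_FONT = "Helvetica-Roman"      # assumed font for everything else


def is_unified_ideograph(ch):
    """True for the basic CJK Unified Ideographs block only."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def font_for(ch, lang):
    """Choose a font from the character and the text's language tag."""
    if is_unified_ideograph(ch):
        return CJK_FONT_BY_LANG.get(lang, DEFAULT_CJK_FONT)
    return LATIN_FONT


if __name__ == "__main__":
    # Same code point, different font depending on the language tag.
    print(font_for("\u6f22", "ja"))
    print(font_for("\u6f22", "zh"))
```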

Regards,
                                        Owen


