Re: Combining characters

From: Jungshik Shin <jshin mailaps org>
To: Anuradha Ratnaweera <ARatnaweera virtusa com>
Cc: gtk-i18n-list gnome org
Subject: Re: Combining characters
Date: Sat, 6 Sep 2003 20:05:35 +0900 (KST)

On Sat, 6 Sep 2003, Anuradha Ratnaweera wrote:

> On Sat, 2003-09-06 at 00:38, Owen Taylor wrote:
> > On Fri, 2003-09-05 at 11:51, Anuradha Ratnaweera wrote:
> > A shaper module's whole purpose in life is to map Unicode character
> > sequences to pairs of glyphs from particular fonts.
>
> IMHO, mapping unicode character sequences to ghyphs (why pairs?) from a

  You're right. It's not necessarily pairs.

> particular font is a general requirement.  Therefore, I wonder why is it
> necessary to write shaper modules for each character set.

  Simple. Because Pango has no mechanism for consulting an external
mapping file to look up the mapping between a sequence of Unicode
characters and a sequence of glyphs. I guess one will never be
written, because Pango is going to drop support for X11core fonts (or
the X11 driver is going to be maintained as it is now, with minimal bug
fixes).
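
  To make the idea of an 'external mapping file' concrete, here is a
minimal sketch in Python. Nothing in it exists in Pango: the file
format, the glyph names and the functions load_mapping() and
map_to_glyphs() are all made up for illustration. It only shows a
greedy longest-match lookup from code point sequences to glyph
sequences.

```python
# Hypothetical mapping-file format: "<hex code points> -> <glyph names>".
# Neither the format nor the glyph names come from Pango or any real font.
MAPPING_TEXT = """
# base letter + combining acute maps to one precomposed glyph
0065 0301 -> eacute
0065      -> e
0301      -> acutecomb
"""

def load_mapping(text):
    """Parse the toy format into {tuple of code points: list of glyph names}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        left, right = line.split("->")
        key = tuple(int(cp, 16) for cp in left.split())
        table[key] = right.split()
    return table

def map_to_glyphs(s, table):
    """Greedy longest-match lookup; unknown characters become '.notdef'."""
    longest = max(len(key) for key in table)
    cps = [ord(ch) for ch in s]
    glyphs, i = [], 0
    while i < len(cps):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(longest, len(cps) - i), 0, -1):
            key = tuple(cps[i:i + n])
            if key in table:
                glyphs.extend(table[key])
                i += n
                break
        else:
            glyphs.append(".notdef")
            i += 1
    return glyphs
```

  A table like this covers only fixed sequences. Context-dependent
choices would need more than a lookup.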

> Of course, there are languages where unicode sequences do _not_ map to
> individual glyphs, where a character set specific shaper is necessary.

  Even in that case, it might be possible to put that 'algorithm' in
an external file (or in a font, as was done in X11 BDF fonts for Indic
scripts a few years ago; Pango once supported them, but no longer does).

  However, I wouldn't spend any more time on X11core fonts if I
were you.  It seems like you need to do some googling and update
your knowledge of modern fonts and rendering engines.

  Your time would be better spent either helping to fix
http://bugzilla.gnome.org/show_bug.cgi?id=101079 (Latin/Greek/Cyrillic
opentype font support) or helping to add Graphite (http://graphite.sil.org)
support to Pango. Graphite fonts (and, to a lesser extent, Apple's AAT
fonts) are a lot smarter, more flexible, and more self-contained than
opentype fonts, in that Graphite fonts work 'by themselves' (with
minimal help from a rendering engine) while opentype fonts have to rely
on rendering engines for script-specific 'intelligence'. For instance,
even if you have an opentype font for a script X, it'd be of no use unless
Pango (or Uniscribe on MS Windows) had knowledge of the script X and of
the opentype layout features for the script X used in your font.

> > If you have a font, however, that has a incorrect character map
> > that assigns ligature glyphs to random ASCII characters, you
> > will have great difficulty getting that to work within the
> > framework of the fontconfig-based backends for Pango.
>
> Therein lies the whole problem.  When a unicode character sequence
> (4001, 4010 in my example) represents a single visual glyph WHICH IS NOT
> A UNICODE CHARACTER, there is no way to assign a "correct" glyph number
> in a particular font.

  Well, you can use either PUA code points (if it's an X11core font
with an iso10646-1 XLFD registry-encoding pair) or whatever code points
you like in a registry-encoding of your own making (as is done in fonts
like 'muleipa-1').  In the case of truetype fonts, by putting nominal
glyphs at Unicode code points and your 'presentation form glyphs' in the
PUA (in the Unicode cmap) [1], you can get your fonts recognized by
fontconfig (http://fontconfig.org) as supporting your target language.
This is not an issue at all if you make an opentype font or a Graphite
font. Not every glyph in a font needs to be mapped to a Unicode
character. In modern truetype fonts (opentype, AAT or Graphite) [2], it's
not uncommon to find a number of _unmapped_ glyphs. Rendering engines
work with glyph IDs instead of 'characters' at their lowest level.
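
  A toy model in Python may make the point about unmapped glyphs
clearer. It is only a sketch: the glyph list, the cmap and the ligature
table are invented, and real fonts store these in binary tables that
are not modelled here. The cmap gives characters their nominal glyph
IDs, and the substitution step then works purely on glyph IDs, so the
ligature glyph never needs a code point of its own.

```python
# Toy font: a glyph ID is just an index into this list (all names invented).
GLYPHS = [".notdef", "f", "i", "f_i.liga"]   # glyph ID 3 has no code point

# Character map (cmap): Unicode code point -> nominal glyph ID.
CMAP = {0x66: 1, 0x69: 2}

# Ligature substitution keyed on glyph IDs, not on characters.
LIGATURES = {(1, 2): 3}

def unmapped_glyphs(glyph_count, cmap):
    """Glyph IDs (other than .notdef, ID 0) that no code point maps to."""
    return sorted(set(range(1, glyph_count)) - set(cmap.values()))

def shape(text):
    """Map characters to nominal glyph IDs, then substitute ligatures."""
    ids = [CMAP.get(ord(ch), 0) for ch in text]   # 0 = .notdef
    out, i = [], 0
    while i < len(ids):
        pair = tuple(ids[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out
```

  Here unmapped_glyphs(len(GLYPHS), CMAP) reports glyph ID 3, yet
shape("fi") can still produce it.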

Jungshik

[1] For example, take a look at the Thai shaper in Pango and Thai
truetype fonts (or http://bugzilla.mozilla.org/show_bug.cgi?id=95708
for a Korean example). The Thai shaper maps a sequence of Thai letters
to a sequence of presentation form glyphs in the U+F700 block in a
context-sensitive manner. My patch for bug 95708 does the same for
'Ngulim-like' fonts.
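
The 'context-sensitive' part can be sketched roughly as follows. This
is not the Pango Thai shaper and not my Mozilla patch. The function
name is made up, and the PUA value is a placeholder that has not been
checked against any real font. The sketch only shows one rule: a tone
mark is replaced by a shifted presentation form when the preceding
consonant has an ascender.

```python
# THAI CHARACTER MAI EK (a tone mark written above the base consonant).
MAI_EK = 0x0E48

# PO PLA, FO FA, FO FAN: Thai consonants with a tall ascender.
TALL_BASES = {0x0E1B, 0x0E1D, 0x0E1F}

# Placeholder PUA code point for a "shifted left" MAI EK.
# Assumption for illustration only; check a real font before relying on it.
SHIFTED = {MAI_EK: 0xF713}

def to_presentation_forms(text):
    """Swap a mark for its shifted form when it follows a tall consonant."""
    out = []
    prev = None
    for ch in text:
        cp = ord(ch)
        if cp in SHIFTED and prev in TALL_BASES:
            out.append(chr(SHIFTED[cp]))   # context matched: use the PUA form
        else:
            out.append(ch)                 # otherwise keep the nominal character
        prev = cp
    return "".join(out)
```

A real shaper has many more rules (stacked marks, below-base vowels
and so on), but each one has this shape: look at the neighbours, then
pick a glyph.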

[2] Even 'dumb/plain' truetype fonts can have a number of unmapped glyphs
if they use 'composite glyphs'.
