Todo list for Pango to support UVS?



Dear Behdad,

Do you have something like a TODO list for Pango to support
UVS? Recently the Japanese JTC1/SC2/WG2 delegates decided to
register many glyph variations in the Unicode IVD/IVS, to
be used in e-government equipment, so the question "can
we use IVD/IVS in GTK+-based applications?" is quickly
arising in Japan.

You have already designed harfbuzz-ng to take UVS, but I'm
unfamiliar with the status on the Pango side. I've briefly
checked the sources and I think the following work is required.
I don't yet understand the details of your future plans
for Pango, so some items in the following list may be
irrelevant. Please give me your comments.

1) Extension of the Pango APIs to support UVS

1-a) function
The public header files such as pangoxft.h, pangofc-font.h
and pangofc-decoder.h define functions that get a glyph
index from a single Unicode code point. Although some of
them are regarded as legacy, they should not be modified,
in order to keep binary compatibility; instead, new
functions taking a UVS in their arguments should be added.

1-b) class
The class definitions also need discussion.
PangoFcFontClass/PangoFcDecoderClass have several methods
(lock/unlock face, check character availability, get glyph,
etc.), but none of them takes a UVS. To keep binary
compatibility, I should not change the get_glyph() method.

So, should I add a new method taking a UVS? Fortunately,
PangoFcFontClass and PangoFcDecoderClass have 4 reserved
slots for future extensions. Can I use them for new
methods, something like get_glyph_with_uvs()?
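
To make the reserved-slot idea concrete, here is a toy sketch.
It is not Pango code: ToyFont, ToyFontClass and all toy_*
functions and glyph numbers are hypothetical stand-ins, and
only the method name get_glyph_with_uvs() comes from the
question above. It assumes that a reserved function-pointer
slot can be retyped without changing the struct layout, which
holds on common ABIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; these are NOT the real Pango types. */
typedef struct ToyFont ToyFont;

typedef struct {
    /* Existing method: its signature stays untouched. */
    uint32_t (*get_glyph)(ToyFont *font, uint32_t wc);
    /* Formerly reserved slot 1, now used for the new method. */
    uint32_t (*get_glyph_with_uvs)(ToyFont *font, uint32_t wc, uint32_t vs);
    /* Remaining reserved slots. */
    void (*reserved2)(void);
    void (*reserved3)(void);
    void (*reserved4)(void);
} ToyFontClass;

struct ToyFont {
    const ToyFontClass *klass;
};

/* Hypothetical font data: U+845B maps to glyph 10, and the
   sequence <U+845B, U+E0100> maps to glyph 11. */
static uint32_t toy_get_glyph(ToyFont *font, uint32_t wc)
{
    (void)font;
    return wc == 0x845B ? 10 : 0;
}

static uint32_t toy_get_glyph_uvs(ToyFont *font, uint32_t wc, uint32_t vs)
{
    (void)font;
    return (wc == 0x845B && vs == 0xE0100) ? 11 : 0;
}

/* An old subclass that knows nothing about UVS leaves the slot NULL. */
const ToyFontClass toy_legacy_class = { toy_get_glyph, NULL, NULL, NULL, NULL };
/* A new subclass fills the slot. */
const ToyFontClass toy_uvs_class = { toy_get_glyph, toy_get_glyph_uvs, NULL, NULL, NULL };

/* Dispatch helper: try the UVS-aware method if the class has one,
   otherwise (or if it finds no variant glyph) fall back to the
   glyph of the base character alone. */
uint32_t toy_font_get_glyph_with_uvs(ToyFont *font, uint32_t wc, uint32_t vs)
{
    assert(font != NULL && font->klass != NULL);
    if (vs != 0 && font->klass->get_glyph_with_uvs != NULL) {
        uint32_t glyph = font->klass->get_glyph_with_uvs(font, wc, vs);
        if (glyph != 0)
            return glyph;
    }
    return font->klass->get_glyph(font, wc);
}
```

With this layout a font whose class is toy_legacy_class keeps
working and simply ignores the selector, while a font whose
class is toy_uvs_class returns the variant glyph when the
sequence is registered.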

Or is a total refactoring of PangoFcFontClass/
PangoFcDecoderClass scheduled, to harmonize Pango with the
new harfbuzz?

2) Itemization in Pango to reflect UVS

When the shaping engine invokes the font driver, it splits
the original UTF-8 string into "items" (substrings within
which the script and the character width do not change)
and passes each item to the modularized engines, such as
arabic, basic, hangul, ...

At present, the Pango itemizer breaks the original UTF-8
string between the base character (a CJK Unified Ideograph)
and the UVS, and they are handled as different items. Thus,
the UVS is always visibly rendered. The font-driving engines
need the pair of the base character and the following UVS.
The itemizer should be improved so that it does not break
between the base character and the following UVS.
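
The rule "never break before a variation selector" could be
sketched as below. This is only an illustration, not the
Pango itemizer: the function names are hypothetical, the text
is assumed to be already decoded from UTF-8 into code points,
and the Mongolian free variation selectors are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Variation selectors: VS1..VS16 (U+FE00..U+FE0F) and
   VS17..VS256 (U+E0100..U+E01EF). */
bool is_variation_selector(uint32_t cp)
{
    return (cp >= 0xFE00 && cp <= 0xFE0F) ||
           (cp >= 0xE0100 && cp <= 0xE01EF);
}

/* Return true if an item boundary may be placed before text[pos].
   A boundary at the start or the end of the text is always allowed;
   a boundary that would separate a variation selector from the
   character preceding it is refused. */
bool may_break_before(const uint32_t *text, size_t len, size_t pos)
{
    assert(text != NULL && pos <= len);
    if (pos == 0 || pos == len)
        return true;
    return !is_variation_selector(text[pos]);
}
```

For the sequence <U+845B, U+E0100, U+57CE>, this refuses the
boundary at index 1, so the base character and its selector
stay in one item, and allows the boundary at index 2.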

# BTW, I have not yet checked how most combining characters
# are handled.

3) The cmap cache (from Unicode code point to glyph index)

At present, a PangoFcFont object has its own cmap cache,
of type PangoFcCmapCache. It is not exposed publicly,
so it is possible to extend the cache to support UVS.
However, the current PangoFcCmapCache has a fixed size,
regardless of the type of font. Today, most fonts have no
UVS support, so adding fixed-size storage to handle UVS
would be a waste of memory. If we expect that the major
part of the text is without UVS and only a small part of
the text is decorated with UVS, a reasonable starting
point might be to leave the glyph index for the pair of a
base character and a UVS uncached.
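
A toy sketch of that starting point follows. It is not the
real PangoFcCmapCache: the types, the cache size, the
direct-mapped layout and the font data are all hypothetical,
chosen only to show lookups without UVS going through a
fixed-size cache while lookups with UVS bypass it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_SIZE 256          /* hypothetical fixed size */
#define TOY_INVALID_CH 0xFFFFFFFFu  /* never a valid code point */

typedef struct {
    uint32_t ch;
    uint32_t glyph;
} ToyCacheEntry;

typedef struct {
    ToyCacheEntry entries[TOY_CACHE_SIZE];
    unsigned font_lookups;  /* how often the slow path was taken */
} ToyCmapCache;

/* Slow path: ask the font itself (vs == 0 means "no selector"). */
typedef uint32_t (*ToyLookupFunc)(uint32_t ch, uint32_t vs);

/* Hypothetical font data: U+845B maps to glyph 10, and the
   sequence <U+845B, U+E0100> maps to glyph 11. */
uint32_t toy_font_lookup(uint32_t ch, uint32_t vs)
{
    if (ch != 0x845B)
        return 0;
    return vs == 0xE0100 ? 11 : 10;
}

void toy_cache_init(ToyCmapCache *cache)
{
    assert(cache != NULL);
    for (size_t i = 0; i < TOY_CACHE_SIZE; i++) {
        cache->entries[i].ch = TOY_INVALID_CH;
        cache->entries[i].glyph = 0;
    }
    cache->font_lookups = 0;
}

uint32_t toy_cache_get_glyph(ToyCmapCache *cache, ToyLookupFunc lookup,
                             uint32_t ch, uint32_t vs)
{
    assert(cache != NULL && lookup != NULL);
    if (vs != 0) {
        /* Base character + UVS: rare, so always ask the font
           and spend no cache storage on it. */
        cache->font_lookups++;
        return lookup(ch, vs);
    }
    ToyCacheEntry *entry = &cache->entries[ch % TOY_CACHE_SIZE];
    if (entry->ch != ch) {
        /* Cache miss: fill the slot from the font. */
        entry->ch = ch;
        entry->glyph = lookup(ch, 0);
        cache->font_lookups++;
    }
    return entry->glyph;
}
```

The cost is that every occurrence of a UVS pair takes the
slow path, which seems acceptable as long as such pairs are
a small part of the text.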

# BTW, is fontconfig expected to provide information about
# UVS availability?

4) Survey of the Win32 and Mac OS X backends

Is it necessary to change the public APIs of Pango?

Regards,
mpsuzuki

