OpenType tables and Hebrew renderer
- From: Dov Grobgeld <dov imagic weizmann ac il>
- To: gtk-i18n-list gnome org
- Subject: OpenType tables and Hebrew renderer
- Date: Mon, 3 May 2004 22:14:56 IDT
(I'm resending because my previous message was blocked: the screenshot
was too big. I have moved it to:
http://imagic.weizmann.ac.il/~dov/Hebrew/bereshit-teamim.png
)
For the last couple of days I have been trying to implement support for
OpenType tables in the Hebrew renderer. After looking around the net I found a
great Hebrew font especially tailored for biblical Hebrew texts, see:
http://www.sbl-site.org/Resources/default.aspx
It is only free for non-commercial use, but it fulfilled the purpose
of testing my module. It contains both GSUB and GPOS tables. Not only
that, it comes with a great user manual that shows various
layout problems and how they are solved within the font.
It also describes past and present problems that Uniscribe has had
with the font.
As suggested by Owen, I copied the arabic-fc.c file and tried to
modify it for Hebrew. But I encountered some questions and problems
that I would like to have sorted out:
1. What exactly is the meaning of the property_bit argument in calls to
pango_ot_ruleset_add_feature()? The values appear to be classifications,
given in arabic-ot.[ch], describing whether a character is in
isolated, final, initial, or medial position. What confuses me is that
these properties are defined in the local header file arabic-ot.h. Does
that mean each shaper is free to use the bits privately for whatever it
needs? Are the properties relevant at all for a language like Hebrew,
which uses none of these four classifications?
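A minimal model of how such bits might be consumed, under an assumption I have not verified against the source: that the OpenType code Pango uses applies a feature registered with a given property bit only to glyphs whose property word has that bit clear (a test of the form `~glyph_props & feature_bits`). The form bits and `arabic_style_props()` are hypothetical; only the 0xFFFF value mirrors Pango's `PANGO_OT_ALL_GLYPHS`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-form bits, like the private ones an Arabic shaper
 * might define in its own header. */
#define FORM_ISOLATED 0x1u
#define FORM_FINAL    0x2u
#define FORM_INITIAL  0x4u
#define FORM_MEDIAL   0x8u
#define ALL_FORMS     (FORM_ISOLATED | FORM_FINAL | FORM_INITIAL | FORM_MEDIAL)

/* Same value as Pango's PANGO_OT_ALL_GLYPHS. */
#define MODEL_ALL_GLYPHS 0xFFFFu

/* ASSUMPTION: a feature registered with 'feature_bits' applies to a glyph
 * only if at least one of those bits is clear in the glyph's properties. */
static bool feature_applies(unsigned glyph_props, unsigned feature_bits)
{
  return (~glyph_props & feature_bits) != 0;
}

/* Arabic-style property word: every form bit set except the glyph's own,
 * so only the feature registered with that form's bit fires. */
static unsigned arabic_style_props(unsigned form_bit)
{
  assert((form_bit & ALL_FORMS) == form_bit);
  return ALL_FORMS & ~form_bit;
}
```

Under this assumption, a Hebrew shaper that gives every glyph the property word 0 and registers every feature with `PANGO_OT_ALL_GLYPHS` applies all features everywhere, so the bits would carry no information for Hebrew.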
2. In arabic-fc.c the variable 'cluster' is initialized to 0 for
each character:
    p = text;
    for (i = 0; i < n_chars; i++)
      {
        :
        int cluster = 0;
        :
and it is only overridden when the character wc at position i is not
a NON_SPACING_MARK:
    if (g_unichar_type (wc) != G_UNICODE_NON_SPACING_MARK)
      cluster = p - text;
For NON_SPACING_MARKs, cluster is therefore left as 0 in the call to:
    pango_ot_buffer_add_glyph (buffer, index,
                               properties[i], cluster);
which can't be right. With this condition in place, the bidi direction
properties of the characters were screwed up. For example, the
following string:
    const gchar text[] = { 0xd7, 0x9e, 0xd7, 0xa6, 0xd6, 0xb9, 0xd7,
                           0x95, 0xd7, 0xaa, 0 };
gives:
    wc[i]     glyph->log_clusters[i]
    U+05de    8
    U+05e6    6
    U+05b9    0
    U+05d5    2
    U+05ea    0
U+05B9 is indeed a non-spacing mark, but 0 is the wrong cluster for it.
It should be 2, the byte offset of the preceding base character U+05E6.
My guess is that the initialization of cluster should simply be moved
outside the for (i=...) loop?
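The suggested fix can be checked in isolation. The sketch below (no Pango involved) declares `cluster` before the loop, so a non-spacing mark inherits the byte offset of the preceding base character. `is_hebrew_mark()` is a hypothetical, Hebrew-only stand-in for the `g_unichar_type()` test, and `next_char()` decodes only 1 to 3 byte UTF-8 sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence of 1 to 3 bytes (no validation) and advance *pp.
 * Sufficient for BMP text such as the Hebrew sample. */
static unsigned next_char(const unsigned char **pp)
{
  const unsigned char *p = *pp;
  unsigned c;

  if (p[0] < 0x80) {
    c = p[0];
    *pp = p + 1;
  } else if (p[0] < 0xE0) {
    c = ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);
    *pp = p + 2;
  } else {
    c = ((p[0] & 0x0Fu) << 12) | ((p[1] & 0x3Fu) << 6) | (p[2] & 0x3Fu);
    *pp = p + 3;
  }
  return c;
}

/* Hypothetical stand-in for g_unichar_type() == G_UNICODE_NON_SPACING_MARK,
 * covering only the Hebrew points and cantillation marks. */
static int is_hebrew_mark(unsigned c)
{
  return (c >= 0x0591 && c <= 0x05BD) || c == 0x05BF || c == 0x05C1 ||
         c == 0x05C2 || c == 0x05C4 || c == 0x05C5 || c == 0x05C7;
}

/* Store in clusters[i] the byte offset of the base character that character
 * i belongs to. 'cluster' lives outside the loop, so a mark keeps the
 * offset of the preceding base instead of falling back to 0.
 * Returns the number of characters processed. */
static size_t assign_clusters(const char *text, size_t n_chars, int *clusters)
{
  const unsigned char *start = (const unsigned char *) text;
  const unsigned char *p = start;
  int cluster = 0;
  size_t i;

  assert(clusters != NULL);
  for (i = 0; i < n_chars && *p; i++) {
    const unsigned char *char_start = p;
    unsigned wc = next_char(&p);

    if (!is_hebrew_mark(wc))
      cluster = (int) (char_start - start);
    clusters[i] = cluster;
  }
  return i;
}
```

For the five-character string above this yields the logical-order clusters 0, 2, 2, 6, 8, so the mark U+05B9 gets 2. A mark at the very start of a string would still get 0, which is then the offset of the string itself.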
3. A question: the manual of the SBL font says that for biblical
Hebrew it is critical that no normalization be performed prior to
rendering and querying the GSUB and GPOS tables.
So I wonder: does Pango or FreeType perform any normalization of the
character order within a cluster?
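For reference, this is what the reordering part of normalization (canonical ordering) does to a Hebrew mark sequence: each run of marks is stably sorted by canonical combining class. The sketch uses a deliberately tiny class table (values from the Unicode Character Database; every other character, including the cantillation marks, is treated as class 0 for simplicity) and says nothing about whether Pango or FreeType normalize.

```c
#include <assert.h>
#include <stddef.h>

/* Canonical combining classes of a few Hebrew points. Anything not listed
 * is treated as class 0 (a starter), which is a simplification. */
static int ccc(unsigned c)
{
  switch (c) {
  case 0x05B0: return 10; /* SHEVA */
  case 0x05B4: return 14; /* HIRIQ */
  case 0x05B7: return 17; /* PATAH */
  case 0x05B8: return 18; /* QAMATS */
  case 0x05B9: return 19; /* HOLAM */
  case 0x05BC: return 21; /* DAGESH */
  case 0x05BD: return 22; /* METEG */
  case 0x05C1: return 24; /* SHIN DOT */
  case 0x05C2: return 25; /* SIN DOT */
  default:     return 0;
  }
}

/* Canonical ordering: within each run of non-starters, stably sort by
 * combining class. Insertion sort never moves a mark past a starter
 * (class 0) and keeps marks of equal class in their original order. */
static void canonical_reorder(unsigned *s, size_t n)
{
  assert(s != NULL || n == 0);
  for (size_t i = 1; i < n; i++) {
    unsigned c = s[i];
    int cc = ccc(c);
    size_t j = i;

    if (cc == 0)
      continue;
    while (j > 0 && ccc(s[j - 1]) > cc) {
      s[j] = s[j - 1];
      j--;
    }
    s[j] = c;
  }
}
```

For example, SHIN + DAGESH + SHIN DOT + QAMATS comes out as SHIN + QAMATS + DAGESH + SHIN DOT, so the marks no longer reach the font in the order they were typed.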
4. I have one critical rendering error: the Hebrew NUN is
turned into an inverted NUN. According to the font manual, this
should only happen as the result of the following GSUB entry:
    NUN (U+05E0) + Combining Grapheme Joiner (U+034F)
      -> inverted NUN
Do you have any idea where the bug may be? The substitution
occurs for every NUN inserted in the buffer. Disabling GSUB is
no good, as it is needed in lots of cases...
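To pin down the expected behaviour, here is a stand-alone model of the rule as the manual states it: the two-glyph sequence NUN + CGJ becomes INVERTED NUN, and a NUN not followed by CGJ is left alone. The glyph IDs are hypothetical, and the model ignores real GSUB machinery such as coverage tables and lookup flags; it only serves as a reference for the output the shaper should produce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical glyph IDs; a real font assigns its own. */
enum { GID_NUN = 100, GID_CGJ = 101, GID_INVERTED_NUN = 102 };

/* Apply the ligature NUN + CGJ -> INVERTED NUN in place.
 * A NUN that is not immediately followed by CGJ is copied unchanged.
 * Returns the new glyph count. */
static size_t apply_nun_cgj_ligature(unsigned *glyphs, size_t n)
{
  size_t in, out = 0;

  assert(glyphs != NULL || n == 0);
  for (in = 0; in < n; in++) {
    if (glyphs[in] == GID_NUN && in + 1 < n && glyphs[in + 1] == GID_CGJ) {
      glyphs[out++] = GID_INVERTED_NUN;
      in++; /* the CGJ is consumed by the ligature */
    } else {
      glyphs[out++] = glyphs[in];
    }
  }
  return out;
}
```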
5. The font manual describes how ZWJ (U+200D) and ZWNJ (U+200C)
are used in ligature pairs in certain difficult situations. Since
that is the case, I can no longer use the following if() from the
Arabic module:
    if (wc >= 0x200B && wc <= 0x200F)	/* Zero-width characters */
      {
        pango_ot_buffer_add_glyph (buffer, 0, properties[i], p - text);
      }
If I remove it, will a glyph be painted for these characters?
Why is this call necessary at all?
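One way to keep the zero-width handling while letting ZWJ and ZWNJ reach the font is to narrow the test. The predicate below is a hypothetical helper: it still covers U+200B, U+200E and U+200F, but returns false for U+200C and U+200D, which would then be looked up in the font's cmap like ordinary characters (assuming the font maps them).

```c
#include <assert.h>
#include <stdbool.h>

#define ZWNJ 0x200Cu
#define ZWJ  0x200Du

/* The range tested in arabic-fc.c, U+200B..U+200F, covers ZWSP, ZWNJ, ZWJ,
 * LRM and RLM. Here ZWNJ and ZWJ are excluded so they can take part in
 * GSUB ligatures; the other invisible controls keep the special case. */
static bool is_invisible_control(unsigned wc)
{
  if (wc == ZWNJ || wc == ZWJ)
    return false;
  return wc >= 0x200Bu && wc <= 0x200Fu;
}
```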
6. If the font does not have a GPOS table, I need to fall back to
my current positioning logic. But which API function would I use to
check for this?
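As one possible low-level check, the sketch below looks for a `GPOS` entry in the table directory of a font file already loaded into memory (a plain .ttf/.otf, not a collection). `font_has_table()` is a hypothetical helper, not a Pango function. With FreeType, `FT_Load_Sfnt_Table()` called with a NULL buffer and `*length` set to 0 returns the table's size, or an error when the table is missing, which answers the same question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read a big-endian 16-bit value, as stored in sfnt headers. */
static unsigned be16(const unsigned char *p)
{
  return ((unsigned) p[0] << 8) | p[1];
}

/* Return 1 if the sfnt table directory at the start of 'font' lists a table
 * with the 4-byte 'tag' (e.g. "GPOS"), 0 otherwise.
 * Layout: a 12-byte header whose numTables field is at offset 4, followed
 * by 16-byte table records that begin with the tag. */
static int font_has_table(const unsigned char *font, size_t len, const char *tag)
{
  size_t num_tables, i;

  assert(tag != NULL);
  if (font == NULL || len < 12)
    return 0;
  num_tables = be16(font + 4);
  for (i = 0; i < num_tables; i++) {
    if (12 + 16 * (i + 1) > len)
      return 0; /* truncated directory */
    if (memcmp(font + 12 + 16 * i, tag, 4) == 0)
      return 1;
  }
  return 0;
}
```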
I'm spending too much time on this, but I have to admit it is quite fun. :-)
The screenshot linked above, taken with testtext, shows the first verses
of the Bible including nikud and cantillation marks. Those familiar
with Hebrew will see the wrongly rendered inverted NUN, e.g. on the
fourth line from the top.
Regards,
Dov