OpenType tables and Hebrew renderer

From: Dov Grobgeld <dov imagic weizmann ac il>
To: gtk-i18n-list gnome org
Subject: OpenType tables and Hebrew renderer
Date: Mon, 3 May 2004 22:14:56 IDT
(I'm resending as the last message was blocked because the screenshot
was too big. I have instead moved it to:

    http://imagic.weizmann.ac.il/~dov/Hebrew/bereshit-teamim.png
)

The last couple of days I have tried to implement support for OpenType
tables for the Hebrew renderer. After looking around the net I found a
great Hebrew font especially tailored for biblical Hebrew texts, see:

    http://www.sbl-site.org/Resources/default.aspx
    
It is only free for non-commercial use, but it fulfilled the purpose
of testing my module. It contains both GSUB and GPOS tables. Not only
that, but it comes with a great user manual that shows various
layout problems and how they have been solved within the font.
It also describes past and present problems that Uniscribe has had
with the font.

As suggested by Owen, I copied the arabic-fc.c file and tried to
modify it for Hebrew. But I encountered some questions and problems
that I would like to have sorted out:

1. What exactly is the meaning of the property_bit in the calls to
   pango_ot_ruleset_add_feature()? The bits appear to be classifications
   that are given in arabic-ot.[ch], describing whether a character
   is in isolated, final, initial, or medial position. But what
   confuses me is that the contents of the property are defined in
   the local header file arabic-ot.h. Does that mean that the shaper
   may make whatever private use of them it needs? Are the properties
   at all relevant for a language like Hebrew that doesn't use any
   of the previous four classifications?
   
2. In arabic-fc.c the variable 'cluster' is initialized to 0 for
   each character:
   
      p = text;
      for (i=0; i < n_chars; i++)
        {
          :
          int cluster = 0;
          :
	  
    and then only overridden if the wc found at character i is not 
    a NON_SPACING_MARK in:
    
         if (g_unichar_type (wc) != G_UNICODE_NON_SPACING_MARK)
           cluster = p - text;
	   
    which leaves cluster as 0 for NON_SPACING_MARKs in the call to:
    
         pango_ot_buffer_add_glyph (buffer, index,
                                    properties[i], cluster);
				    
    which can't be right. While I retained this condition, the bidi
    direction properties of the characters were screwed up. E.g.
    for the following string:
    
      const gchar text[] = { 0xd7, 0x9e, 0xd7, 0xa6, 0xd6, 0xb9, 0xd7, 
                             0x95, 0xd7, 0xaa, 0 };
 
      wc[i]    glyph->log_clusters[i]
 
      U+05de         8
      U+05e6         6
      U+05b9         0
      U+05d5         2
      U+05ea         0
      
    Indeed U+05b9 is a nonspacing mark, but 0 for its cluster is wrong.
    It should be 2. My guess is that the problem is simply that the
    initialization of cluster should be outside the for(i=...) loop?

3. The manual of the SBL font says that for biblical Hebrew it is
   critical that no normalization is performed prior to rendering
   and querying of the GSUB and GPOS tables.
   So I wonder, does pango or freetype do any normalization of the
   character order within the glyph?
   
4. I have one critical rendering error. The Hebrew NUN is
   turned into an inverted NUN. According to the font manual this
   should occur only as the result of the following GSUB entry:
   
       NUN (U+05E0) + Combining Grapheme Joiner (U+034F) 
            -> inverted NUN
	    
   Do you have any idea of where the bug may be? The substitution
   occurs for any NUN inserted in the buffer. Disabling GSUB is 
   no good as it is needed in lots of cases...
   
5. The font manual describes that the ZWJ (U+200D) and ZWNJ (U+200C)
   are used in ligature pairs in certain difficult situations. Since
   that is the case, I can no longer use the following if() from the
   arabic module:
   
     if (wc >= 0x200B && wc <= 0x200F) /* Zero-width characters */
        {
          pango_ot_buffer_add_glyph (buffer, 0, properties[i], p - text);
        }
	
   If I erase this, will I get a glyph painted for these characters?
   Why is this call at all necessary?

6. If the font does not have a GPOS table, then it is necessary to
   use my current fallback positioning logic. But what API function
   would I use to perform this query?
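
For question 6, here is a minimal sketch of the underlying check,
assuming the raw bytes of a single-font sfnt file (not a TrueType
collection) are available in memory. It is not a Pango or FreeType
call; sfnt_has_table() and read_u16_be() are hypothetical helpers that
only walk the table directory and look for a given tag such as 'GPOS'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian 16-bit value, as used throughout sfnt files. */
uint16_t read_u16_be (const unsigned char *p)
{
  return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Hypothetical helper: return 1 if the sfnt table directory in
 * data[0..len) lists `tag`, else 0.  Layout: a 12-byte header with
 * numTables at offset 4, followed by 16-byte table records that each
 * start with a 4-byte tag. */
int sfnt_has_table (const unsigned char *data, size_t len,
                    const char tag[4])
{
  size_t i, n_tables;

  assert (data != NULL && tag != NULL);
  if (len < 12)
    return 0;                   /* too short to hold the header */

  n_tables = read_u16_be (data + 4);
  for (i = 0; i < n_tables; i++)
    {
      size_t rec = 12 + 16 * i;

      if (rec + 16 > len)
        return 0;               /* truncated directory */
      if (memcmp (data + rec, tag, 4) == 0)
        return 1;
    }
  return 0;
}
```

A shaper could call sfnt_has_table (data, len, "GPOS") once per font
and fall back to its own positioning when the result is 0.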
   
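For questions 2 and 5, the following is a sketch of the behaviour I
would expect, not the actual arabic-fc.c code. It assumes valid UTF-8
input limited to the BMP. All function names are hypothetical:
is_hebrew_mark() stands in for the g_unichar_type() test, and
is_hidden_zero_width() is one possible replacement for the
0x200B..0x200F range check that lets ZWJ and ZWNJ through. Whether the
font lookups then see those two characters is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the g_unichar_type() check: nonspacing
 * marks (points and accents) in the Hebrew block only. */
int is_hebrew_mark (uint32_t wc)
{
  return (wc >= 0x0591 && wc <= 0x05BD) || wc == 0x05BF ||
         wc == 0x05C1 || wc == 0x05C2 || wc == 0x05C4 ||
         wc == 0x05C5 || wc == 0x05C7;
}

/* Zero-width characters in U+200B..U+200F that would still be mapped
 * to the empty glyph: everything in the range except ZWNJ (U+200C)
 * and ZWJ (U+200D). */
int is_hidden_zero_width (uint32_t wc)
{
  return wc >= 0x200B && wc <= 0x200F && wc != 0x200C && wc != 0x200D;
}

/* Decode one UTF-8 sequence of one to three bytes.  Assumes valid
 * input; returns the number of bytes consumed. */
size_t decode_utf8 (const unsigned char *p, uint32_t *wc)
{
  if (p[0] < 0x80)
    {
      *wc = p[0];
      return 1;
    }
  if ((p[0] & 0xE0) == 0xC0)
    {
      *wc = ((uint32_t) (p[0] & 0x1F) << 6) | (p[1] & 0x3F);
      return 2;
    }
  *wc = ((uint32_t) (p[0] & 0x0F) << 12) |
        ((uint32_t) (p[1] & 0x3F) << 6) | (p[2] & 0x3F);
  return 3;
}

/* Store one cluster value (a byte offset into text) per character,
 * in logical order.  A mark keeps the offset of the preceding base
 * character.  Returns the number of characters written. */
size_t assign_clusters (const unsigned char *text, size_t *clusters,
                        size_t max_chars)
{
  const unsigned char *p = text;
  size_t cluster = 0;   /* outside the loop, so marks inherit it */
  size_t n = 0;

  assert (text != NULL && clusters != NULL);
  while (*p != 0 && n < max_chars)
    {
      uint32_t wc;
      size_t len = decode_utf8 (p, &wc);

      if (!is_hebrew_mark (wc))
        cluster = (size_t) (p - text);
      clusters[n++] = cluster;
      p += len;
    }
  return n;
}
```

With the test string from question 2, the mark U+05b9 ends up with the
same cluster value as the U+05e6 before it, which is 2.
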
I'm spending too much time on this, but I have to admit it is quite fun. :-)

I'm adding a small screenshot taken with testtext of the first verses
of the Bible including nikud and cantillation marks. Those familiar
with Hebrew will see the wrongly rendered inverted NUN e.g. on the
fourth line from the top.

Regards,
Dov
Follow-Ups:
- Re: OpenType tables and Hebrew renderer
  - From: Owen Taylor