Re: OpenType tables and Hebrew renderer

From: Owen Taylor <otaylor redhat com>
To: Dov Grobgeld <dov imagic weizmann ac il>
Cc: gtk-i18n-list gnome org
Subject: Re: OpenType tables and Hebrew renderer
Date: Tue, 18 May 2004 11:54:55 -0400

On Mon, 2004-05-03 at 18:14, Dov Grobgeld wrote:

> As suggested by Owen, I copied the arabic-fc.c file and tried to
> modify it for Hebrew. But I encountered some questions and problems
> that I would like to have sorted out:
> 
> 1. What exactly is the meaning of the property_bit in the calls
>    pango_ot_ruleset_add_feature()? They appear to be classifications 
>    that are given in arabic-ot.[ch] describing whether a character
>    is in isolated, final, initial, or medial position. But what 
>    confuses me is that the contents of the property are defined in
>    the local header file arabic-ot.h. Does it mean that the shaper makes
>    whatever private use of it that it needs? Are the properties at
>    all relevant for a language like Hebrew that doesn't use any 
>    of the previous four classifications?

The idea of the property bits is that some features should only
be applied to certain glyphs. pango_ot_buffer_add_glyph()
has a 'guint properties' argument, which holds the properties
to *disable*. If the bit for the feature is *not* set in that
field, then the feature gets applied to that glyph.
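
To make the "disable" semantics concrete, here is a minimal sketch that models the rule described above without calling Pango at all. The bit names and both helper functions are hypothetical; they only mirror the kind of private classification that arabic-ot.h defines.

```c
#include <stdbool.h>

/* Hypothetical property bits, private to the shaper (cf. arabic-ot.h). */
enum {
    PROP_ISOLATED = 1u << 0,
    PROP_FINAL    = 1u << 1,
    PROP_INITIAL  = 1u << 2,
    PROP_MEDIAL   = 1u << 3
};

/* A feature registered with 'feature_bit' is applied to a glyph only if
 * that bit is NOT set in the glyph's properties (the "disable" mask). */
static bool feature_applies(unsigned glyph_properties, unsigned feature_bit)
{
    return (glyph_properties & feature_bit) == 0;
}

/* Build a mask that disables every positional form except 'keep_bit'. */
static unsigned disable_all_but(unsigned keep_bit)
{
    const unsigned all = PROP_ISOLATED | PROP_FINAL |
                         PROP_INITIAL | PROP_MEDIAL;
    return all & ~keep_bit;
}
```

Under this model, a glyph added with a properties value of 0 has nothing disabled, so every registered feature would apply to it. That is presumably what a script without positional forms, such as Hebrew, would want.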

> 2. In arabic-fc.c the variable 'cluster' is initialized to 0 for
>    each character:
>    
>       p = text;
>       for (i=0; i < n_chars; i++)
>         {
> 	  :
>           int cluster = 0;
> 	  :
> 	  
>     and then only overridden if the wc found at character i is not 
>     a NON_SPACING_MARK in:
>     
>          if (g_unichar_type (wc) != G_UNICODE_NON_SPACING_MARK)
>            cluster = p - text;
> 	   
>     but leaves cluster as 0 for NON_SPACING_MARKS in the call to:
>     
>          pango_ot_buffer_add_glyph (buffer, index,
>                                     properties[i], cluster);
> 				    
>     which can't be right. While I retained this condition the bidi
>     direction properties of the characters were screwed up. E.g.
>     the following string:
>     
>       const gchar text[] = { 0xd7, 0x9e, 0xd7, 0xa6, 0xd6, 0xb9, 0xd7, 
>                              0x95, 0xd7, 0xaa, 0 };
>  
>       wc[i]    glyph->log_clusters[i]
>  
>       U+05de         8
>       U+05e6         6
>       U+05b9         0
>       U+05d5         2
>       U+05ea         0
>       
>     Indeed U+05b9 is a nonspacing mark, but 0 for its cluster is wrong.
>     It should be 2. My guess is that the problem is simply that the
>     initialization of cluster should be outside the for(i=...) loop?

Not sure offhand without looking at the code. Can you file a bug?
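
For reference, here is a self-contained sketch of the behaviour Dov proposes, with cluster declared outside the loop so that a nonspacing mark inherits the byte offset of the preceding base character. This is not the arabic-fc.c code. decode_utf8() and is_hebrew_mark() are hypothetical stand-ins for the GLib calls (g_utf8_get_char() and a g_unichar_type() test), and the mark test only covers the Hebrew points.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence of up to 3 bytes (enough for Hebrew) and
 * return the number of bytes consumed. No validation: sketch only. */
static size_t decode_utf8(const unsigned char *p, uint32_t *wc)
{
    if (p[0] < 0x80) {
        *wc = p[0];
        return 1;
    }
    if ((p[0] & 0xE0) == 0xC0) {
        *wc = ((uint32_t)(p[0] & 0x1F) << 6) | (uint32_t)(p[1] & 0x3F);
        return 2;
    }
    *wc = ((uint32_t)(p[0] & 0x0F) << 12) |
          ((uint32_t)(p[1] & 0x3F) << 6) |
          (uint32_t)(p[2] & 0x3F);
    return 3;
}

/* Hypothetical stand-in for
 * g_unichar_type (wc) == G_UNICODE_NON_SPACING_MARK, Hebrew block only. */
static int is_hebrew_mark(uint32_t wc)
{
    return (wc >= 0x0591 && wc <= 0x05BD) || wc == 0x05BF ||
           wc == 0x05C1 || wc == 0x05C2 || wc == 0x05C4 ||
           wc == 0x05C5 || wc == 0x05C7;
}

/* Store one cluster (byte offset) per character; return the count. */
static size_t compute_clusters(const char *text, size_t *clusters, size_t max)
{
    const unsigned char *start = (const unsigned char *)text;
    const unsigned char *p = start;
    size_t cluster = 0;          /* declared outside the loop */
    size_t n = 0;

    while (*p != 0 && n < max) {
        uint32_t wc;
        size_t len = decode_utf8(p, &wc);

        /* Only a base character starts a new cluster; a mark keeps
         * the offset of the base before it. */
        if (!is_hebrew_mark(wc))
            cluster = (size_t)(p - start);
        clusters[n++] = cluster;
        p += len;
    }
    return n;
}
```

With the five-character string from the question, this gives U+05B9 the cluster 2, the offset of the U+05E6 it attaches to, which is the value Dov expects.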

> 3. Question. In the manual of the SBL font it says that for biblical
>    Hebrew it is critical that no normalization is performed
>    prior to rendering and querying of the GSUB and GPOS tables.
>    So I wonder, does pango or freetype do any normalization of the
>    character order within the glyph?

If the font depends on information that is erased by normalization,
then it is simply buggy. Pango doesn't do any normalization currently,
but it may in the future. Rendering is supposed to be invariant
with respect to normalization form.

If the font simply requires composed or decomposed text, then
the shaper can do that. (Though the GLib APIs aren't really
sufficient here for doing normalization incrementally.)
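
As an illustration of what normalization can change for pointed Hebrew: the Unicode normalization forms sort adjacent combining marks by canonical combining class, so a dagesh typed before a vowel ends up after it. The sketch below is a toy version of that reordering step only, not what GLib or Pango does. combining_class() covers just a handful of points (class values taken from the Unicode Character Database), and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Canonical combining classes for a few Hebrew points; anything
 * not listed is treated as class 0, i.e. as a base character. */
static int combining_class(uint32_t wc)
{
    switch (wc) {
    case 0x05B0: return 10;  /* sheva  */
    case 0x05B4: return 14;  /* hiriq  */
    case 0x05B7: return 17;  /* patah  */
    case 0x05B8: return 18;  /* qamats */
    case 0x05B9: return 19;  /* holam  */
    case 0x05BC: return 21;  /* dagesh */
    default:     return 0;
    }
}

/* Stable insertion sort of each run of marks by combining class.
 * A mark never moves past a class-0 character. */
static void canonical_reorder(uint32_t *s, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        uint32_t wc = s[i];
        int cc = combining_class(wc);
        size_t j = i;

        if (cc == 0)
            continue;            /* bases stay where they are */
        while (j > 0 && combining_class(s[j - 1]) > cc) {
            s[j] = s[j - 1];
            j--;
        }
        s[j] = wc;
    }
}
```

A font whose lookups match only one mark order would behave differently before and after such a step, which is the kind of dependency described above.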

> 4. I have one critical rendering error. The Hebrew NUN was
>    turned into an inverted NUN. According to the font manual this
>    should occur as the result of the following GSUB entry:
>    
>        NUN (U+05E0) + Combining Grapheme Joiner (U+034F) 
>             -> inverted NUN
> 	    
>    Do you have any idea of where the bug may be? The substitution
>    occurs for any NUN inserted in the buffer. Disabling GSUB is 
>    no good as it is needed in lots of cases...

You're just going to have to jump in and debug it.

> 5. The font manual describes that the ZWJ (U+200D) and ZWNJ (U+200C)
>    are used in ligature pairs in certain difficult situations. Since
>    that is the case, I can no longer use the following if() from the
>    arabic module:
>    
>      if (wc >= 0x200B && wc <= 0x200F) /* Zero-width characters */
>         {
>           pango_ot_buffer_add_glyph (buffer, 0, properties[i], p - text);
>         }
> 	
>    If I erase this, will I get a glyph painted for these characters?
>    Why is this call at all necessary?

If you know your font has an empty glyph for these characters, you can
just use that glyph. If your font has a printing glyph, then you
are going to have to substitute it with a 0 at some point. If it
has no glyph, then you need to add a 0 rather than the 
pango_fc_font_get_unknown_glyph() result, which will give a hex
square.
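
The three cases can be sketched as a pair of small decision functions. This is only a model under stated assumptions: the GlyphKind enum, both function names, and the idea of keeping the character code alongside each glyph are hypothetical, and the sketch ignores the case where GSUB has already merged the glyph into a ligature.

```c
#include <stdint.h>

/* Hypothetical classification of what the font offers for a character. */
typedef enum {
    GLYPH_MISSING,   /* font has no glyph for the character        */
    GLYPH_EMPTY,     /* font has a glyph, and it draws nothing     */
    GLYPH_PRINTING   /* font has a glyph, and it draws something   */
} GlyphKind;

static int is_zero_width(uint32_t wc)
{
    return wc >= 0x200B && wc <= 0x200F;
}

/* Glyph to add to the buffer before the OpenType tables are applied.
 * A missing zero-width character becomes glyph 0 instead of the
 * unknown-glyph box; a glyph the font does have is kept so that
 * lookups involving ZWJ/ZWNJ can still match. */
static uint32_t glyph_for_shaping(uint32_t wc, GlyphKind kind,
                                  uint32_t font_glyph,
                                  uint32_t unknown_glyph)
{
    if (kind != GLYPH_MISSING)
        return font_glyph;
    return is_zero_width(wc) ? 0 : unknown_glyph;
}

/* Glyph to emit after shaping: a printing glyph for a zero-width
 * character is replaced with 0 so nothing is drawn. */
static uint32_t glyph_for_output(uint32_t wc, GlyphKind kind,
                                 uint32_t shaped_glyph)
{
    if (is_zero_width(wc) && kind == GLYPH_PRINTING)
        return 0;
    return shaped_glyph;
}
```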

> 6. If the font does not have any GPOS table, then it is necessary to
>    use my current fallback positioning logic. But what API function
>    would I use to make this check?

Well, pango_ot_info_find_feature() can be used to find out whether the
font has a particular feature. If you don't find any GPOS features,
then you'll need to do some positioning fixups after 
pango_ot_buffer_output().
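
A rough sketch of that decision, kept independent of Pango so it can be compiled on its own: the FindGposFeature callback is a hypothetical stand-in for a pango_ot_info_find_feature() lookup against the GPOS table, ToyFont and toy_find() exist only to exercise the logic, and the list of tags probed ('kern', 'mark', 'mkmk') is an assumption about which features matter for Hebrew.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAKE_TAG(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical stand-in for a GPOS feature lookup on a real font. */
typedef bool (*FindGposFeature)(uint32_t feature_tag, void *font_data);

/* Return true if none of the probed GPOS features is present, in which
 * case the shaper has to fix up positions itself after output. */
static bool needs_fallback_positioning(FindGposFeature find, void *font_data)
{
    static const uint32_t tags[] = {
        MAKE_TAG('k', 'e', 'r', 'n'),
        MAKE_TAG('m', 'a', 'r', 'k'),
        MAKE_TAG('m', 'k', 'm', 'k')
    };

    for (size_t i = 0; i < sizeof tags / sizeof tags[0]; i++)
        if (find(tags[i], font_data))
            return false;
    return true;
}

/* Toy font description: just a list of the GPOS tags it contains. */
typedef struct {
    const uint32_t *gpos_tags;
    size_t n_tags;
} ToyFont;

static bool toy_find(uint32_t feature_tag, void *font_data)
{
    const ToyFont *font = font_data;

    for (size_t i = 0; i < font->n_tags; i++)
        if (font->gpos_tags[i] == feature_tag)
            return true;
    return false;
}
```

In a real shaper the callback would wrap the Pango lookup for the script and language in use, and the result would decide whether the fallback positioning pass runs.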

Regards,
						Owen