Re: Multi-pass GSUB?

From: Theppitak Karoonboonyanan <thep linux thai net>
To: Owen Taylor <otaylor redhat com>
Cc: gtk-i18n-list gnome org
Subject: Re: Multi-pass GSUB?
Date: Fri, 1 Oct 2004 11:04:20 +0700

On Thu, Sep 30, 2004 at 09:06:07AM -0400, Owen Taylor wrote:
> On Thu, 2004-09-30 at 10:19 +0700, Theppitak Karoonboonyanan wrote:
> > On Wed, Sep 29, 2004 at 11:16:18AM -0400, Owen Taylor wrote:
> > > pango_ot_ruleset_substitute() applies each lookup in the font
> > > separately, one after another.
> > >
> > > The order in which lookups are applied is:
> > > 
> > >  A) First by order that features are added by the shaper
> > >  B) Within each feature, by the order of lookups for the feature
> > >     specified by the font.
> > > 
> > > Are you sure that the font and the shaper are set up to give the
> > > correct ordering for your font? 
> > > 
> > > The Microsoft Thai OpenType spec doesn't specify any features for
> > > reordering (*), but generally the 'ccmp' feature used for decomposition
> > > is the first feature applied, so any other features should come
> > > after it.
> > 
> > Umm.. Actually, the reordering rules were a cheat that exploited the
> > chain sub. For example, to reorder "x y" to "y x", I assigned y as a
> > simple substitution of x, and vice versa. Then, the chain sub rules knew
> > how to select the substitution based on context. Fortunately, the number
> > of such combinations was small enough to list them all.
> 
> I'm not sure how this is relevant... Features are named groups of 
> lookups that the opentype engine runs in the order specified by
> the shaper. I was wondering how your font sets up these rules.
> 
> (Multiple substitution would seem like a more reliable way of doing
> things)
> 
> > Regarding the rule ordering, could you suggest a reliable way to check?
> > I have used ttx to dump the font and found that the decomposition comes
> > first as you explained.
> 
> Are you sure you are looking at the order they are specified in the
> feature, not the order they are specified in the global lookup
> list?

Yes, I did look at the 'ccmp' feature. But I might be confused by the
"index" and "value" properties:

    <FeatureList>
      <!-- FeatureCount=2 -->
      <FeatureRecord index="0">
        <FeatureTag value=" RQD"/>
        <Feature>
          <!-- LookupCount=1 -->
          <LookupListIndex index="0" value="8"/>
        </Feature>
      </FeatureRecord>
      <FeatureRecord index="1">
        <FeatureTag value="ccmp"/>
        <Feature>
          <!-- LookupCount=8 -->
          <LookupListIndex index="0" value="7"/>
          <LookupListIndex index="1" value="6"/>
          <LookupListIndex index="2" value="5"/>
          <LookupListIndex index="3" value="4"/>
          <LookupListIndex index="4" value="3"/>
          <LookupListIndex index="5" value="2"/>
          <LookupListIndex index="6" value="1"/>
          <LookupListIndex index="7" value="0"/>
        </Feature>
      </FeatureRecord>
    </FeatureList>

The Lookup for index "0" was for the decomposition rule, but its value
was "7". Failing to find documentation of what this "value" means, I then
tried "ottest" under the pango/opentype subdir. The reported feature list is:

<FeatureList>
   <FeatureCount>2</FeatureCount>
   <FeatureTag> RQD</FeatureTag> <!-- 0 -->
   <Feature> <!-- 0 -->
      <FeatureParams>0</FeatureParams>
      <LookupListCount>1</LookupListCount>
      <LookupIndex>8</LookupIndex>
   </Feature>
   <FeatureTag>ccmp</FeatureTag> <!-- 1 -->
   <Feature> <!-- 1 -->
      <FeatureParams>0</FeatureParams>
      <LookupListCount>8</LookupListCount>
      <LookupIndex>7</LookupIndex>
      <LookupIndex>6</LookupIndex>
      <LookupIndex>5</LookupIndex>
      <LookupIndex>4</LookupIndex>
      <LookupIndex>3</LookupIndex>
      <LookupIndex>2</LookupIndex>
      <LookupIndex>1</LookupIndex>
      <LookupIndex>0</LookupIndex>
   </Feature>
</FeatureList>

and the Lookup 0 was the decomposition rule. You may be right that the
order was wrong.
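
To make the two attributes concrete: in a ttx dump, "index" on a LookupListIndex element is the entry's position inside the feature's own list, while "value" is the index into the global LookupList. Below is a minimal sketch, using only the Python standard library, that reads a shortened copy of the dump above and lists each feature's lookups in the order the feature specifies them. The function name lookups_by_feature is made up for illustration; it is not part of ttx or Pango.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the ttx FeatureList shown above.
TTX_FRAGMENT = """
<FeatureList>
  <FeatureRecord index="0">
    <FeatureTag value=" RQD"/>
    <Feature>
      <LookupListIndex index="0" value="8"/>
    </Feature>
  </FeatureRecord>
  <FeatureRecord index="1">
    <FeatureTag value="ccmp"/>
    <Feature>
      <LookupListIndex index="0" value="7"/>
      <LookupListIndex index="1" value="6"/>
      <LookupListIndex index="2" value="5"/>
      <LookupListIndex index="3" value="4"/>
      <LookupListIndex index="4" value="3"/>
      <LookupListIndex index="5" value="2"/>
      <LookupListIndex index="6" value="1"/>
      <LookupListIndex index="7" value="0"/>
    </Feature>
  </FeatureRecord>
</FeatureList>
"""


def lookups_by_feature(ttx_xml):
    """Map each feature tag to its global lookup indices, in feature order."""
    root = ET.fromstring(ttx_xml)
    result = {}
    for record in root.iter("FeatureRecord"):
        tag = record.find("FeatureTag").get("value")
        entries = record.findall("Feature/LookupListIndex")
        # "index" is the position within the feature; sort on it explicitly.
        entries.sort(key=lambda e: int(e.get("index")))
        # "value" is the index into the global LookupList.
        result[tag] = [int(e.get("value")) for e in entries]
    return result


order = lookups_by_feature(TTX_FRAGMENT)
print(order["ccmp"])  # [7, 6, 5, 4, 3, 2, 1, 0]
```

Read this way, 'ccmp' lists global lookup 7 first and global lookup 0 (the decomposition) last, which matches the ottest output.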

> > Actually, I haven't looked at the internal code of
> > pango_ot_ruleset_substitute() yet. My previous guess about its behavior
> > was empirical, based on the fact that a quick hack of calling it twice
> > made the rules apply completely. Just not sure if that's the
> > right thing.
> 
> That implies to me that your font or shaper is applying the rules
> in the wrong order.

You may be right.
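
A toy model shows why calling the substitution twice can hide an ordering problem. This is only a sketch under assumptions: the two "lookups" are plain Python functions on a list of made-up glyph names ("x", "y", "z", and a composite "yz"), and apply_pass is a hypothetical stand-in for one run over the lookups in order, not Pango's actual code.

```python
def decompose(glyphs):
    """Toy decomposition lookup: split the composite "yz" into "y", "z"."""
    out = []
    for g in glyphs:
        out.extend(["y", "z"] if g == "yz" else [g])
    return out


def swap_xy(glyphs):
    """Toy reordering lookup: rewrite each adjacent pair "x y" as "y x"."""
    out = list(glyphs)
    i = 0
    while i + 1 < len(out):
        if out[i] == "x" and out[i + 1] == "y":
            out[i], out[i + 1] = "y", "x"
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return out


def apply_pass(glyphs, lookups):
    """Apply each lookup once, one after another, in the given order."""
    for lookup in lookups:
        glyphs = lookup(glyphs)
    return glyphs


wrong_order = [swap_xy, decompose]   # reordering runs before decomposition
right_order = [decompose, swap_xy]   # decomposition runs first

text = ["x", "yz"]
one_pass = apply_pass(text, wrong_order)
two_passes = apply_pass(one_pass, wrong_order)
fixed = apply_pass(text, right_order)

print(one_pass)    # ['x', 'y', 'z']
print(two_passes)  # ['y', 'x', 'z']
print(fixed)       # ['y', 'x', 'z']
```

With the wrong order, the swap finds nothing to match on the first pass because "y" does not exist until the decomposition has run; a second pass then happens to catch it. With the decomposition first, a single pass is enough.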

> > > (*) As a general rule, in the OpenType language specs reordering tends
> > >     to be done by the shaper rather than the font, because OpenType
> > >     doesn't really handle reordering very nicely.
> > 
> > Hmm.. But the decomposition is in the font. This implies some intrusion
> > of the shaper in between the substitution process, or it should take
> > over the job completely as preprocessing. Hmm, the latter also sounds
> > good.
> 
> The way, say, multipart vowels work in Indic is that they are decomposed
> into the component pieces and those are reordered before the opentype
> processing is invoked. (Some post-processing after the shaping process
> but before positioning is needed as well in some cases.)
> 
> It helps that there actually are unicode representations for the pieces
> of the vowels.
> 
> If you are only reordering a small number of pieces, doing it in the
> font is probably fine, but a rule like:
> 
>  Reorder    <consonant> + <vowel left> + <vowel right> to 
>             <vowel left> + <consonant> + <vowel right>
> 
> is very hard to represent in OpenType when there are many possibilities
> for each of the categories. (The full Indic rules are actually much
> worse than this.)

Let me conclude this way:

The rules are actually few in number. So, it's possible to handle the
case in the font, provided that the lookup ordering is carefully controlled.

And the vowel components do have Unicode code points. So, it's also
possible to let the shaper do the reordering part as a post-processing
before applying GPOS, assuming the decomposition part in the font.

However, a recent test has made me pick the choice in which the shaper
totally takes over the job: the failed font appears to be rendered properly
by Uniscribe despite the wrong order. Now that I understand the function
of pango_ot_ruleset_substitute() more clearly, I think the case is better
handled as pre-processing instead, to make sure that font creators are not
confused by the platform difference.
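
The pre-processing option can be sketched roughly as follows: decompose first, then reorder by character class, and only then hand the result to the OpenType substitution. Everything here is hypothetical and for illustration only: the DECOMPOSE and CLASSES tables use placeholder names ("K", "EA", "e", "a") rather than real Thai code points, and preprocess is not an existing Pango function.

```python
# Hypothetical tables; a real shaper would key these on Unicode code points.
DECOMPOSE = {"EA": ["e", "a"]}  # a two-part vowel and its pieces
CLASSES = {"K": "consonant", "e": "vowel_left", "a": "vowel_right"}


def preprocess(chars):
    """Decompose, then move each left vowel in front of its consonant."""
    # Step 1: decomposition.
    parts = []
    for ch in chars:
        parts.extend(DECOMPOSE.get(ch, [ch]))

    # Step 2: <consonant> <vowel left> becomes <vowel left> <consonant>.
    i = 0
    while i + 1 < len(parts):
        if (CLASSES.get(parts[i]) == "consonant"
                and CLASSES.get(parts[i + 1]) == "vowel_left"):
            parts[i], parts[i + 1] = parts[i + 1], parts[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return parts


result = preprocess(["K", "EA"])
print(result)  # ['e', 'K', 'a']
```

Because the reordering works on classes rather than on listed glyph pairs, it does not depend on how a given font orders its lookups.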

My first proposed patch skipped too much of the pre-processing. It ignored
both the PUA substitution and the clusterization (which also handled the
decomposition case) and relied solely on OT tables. Now the second patch
ignores the PUA part only.

Thanks for your explanation and guidelines.

Regards,
-Thep.
-- 
Theppitak Karoonboonyanan
http://linux.thai.net/~thep/