The tool to config Shaping rule in Pango Engines



Hi Owen, Robert,

] 
] Chookij Vanatham <chookij vanatham eng sun com> writes:
] 
] > Hi Owen,
] > 
] > ] 
] > ] One possibility is to add a capability to the X/Pango interface
] > ] to find out the exact name of the font that is being used, and
] > ] then modify the shaping rules based on some list of foundry/family
] > ] names.
] 
] > Are we going to have this change in X/Pango interface before Gtk1.3.1
] > is going to replace Gtk1.2 in GNOME ?
] 
] Well, we'll have some solution. (GTK+-1.3 will not replace GTK+-1.2
] for GNOME until we release a final GTK+-2.0 and Pango-1.0)
] 
] I really dislike this solution because it means putting a whole lot
] of special cases into the Thai shaper. With this solution, we
] might make all sorts of mistakes if we don't know about the
] fonts in advance, and there would be no way of fixing them without
] recompiling Pango. :-(
] 
] Regards,
]                                         Owen
] 
I don't like the idea of recompiling Pango in order to change shaping rules
either.

Let me share an idea for a workaround for this issue.
My project leader, Ienup Sung, and I have been working on it, but because
of my own limitations I have so far gotten only halfway through. We believe
the idea works. It might not be the best one, but we would like to share our
experience, and hopefully we can get something working for this issue in Pango.

The idea is to have a special engine that reads a binary file containing
the shaping rule information (including cell clustering), with, of course,
different rules for the various font layouts, referred to by font name,
charset name, etc.

The rules will be written in a text file, and we need a tool that reads
that file and compiles it into the binary file for this special engine.
The rule format must support any script, e.g., Thai, Arabic, Devanagari, etc.

The special engine itself will have only a single algorithm, which reads that
binary data and transforms/renders the logical input text stream, char *text,
into a PangoGlyphString. The algorithm is a state machine and looks like this:

	do {

	STARTING_POINT:

	    /* Classify the current input character. */
	    TypeID = GetCharacterType(CurrentCh);
	    switch (State) {
	      case InitialState:
	        PrevState = State;
	        State = StateTransitionTable[PrevState][TypeID].state_id;
	        break;

	      case FinalState:
	        /* The cluster is complete: emit its glyphs, then re-read
	           the current character from the initial state. */
	        if (COLL_GLYPH_TBL(obj, PrevState, PrevTypeID))
	            GlyphMapping(PrevState, PrevTypeID, ...);
	        State = InitialState;
	        goto STARTING_POINT;

	      default: /* Intermediate State */
	        if (COLL_GLYPH_TBL(obj, PrevState, PrevTypeID))
	            GlyphMapping(PrevState, PrevTypeID, ...);
	        PrevState = State;
	        State = StateTransitionTable[PrevState][TypeID].state_id;
	        break;
	    }

	} while (not End of Input);

GlyphMapping is the routine that generates the PangoGlyphString for each
cluster. A couple of main tables are involved internally.
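
To make the table-driven idea concrete, here is a minimal sketch in C. It is
not the prototype's code: the character classes, the states, the transition
table, and the function names are all assumptions made up for illustration,
and it only finds cluster boundaries for a handful of Thai code points instead
of calling a GlyphMapping step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical character classes; a real table would cover a whole script. */
enum { T_OTHER, T_CONS, T_ABOVE, T_TONE, N_TYPES };

/* Hypothetical states: how much of a cluster has been read so far. */
enum { S_INITIAL, S_BASE, S_BASE_ABOVE, S_FULL, S_FINAL, N_STATES };

static int get_character_type(unsigned ch)
{
    if (ch >= 0x0E01 && ch <= 0x0E2E) return T_CONS;   /* Thai consonants */
    if (ch >= 0x0E34 && ch <= 0x0E37) return T_ABOVE;  /* some above vowels */
    if (ch >= 0x0E48 && ch <= 0x0E4B) return T_TONE;   /* tone marks */
    return T_OTHER;
}

/* transition[state][type] gives the next state. S_FINAL means the cluster
   ended before the current character. */
static const int transition[N_STATES][N_TYPES] = {
    /*                  OTHER    CONS     ABOVE         TONE   */
    /* S_INITIAL    */ { S_FULL,  S_BASE,  S_FULL,       S_FULL  },
    /* S_BASE       */ { S_FINAL, S_FINAL, S_BASE_ABOVE, S_FULL  },
    /* S_BASE_ABOVE */ { S_FINAL, S_FINAL, S_FINAL,      S_FULL  },
    /* S_FULL       */ { S_FINAL, S_FINAL, S_FINAL,      S_FINAL },
    /* S_FINAL      */ { S_FINAL, S_FINAL, S_FINAL,      S_FINAL },
};

/* Store the start index of each cluster in starts[] (at most max_starts
   entries) and return how many were stored. */
size_t find_clusters(const unsigned *text, size_t len,
                     size_t *starts, size_t max_starts)
{
    size_t n = 0, i = 0;
    int state = S_INITIAL;

    while (i < len) {
        int next = transition[state][get_character_type(text[i])];
        if (next == S_FINAL) {
            /* Cluster complete: re-read text[i] from the initial state. */
            state = S_INITIAL;
            continue;
        }
        if (state == S_INITIAL && n < max_starts)
            starts[n++] = i;        /* text[i] opens a new cluster */
        state = next;
        i++;
    }
    return n;
}
```

The driver loop never changes; supporting another script or font layout would
only mean swapping in different tables, which is the point of the proposal.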

In our prototype (not in Pango), I was able to do this with a single special
engine using only the single algorithm shown above, and it renders Thai and
Arabic correctly. I haven't been able to finish the tool yet, so I embedded
the internal tables in the engine instead. I also tried it with Devanagari,
on a few samples of Devanagari ligatures, which don't exist in Thai script,
and the same single algorithm works there as well.

Here is the grammar for the rules.

         shaping_sequence    : initial_state '+' input '->' next_state_list
                             ;

         initial_state       : '()'
                             ;

         input               : HEXADECIMAL
                             ;

         next_state_list     : next_state
                             | next_state_list '+' input '->' next_state
                             ;

         next_state          : '(' out_buffer ',' in2out ',' out2in ','
                                     cluster ')'
                             ;

         out_buffer          : '[' out_char_list ']'
                             ;

         out_char_list       : HEXADECIMAL
                             | out_char_list ';' HEXADECIMAL
                             ;

         in2out              : '[' i2o_list ']'
                             ;

         i2o_list            : DECIMAL
                             | i2o_list ';' DECIMAL
                             ;

         out2in              : '[' o2i_list ']'
                             ;

         o2i_list            : DECIMAL
                             | o2i_list ';' DECIMAL
                             ;

         cluster             : '[' cluster_list ']'
                             ;

         cluster_list        : HEXADECIMAL
                             | cluster_list ';' HEXADECIMAL
                             ;
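
     As an illustration of how the rule compiler might read this grammar,
     here is a rough C sketch that parses a single next_state tuple. The
     NextState struct, MAX_ITEMS, and all function names are hypothetical.
     For brevity it accepts either number form in any list (strtol with
     base 0), so a number with a leading zero would be read as octal.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define MAX_ITEMS 16

/* One parsed next_state: the four bracketed lists of the grammar. */
typedef struct {
    long out_buffer[MAX_ITEMS]; int n_out;
    long in2out[MAX_ITEMS];     int n_i2o;
    long out2in[MAX_ITEMS];     int n_o2i;
    long cluster[MAX_ITEMS];    int n_cluster;
} NextState;

static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

/* Consume the character c after optional whitespace; NULL on mismatch.
   A NULL input is passed through so that errors propagate. */
static const char *expect(const char *p, char c)
{
    if (!p)
        return NULL;
    p = skip_ws(p);
    return *p == c ? p + 1 : NULL;
}

/* Parse '[' number (';' number)* ']'.
   Returns the position after ']' or NULL on error. */
static const char *parse_list(const char *p, long *items, int *count)
{
    *count = 0;
    p = expect(p, '[');
    if (!p)
        return NULL;
    for (;;) {
        char *end;
        long v = strtol(p, &end, 0);    /* base 0: "0x.." or decimal */
        if (end == p || *count == MAX_ITEMS)
            return NULL;                /* no number, or list too long */
        items[(*count)++] = v;
        p = skip_ws(end);
        if (*p == ']')
            return p + 1;
        if (*p != ';')
            return NULL;
        p++;
    }
}

/* Parse '(' out_buffer ',' in2out ',' out2in ',' cluster ')'.
   Returns 1 on success, 0 on a syntax error. */
int parse_next_state(const char *p, NextState *ns)
{
    p = expect(p, '(');
    p = parse_list(p, ns->out_buffer, &ns->n_out);
    p = expect(p, ',');
    p = parse_list(p, ns->in2out, &ns->n_i2o);
    p = expect(p, ',');
    p = parse_list(p, ns->out2in, &ns->n_o2i);
    p = expect(p, ',');
    p = parse_list(p, ns->cluster, &ns->n_cluster);
    p = expect(p, ')');
    return p != NULL;
}
```

     A full compiler would wrap this in a loop over the '+' input '->'
     parts of a shaping_sequence and then write the result out in the
     binary form the engine reads.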

     For example, the following shaping sequences can be defined:


        # A simple shaping sequence:
        
	( ) + 0x0E01 ->
		([0xA1],[0],[0],[1]) + 0x0E34 ->
		([0xA1;0xD4],[0;1],[0;1],[1;0]) + 0x0E48 ->
		([0xA1;0xD4;0xE8],[0;1;2],[0;1;2],[1;0;0])
	( ) + 0x0E01 ->
		([0xA1],[0],[0],[1]) + 0x0E35 ->
		([0xA1;0xD5],[0;1],[0;1],[1;0]) + 0x0E49 ->
		([0xA1;0xD5;0xE9],[0;1;2],[0;1;2],[1;0;0])


     The first sequence shows that if 0x0E01, 0x0E34, and 0x0E48 are the
     input buffer contents, they will be converted into an output buffer
     containing 0xA1, 0xD4, and 0xE8; an input-to-output index buffer
     (in2out) containing 0, 1, and 2; an output-to-input index buffer
     (out2in) containing 0, 1, and 2; and a property (cluster) buffer
     containing 1, 0, and 0.
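
     To show how an engine could apply such sequences, the sketch below
     stores the two example sequences as a small transition table in C and
     follows it for as long as the input matches. The Transition struct,
     the state numbering, and the function names are assumptions made for
     illustration; this is not the binary format, and the in2out, out2in,
     and cluster buffers are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* One step of a shaping sequence: state + input -> next state. */
typedef struct {
    int      from;      /* state before reading input; 0 stands for '( )' */
    unsigned input;     /* code point read */
    int      to;        /* resulting state (numbering is arbitrary) */
    size_t   n_out;     /* number of entries used in out[] */
    unsigned out[3];    /* full out_buffer of the resulting state */
} Transition;

/* The two example sequences; they share the first step (0x0E01). */
static const Transition table[] = {
    { 0, 0x0E01, 1, 1, { 0xA1 } },
    { 1, 0x0E34, 2, 2, { 0xA1, 0xD4 } },
    { 2, 0x0E48, 3, 3, { 0xA1, 0xD4, 0xE8 } },
    { 1, 0x0E35, 4, 2, { 0xA1, 0xD5 } },
    { 4, 0x0E49, 5, 3, { 0xA1, 0xD5, 0xE9 } },
};

static const Transition *find_transition(int from, unsigned input)
{
    size_t i;
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].from == from && table[i].input == input)
            return &table[i];
    return NULL;
}

/* Follow the table from the initial state for as long as the input matches.
   Copies the out_buffer of the last state reached into out[] (room for at
   least 3 entries), stores its length in *n_out, and returns the number of
   input characters consumed. */
size_t shape_cluster(const unsigned *text, size_t len,
                     unsigned *out, size_t *n_out)
{
    const Transition *last = NULL, *t;
    size_t consumed = 0, i;
    int state = 0;

    while (consumed < len &&
           (t = find_transition(state, text[consumed])) != NULL) {
        last = t;
        state = t->to;
        consumed++;
    }
    *n_out = last ? last->n_out : 0;
    for (i = 0; i < *n_out; i++)
        out[i] = last->out[i];
    return consumed;
}
```

     Because every state carries its complete out_buffer, a later state
     can replace earlier glyphs outright, which is what a ligature needs.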

Let me know what you think. Hopefully this helps to spark some ideas.
I would like to hear your opinions and any other ideas as well.

Chookij V.




