Re: Industry Thai Cell-Clustering Rules

From: Chookij Vanatham <chookij vanatham eng sun com>
To: gtk-i18n-list gnome org, pablo mandrakesoft com
Subject: Re: Industry Thai Cell-Clustering Rules
Date: Fri, 03 Nov 2000 10:30:16 -0800 (PST)

Hi Pablo,

I'm glad that you have joined our discussion too.
Here are my answers.

] 
] Kaixo!
] 
] On Thu, Nov 02, 2000 at 05:18:44PM -0800, Chookij Vanatham wrote:
] 
] > According to our experience, there are three different practices of Thai
] > fonts for rendering :
] > 
] > 1. Plain tis620 : combining characters are placed at the safe positions to
] >    prevent collapsion. There are two practices of this kind :
] >    - negative-offset-zero-width diacritics (this makes the fonts apply to
] 
] > 2. MacThai extension : an extended tis620 code set, by using codes in the
] >    range 0x80-0x9f and in some free slots to keep the prepositioned
] >    combining characters. This needs a shaping algorithm to produce 
] >    elegant rendering.
] 
] But does the glyphs at normal tis-620 position (that is, not precombined)
] share the same properties as 1. above ?

Yes, they do. In other words:

		MacThai = Plain tis620 + some adjusted glyphs
		WindowsThai = Plain tis620 + some adjusted glyphs

] In other words, if we take an extended thai font and use it as if was a
] simple tis-620 only one; would it work in an acceptable fashion?

It will work, but whether it is acceptable is a matter of opinion; here
is mine. Ever since the Mac and Microsoft Windows have supported Thai,
they have used those "adjusted glyphs" to position the Thai vowel and
tone mark in each cell closer to the base consonant, so that the cell
looks better. For desktop applications in particular, this is a must,
and it has been that way for many years.
So I would say that plain tis620 alone is not enough to be acceptable,
but Thai users are still able to read it.
No problem.


] 
] I was thinking until now that the diferences where in how the
] negative-offset-zero-width diacritics were computed...
] 
] > 3. WindowsThai extension : similar to MacThai extension, but used in
] >    Windows Thai Editions.
] > 
] > The last two code sets are mapped to their own private area of Unicode and
] > cannot be used together.
] 
] Can't those be detected somehow ? (it would be interesting to have the
] list of combinations and the codepoints assigned to the precombined glyphs)

I'm not quite sure I understand the question; could you give me more detail?

] 
] > As far as I know, there is only 1 cell-clustering rule defined from Thai
] > government (by NECTEC). This one is called Wtt2.0 and the detail is
] > attached.
] > We should add the word "wtt2.0" to any names if they are using Wtt2.0 cell
] > clustering rule.
] 
] > ****
] > If the cell-cluster is composed of "consonant", "vowel" and "tonemark",
] > vowel character will always follow consonant and tonemark character
] > will always follow vowel as shown below.
] > 
] > 	Consonant + Vowel + Tonemark  -----> One cell cluster
] > 
] > If tonemark comes before vowel, the vowel character will be considered as
] > another cell-cluster as shown below.
] > 
] > 
] > 	[Consonant + Tonemark] [Vowel] ----> Two cell clusters
] > ******
] 
] But that is not font-specific, is it ?

This is the point. Let me show you, and then you can help me decide
whether this is font-specific or not.

Here is the piece of the Thai Pango engine code that determines the
Thai cell cluster.

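    while (p < text + length)
      {
        ...

        /* Look up the group of wc; anything outside 0xe30..0xe4f is group 0. */
        if (wc >= 0xe30 && wc < 0xe50)
          group = groups[wc - 0xe30];
        else
          group = 0;

        switch (group)
          {
          case 0:
            /* A group 0 character emits the pending cluster, if any,
               and becomes the base of the next one. */
            if (base)
              {
                add_cluster (font_info, glyphs, cluster_start, base,
                             group1, group2);
                group1 = 0;
                group2 = 0;
              }
            cluster_start = p - text;
            base = wc;
            break;
          case 1:
            /* Stored with the current base, whatever the typing order. */
            group1 = wc;
            break;
          case 2:
            group2 = wc;
            break;
          }

        p = g_utf8_next_char (p);

        ....

      }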


This piece of code determines the cell clustering for Thai and, of course,
it doesn't use the Wtt2.0 cell-clustering logic.

Now, if users want Thai displayed with Wtt2.0 cell clustering,
what will you do?

That's why we put the cell-clustering rule into the XLFD name, so that
the engine can determine which cell-clustering rule to use. Users would
then be able to choose what they want.
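
A minimal sketch of how an engine might pick the rule from the font
name, assuming the tag "wtt2.0" appears literally (in lowercase)
somewhere in the XLFD string. The exact field that would carry the tag
is not settled in this thread, and the enum and function names here are
hypothetical, not part of Pango:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the two rule families discussed in this thread. */
typedef enum { CLUSTER_RULE_DEFAULT, CLUSTER_RULE_WTT2_0 } ClusterRule;

/* Assumption: a font that wants Wtt2.0 clustering carries the literal
   substring "wtt2.0" somewhere in its XLFD name. */
ClusterRule
cluster_rule_from_xlfd (const char *xlfd)
{
  if (xlfd != NULL && strstr (xlfd, "wtt2.0") != NULL)
    return CLUSTER_RULE_WTT2_0;
  return CLUSTER_RULE_DEFAULT;
}
```

A name without the tag falls back to the default rule, so existing
fonts would keep their current behavior.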

It's not really clear-cut to say that cell clustering is not font-specific.
Unfortunately, in the industry, there is more than one cell-clustering
rule.
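
To make the difference concrete, here is a rough sketch that contrasts
the two behaviors on an abstract sequence of character classes. It
models only the two cases quoted above (vowel before tone mark, and
tone mark before vowel), not the full Wtt2.0 tables, and all type and
function names are hypothetical:

```c
#include <stddef.h>

typedef enum { CONSONANT, VOWEL, TONEMARK } ThaiClass;
typedef enum { RULE_SIMPLE, RULE_ORDERED } ClusterRule;

/* Sets starts[i] to 1 when character i begins a new cell cluster and
   to 0 otherwise; returns the number of clusters.
   RULE_SIMPLE:  every vowel or tone mark joins the current cluster.
   RULE_ORDERED: a vowel that follows a tone mark starts its own
                 cluster, as in the rule quoted above. */
size_t
mark_clusters (const ThaiClass *cls, size_t n, ClusterRule rule, int *starts)
{
  size_t count = 0;
  int open = 0;        /* a cluster has been started */
  int seen_tone = 0;   /* the current cluster already holds a tone mark */

  for (size_t i = 0; i < n; i++)
    {
      int new_cluster = 0;

      if (cls[i] == CONSONANT || !open)
        new_cluster = 1;
      else if (rule == RULE_ORDERED && cls[i] == VOWEL && seen_tone)
        new_cluster = 1;

      if (new_cluster)
        {
          count++;
          open = 1;
          seen_tone = 0;
        }
      if (cls[i] == TONEMARK)
        seen_tone = 1;
      starts[i] = new_cluster;
    }
  return count;
}
```

With the sequence consonant, tone mark, vowel, RULE_SIMPLE yields one
cluster while RULE_ORDERED yields two, which is exactly the case where
the two rules show the text differently.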



] 
] > In my opinion, then, we might have these 2 types of cell-clustering rules
] > and one has the name "wtt2.0", the other I'm not sure if we are going to
] > name it or not.
] 
] But those two rules are a user definided preference, as how he prefers to see
] wrongly typed thai strings, not some property of fonts; the two behaviours
] should be possible with any kind of Thai font.
] (or we could force the "wtt2.0" behaviour in all cases, as, IIC, it is a rule
] that means "show wrong thai sequences in special way, so it is evident they
] are wrong")

That's right. We should let users choose which one they want.
As for whether it's font-specific or not, see my comment above.



] 
] >  - How bad the problem is with legacy fonts without identified
] >    clustering rules.
] > 
] >    We won't be able to have Thai display correctly after we do text
] >     manipulation, like, insert, delete, copy-paste, selection, scrolling,
] 
] Because the copied string into the buffer uses the non tis-620 codepoint of
] the precomposed glyphs or because of a cursor positionning problem?

Let's say the software doesn't take cell clustering into account but
uses a Thai font whose tone marks and vowels are zero-width. Then there
is a lot of incorrect behavior that is certainly not acceptable.

One easy example that non-Thai users can understand is cursor movement.

Let's say you have the following English text.

			ABC

The cursor (or I-beam) moves one position to the left or right each time
you press the left or right arrow key. Right?


Now, what about Thai text when the software doesn't care about cell
clustering?

Let's say A is a consonant, B a vowel, and C a tone mark.

			C
			B
			A
			
B and C are combined and displayed on top of A (because they are
zero-width). If the same logic is used to move the I-beam, you need to
press the arrow key three times to move the I-beam past this one cell,
to the left or to the right.
Doesn't that look weird? Why do you need to press the arrow key three
times?

The same problem happens with insert/delete/copy-paste/selection/scrolling
if the software doesn't consider cell clustering.
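
A small sketch of cluster-aware cursor movement, assuming the engine
already knows where each cell cluster begins. The starts[] array and
both function names are hypothetical, for illustration only:

```c
#include <stddef.h>

/* starts[i] is nonzero when character i begins a new cell cluster.
   Cursor positions run from 0 (before the first character) to n
   (after the last one). */

/* Move right: skip to the beginning of the next cluster, or to n. */
size_t
cursor_right (const int *starts, size_t n, size_t pos)
{
  if (pos >= n)
    return n;
  pos++;
  while (pos < n && !starts[pos])
    pos++;
  return pos;
}

/* Move left: skip back to the beginning of the previous cluster. */
size_t
cursor_left (const int *starts, size_t pos)
{
  if (pos == 0)
    return 0;
  pos--;
  while (pos > 0 && !starts[pos])
    pos--;
  return pos;
}
```

For the A/B/C cell above, starts[] would be {1, 0, 0}, so one press of
the right arrow key moves the cursor from 0 to 3 instead of stopping
after each zero-width character.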

Hope this helps.


Chookij V.
			

] 
] -- 
] Ki ça vos våye bén,
] Pablo Saratxaga
] 
] http://www.srtxg.easynet.be/		PGP Key available, key ID: 0x8F0E4975