Re: Industry Thai Cell-Clustering Rules

From: Pablo Saratxaga <pablo mandrakesoft com>
To: gtk-i18n-list gnome org
Subject: Re: Industry Thai Cell-Clustering Rules
Date: Fri, 3 Nov 2000 16:07:52 +0100

Kaixo!

On Thu, Nov 02, 2000 at 05:18:44PM -0800, Chookij Vanatham wrote:

> According to our experience, there are three different practices of Thai
> fonts for rendering :
> 
> 1. Plain tis620 : combining characters are placed at the safe positions to
>    prevent collapsion. There are two practices of this kind :
>    - negative-offset-zero-width diacritics (this makes the fonts apply to

> 2. MacThai extension : an extended tis620 code set, by using codes in the
>    range 0x80-0x9f and in some free slots to keep the prepositioned
>    combining characters. This needs a shaping algorithm to produce 
>    elegant rendering.

But do the glyphs at the normal tis-620 positions (that is, not precombined)
share the same properties as in 1. above? In other words, if we take an
extended Thai font and use it as if it were a simple tis-620-only one,
would it work in an acceptable fashion?

Until now I was thinking that the differences were in how the
negative-offset-zero-width diacritics were computed...

> 3. WindowsThai extension : similar to MacThai extension, but used in
>    Windows Thai Editions.
> 
> The last two code sets are mapped to their own private area of Unicode and
> cannot be used together.

Can't those be detected somehow? (It would be interesting to have the
list of combinations and the codepoints assigned to the precombined glyphs.)
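
As a rough illustration of what "detecting" could mean at the string level,
here is a minimal Python sketch that only reports which characters of a
string fall in a Unicode private-use area. It assumes nothing about which
codepoints MacThai or WindowsThai actually use (that list is exactly what
is missing here), so it cannot tell the two extensions apart; the function
name private_use_chars is made up for this example.

```python
import unicodedata


def private_use_chars(text):
    """Return (index, 'U+XXXX') for every private-use code point in text.

    Only the Unicode general category "Co" (private use) is checked.
    No vendor-specific table is consulted, so this says that a string
    contains private-area characters, not which extension they belong to.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]
```

A string made only of plain Thai characters yields an empty list; any
non-empty result would mean the text cannot be treated as plain tis-620
without first mapping those codepoints back.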

> As far as I know, there is only 1 cell-clustering rule defined from Thai
> government (by NECTEC). This one is called Wtt2.0 and the detail is attached.
> We should add the word "wtt2.0" to any names if they are using Wtt2.0 cell
> clustering rule.

> ****
> If the cell-cluster is composed of "consonant", "vowel" and "tonemark",
> vowel character will always follow consonant and tonemark character
> will always follow vowel as shown below.
> 
> 	Consonant + Vowel + Tonemark  -----> One cell cluster
> 
> If tonemark comes before vowel, the vowel character will be considered as
> another cell-cluster as shown below.
> 
> 
> 	[Consonant + Tonemark] [Vowel] ----> Two cell clusters
> ******

But that is not font-specific, is it ?
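
To make the point concrete, here is a minimal Python sketch of the quoted
rule that looks only at character classes and never at a font. The class
table is a deliberate simplification and an assumption on my part, not the
full Wtt2.0 table: consonants are taken as U+0E01..U+0E2E, "vowel" means
only the above/below vowels, and "tonemark" means the four tone marks. The
names char_class and cell_clusters are invented for this example.

```python
# Simplified character classes (an assumption, not the full Wtt2.0 table).
ABOVE_BELOW_VOWELS = set("\u0e31\u0e34\u0e35\u0e36\u0e37\u0e38\u0e39")
TONE_MARKS = set("\u0e48\u0e49\u0e4a\u0e4b")


def char_class(ch):
    """Classify a character as V (vowel), T (tonemark), C (consonant) or O."""
    if ch in ABOVE_BELOW_VOWELS:
        return "V"
    if ch in TONE_MARKS:
        return "T"
    if "\u0e01" <= ch <= "\u0e2e":
        return "C"
    return "O"


def cell_clusters(text):
    """Split text into cell clusters following the rule quoted above.

    A vowel joins the current cluster only right after a consonant; a
    tonemark joins right after a consonant or a vowel. Anything else,
    including a vowel that comes after a tonemark, starts a new cluster.
    """
    clusters = []
    prev = None  # class of the last character placed in the current cluster
    for ch in text:
        cls = char_class(ch)
        attach = (cls == "V" and prev == "C") or (
            cls == "T" and prev in ("C", "V")
        )
        if attach and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
        prev = cls
    return clusters
```

With this, consonant + vowel + tonemark comes back as one cluster, while
consonant + tonemark + vowel comes back as two, [consonant + tonemark] and
[vowel], which is the behaviour described in the quoted text. Nothing in
it depends on which kind of Thai font is used for display.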

> In my opinion, then, we might have these 2 types of cell-clustering rules
> and one has the name "wtt2.0", the other I'm not sure if we are going to
> name it or not.

But those two rules are a user-defined preference, namely how the user
prefers to see wrongly typed Thai strings, not some property of fonts; the
two behaviours should be possible with any kind of Thai font.
(Or we could force the "wtt2.0" behaviour in all cases since, IIUC, it is a
rule that means "show wrong Thai sequences in a special way, so it is evident
they are wrong".)

>  - How bad the problem is with legacy fonts without identified
>    clustering rules.
> 
>    We won't be able to have Thai display correctly after we do text
>     manipulation, like, insert, delete, copy-paste, selection, scrolling,

Is that because the string copied into the buffer uses the non-tis-620
codepoints of the precomposed glyphs, or because of a cursor positioning
problem?

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/		PGP Key available, key ID: 0x8F0E4975

