Industry Thai Cell-Clustering Rules

From: Chookij Vanatham <chookij vanatham eng sun com>
To: thep links nectec or th, gtk-i18n-list gnome org
Subject: Industry Thai Cell-Clustering Rules
Date: Thu, 02 Nov 2000 17:19:05 -0800 (PST)

K.Theppitak and other Thai folks,

Let's try to finalize the various Thai cell-clustering rules used in the
industry so that we can keep them as a standard and have them defined in the
encoding field of the XLFD.

Let's start with the following info from K.Theppitak.

In our experience, there are three different practices for rendering Thai
fonts:

1. Plain tis620: combining characters are placed at safe positions to
   prevent collisions. There are two practices of this kind:
   - negative-offset, zero-width diacritics (this makes the fonts usable in
     many applications, such as Netscape, which support Western fonts
     without knowing they are rendering Thai fonts)
   - real monospace fonts (used in mule/emacs; this requires the
     applications to combine characters into cells)

2. MacThai extension: an extended tis620 code set that uses codes in the
   range 0x80-0x9f and some free slots to hold the prepositioned
   combining characters. This needs a shaping algorithm to produce
   elegant rendering.

3. WindowsThai extension : similar to MacThai extension, but used in
   Windows Thai Editions.

The last two code sets are mapped to their own private area of Unicode and
cannot be used together.

So, we are now discussing a convention for the encoding field of the XLFD
to distinguish the three code sets:

  -tis620-0   for plain tis620
  -tis620-1   for MacThai extension
  -tis620-2   for WindowsThai extension

Note that the years in the registry field are omitted, because tis620.2529
and tis620.2533 do not differ in content. Both can be referred to as
tis620 without confusion.

Now, we need to talk about Cell-clustering rules which are going to be used
with fonts.

As far as I know, there is only one cell-clustering rule defined by the Thai
government (by NECTEC). It is called Wtt2.0, and the details are attached.
We should add "wtt2.0" to any name that uses the Wtt2.0 cell-clustering
rule.

Ex:

  -tis620-0.wtt2.0 for plain tis620 with Wtt2.0 cell clustering rule
  -tis620-1.wtt2.0 for MacThai extension with Wtt2.0 cell clustering rule
  -tis620-2.wtt2.0 for WindowsThai extension with Wtt2.0 cell clustering rule
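
A minimal sketch of how an application might split the proposed names into
code set and clustering rule. The function name parse_thai_charset and the
sample font name are hypothetical; the sketch only assumes that the last two
XLFD fields are the charset registry and encoding, and that the rule name,
if any, follows the first dot in the encoding field.

```python
# Code sets as proposed above, keyed by the first part of the encoding field.
CODE_SETS = {
    "0": "plain tis620",
    "1": "MacThai extension",
    "2": "WindowsThai extension",
}


def parse_thai_charset(xlfd):
    """Return (code set, clustering rule or None) for an XLFD-style name."""
    # The last two dash-separated fields are registry and encoding.
    registry, encoding = xlfd.rsplit("-", 2)[-2:]
    # Anything after the first dot is taken as the clustering rule name.
    variant, _, rule = encoding.partition(".")
    if registry.lower() != "tis620" or variant not in CODE_SETS:
        raise ValueError("not a tis620 charset: %r" % xlfd)
    return CODE_SETS[variant], rule or None


# Hypothetical font name, used only for illustration.
name = "-misc-fixed-medium-r-normal--14-130-75-75-c-70-tis620-2.wtt2.0"
print(parse_thai_charset(name))
```

A name without a suffix, such as -tis620-0, yields None for the rule, which
leaves the choice of clustering rule open.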

Let me point out the important part of the Wtt2.0 cell-clustering rule so that
it is easier to compare with other cell-clustering rules (but please refer to
the details when doing the implementation).

****
If the cell cluster is composed of a consonant, a vowel and a tonemark,
the vowel character always follows the consonant and the tonemark character
always follows the vowel, as shown below.

	Consonant + Vowel + Tonemark  -----> One cell cluster

If the tonemark comes before the vowel, the vowel character is treated as
a separate cell cluster, as shown below.

	[Consonant + Tonemark] [Vowel] ----> Two cell clusters
****
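
The ordering rule above can be sketched as follows. This is only an
illustration, not the Wtt2.0 specification: the three character classes are
a simplified assumption (Unicode code points, consonants U+0E01..U+0E2E,
above/below vowels U+0E31 and U+0E34..U+0E39, tonemarks U+0E48..U+0E4B), and
the names char_class and clusters_ordered are hypothetical. The attached
details remain the reference for a real implementation.

```python
def char_class(ch):
    """Classify one character (simplified classes; an assumption of this sketch)."""
    cp = ord(ch)
    if 0x0E01 <= cp <= 0x0E2E:
        return "C"  # consonant
    if cp == 0x0E31 or 0x0E34 <= cp <= 0x0E39:
        return "V"  # above/below vowel
    if 0x0E48 <= cp <= 0x0E4B:
        return "T"  # tonemark
    return "O"      # anything else


def clusters_ordered(text):
    """Split text into cells, accepting only consonant, then vowel, then tonemark."""
    clusters = []
    # Class of the last character in the open cluster, or "O" when the
    # cluster has no consonant base and nothing may be attached to it.
    last = "O"
    for ch in text:
        cls = char_class(ch)
        joins = (cls == "V" and last == "C") or (cls == "T" and last in ("C", "V"))
        if joins:
            clusters[-1] += ch
            last = cls
        else:
            clusters.append(ch)
            last = "C" if cls == "C" else "O"
    return clusters


print(len(clusters_ordered("\u0e01\u0e34\u0e48")))  # consonant + vowel + tonemark: 1
print(len(clusters_ordered("\u0e01\u0e48\u0e34")))  # consonant + tonemark + vowel: 2
```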

Other Thai cell-clustering rules have been defined by various companies. I
don't know exactly how many there are. Let's focus on the popular ones and
decide whether they need to be given extra names.

(1) Thai Microsoft Windows
(2) Thai Macintosh

I think both of them follow the simple rule below.

	List of cell clusters
	- consonant
	- consonant + vowel
	- consonant + tonemark
	- consonant + vowel + tonemark <---- ****
	- consonant + tonemark + vowel <---- ****

As you can see, the ones marked **** are each considered one cell cluster
even though the order of the vowel and tonemark differs. This is the point of
difference from Wtt2.0 cell clustering.
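
A minimal sketch of this order-insensitive rule, under simplified
assumptions: characters are Unicode code points, consonants are
U+0E01..U+0E2E, above/below vowels are U+0E31 and U+0E34..U+0E39, and
tonemarks are U+0E48..U+0E4B. The names mark_class and clusters_any_order
are hypothetical, and this is not taken from any vendor's implementation.

```python
def mark_class(ch):
    """Classify one character (simplified classes; an assumption of this sketch)."""
    cp = ord(ch)
    if 0x0E01 <= cp <= 0x0E2E:
        return "C"  # consonant
    if cp == 0x0E31 or 0x0E34 <= cp <= 0x0E39:
        return "V"  # above/below vowel
    if 0x0E48 <= cp <= 0x0E4B:
        return "T"  # tonemark
    return "O"      # anything else


def clusters_any_order(text):
    """Split text into cells: a consonant plus at most one vowel and one tonemark."""
    clusters = []
    # Mark classes already attached to the open cluster, or None when the
    # open cluster has no consonant base.
    seen = None
    for ch in text:
        cls = mark_class(ch)
        if cls in ("V", "T") and seen is not None and cls not in seen:
            clusters[-1] += ch
            seen.add(cls)
        else:
            clusters.append(ch)
            seen = set() if cls == "C" else None
    return clusters


print(len(clusters_any_order("\u0e01\u0e34\u0e48")))  # consonant + vowel + tonemark: 1
print(len(clusters_any_order("\u0e01\u0e48\u0e34")))  # consonant + tonemark + vowel: 1
```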

In my opinion, then, we have these two types of cell-clustering rules:
one is named "wtt2.0"; for the other, I'm not sure whether we are going to
name it or not.

From this point, if we can finalize these, how an individual cell cluster
is shaped should then depend on whichever fonts (plain, Mac,
Microsoft) are in use.

Unfortunately, neither of them says clearly how to handle the SaraAm case.
In both Thai Mac and Microsoft Windows, the following is the clustering
for SaraAm.

	Consonant + SaraAm            ----> 1 cell cluster
	Consonant + Tonemark + SaraAm ----> 1 cell cluster
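
The two SaraAm cases can be sketched as a post-processing step over cells
that have already been formed. The helper names are hypothetical, characters
are assumed to be Unicode code points (SaraAm is U+0E33), and any sequence
other than the two listed above is left as a separate cell, which is an
assumption of this sketch rather than a stated rule.

```python
SARA_AM = "\u0e33"


def is_consonant(ch):
    return "\u0e01" <= ch <= "\u0e2e"


def is_tonemark(ch):
    return "\u0e48" <= ch <= "\u0e4b"


def accepts_sara_am(cluster):
    """True for the two listed cases: consonant, or consonant + tonemark."""
    if len(cluster) == 1:
        return is_consonant(cluster)
    if len(cluster) == 2:
        return is_consonant(cluster[0]) and is_tonemark(cluster[1])
    return False


def attach_sara_am(clusters):
    """Merge a stand-alone SaraAm cell into the preceding cell when allowed."""
    merged = []
    for cell in clusters:
        if cell == SARA_AM and merged and accepts_sara_am(merged[-1]):
            merged[-1] += cell
        else:
            merged.append(cell)
    return merged


print(len(attach_sara_am(["\u0e19", SARA_AM])))        # consonant + SaraAm: 1
print(len(attach_sara_am(["\u0e19\u0e49", SARA_AM])))  # consonant + tonemark + SaraAm: 1
```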

Again, in my opinion, the following should do cell
clustering as shown above for the SaraAm case.

	-tis620-1
	-tis620-2
	-tis620-2.wtt2.0

K.Theppitak and other Thai folks, please let me know your opinions so we can
see whether we can come to an agreement.


To Owen,

Additionally, here are my answers to Owen's questions.

 - How many different rules are in use
  
   I would say, 2 cell-clustering rules. One is called Wtt2.0. The other,
   as explained above.
 
 - Which ones we need to support
 
   Just my opinion, how about these.
   
   	-tis620-0
   	-tis620-1
   	-tis620-2
   	-tis620-2.wtt2.0  <--- I would help on this.
 
 - How bad the problem is with legacy fonts without identified
   clustering rules.

   We won't be able to have Thai displayed correctly after text manipulation
   such as insert, delete, copy-paste, selection, scrolling, and so on.
   Personally, I wouldn't trust such software, because I couldn't be sure that,
   whenever I edit Thai text files, the result is correct as we expect
   and as it is shown.
   

Chookij V.
Follow-Ups:
- Re: Industry Thai Cell-Clustering Rules
  - From: Theppitak Karoonboonyanan
- Re: Industry Thai Cell-Clustering Rules
  - From: Pablo Saratxaga