Re: SaraAm

From: Pablo Saratxaga <pablo mandrakesoft com>
To: gtk-i18n-list gnome org
Subject: Re: SaraAm
Date: Sat, 11 Nov 2000 02:04:09 +0100
Hello!

On Fri, Nov 10, 2000 at 02:32:28PM -0800, Chookij Vanatham wrote:

> ] How is the word "water" supposed to be written ?
> 
> "water" will be typed/input by the following keys.
> 
> Key "NoNu" (U+0E19) + Key "MaiTho" (U+0E49) + Key "SaraAm" (U+0E33)
> 
> If we type "Key Nikhahit (U+0E4D)" and "Key SaraAA (U+0E32)" instead of
> one "Key SaraAm (U+0E33)", the display of the word "water" will look like this:
> 
> 	MaiTho (U+0E49)     Nikhahit (U+0E4D)
> 	NoNu (U+0E19)                           SaraAA (U+0E32)
> 	
> This won't be read as "water" though. It's not the right word.

Yes; but if it is typed as:

"NoNu" (U+0E19) + Nikhahit (U+0E4D) + "MaiTho" (U+0E49) + SaraAA (U+0E32) ?

> ] So my question is: should the input standardize SaraAm -> Nikhahit + SaraAA
> ] and SaraAm + tonemark -> Nikhahit + tonemark + SaraAA?
> 
> From the input point of view, it seems that SaraAm can be composed as
> Nikhahit + SaraAA, but, as the example above shows, the word "water" has
> only one correct sequence, the one using SaraAm; it is neither
> NoNu + Nikhahit + MaiTho + SaraAA nor NoNu + MaiTho + Nikhahit + SaraAA.

Is that just how it has been decided?
I ask because http://www.inet.co.th/cyberclub/trin/thairef/ shows that both
NoNu + Nikhahit + MaiTho + SaraAA and NoNu + MaiTho + SaraAm produce
the same visual output (whereas the NoNu + MaiTho + Nikhahit + SaraAA
sequence gives a different visual output, because of the clustering
rules).
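To make the distinction concrete, here is a minimal Python sketch, assuming the stacking behaviour described above: SaraAm is split into Nikhahit + SaraAA, and the Nikhahit is placed directly on the base consonant, underneath any tonemark already typed. The helper `visual_key` is hypothetical (not part of any library) and ignores every other clustering rule.

```python
# Hypothetical "visual key" for Thai strings, assuming the stacking
# behaviour described in this thread; not a real library API.
SARA_AM = "\u0e33"   # SaraAm
NIKHAHIT = "\u0e4d"  # Nikhahit
SARA_AA = "\u0e32"   # SaraAA
TONEMARKS = set("\u0e48\u0e49\u0e4a\u0e4b")  # MaiEk, MaiTho, MaiTri, MaiChattawa


def visual_key(s: str) -> str:
    """Split SaraAm into Nikhahit + SaraAA, putting Nikhahit on the base."""
    out = []
    for ch in s:
        if ch == SARA_AM:
            # Walk back over tonemarks already emitted, so Nikhahit goes
            # before them (the tonemark is stacked on top of Nikhahit).
            i = len(out)
            while i > 0 and out[i - 1] in TONEMARKS:
                i -= 1
            out.insert(i, NIKHAHIT)
            out.append(SARA_AA)
        else:
            out.append(ch)
    return "".join(out)


NONU, MAITHO = "\u0e19", "\u0e49"
typed = NONU + MAITHO + SARA_AM             # NoNu + MaiTho + SaraAm
alt = NONU + NIKHAHIT + MAITHO + SARA_AA    # NoNu + Nikhahit + MaiTho + SaraAA
other = NONU + MAITHO + NIKHAHIT + SARA_AA  # NoNu + MaiTho + Nikhahit + SaraAA

print(visual_key(typed) == visual_key(alt))    # True
print(visual_key(typed) == visual_key(other))  # False
```

With this key the first two sequences compare equal and the third does not, which is the grouping the page above describes.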

> 	(1) DoDek (U+0E14) + SaraAm (U+0E33)
> 	(2) DoDek (U+0E14) + Nikhahit (U+0E4D) + SaraAA (U+0E32)
> 	
> 	(1) and (2) are displayed the same but are different sequences.
> 	
> 	From the input point of view,
> 	
> 	(A) Key DoDek (U+0E14),  Key SaraAm (U+0E33)
> 	(B) Key DoDek (U+0E14),  Key Nikhahit (U+0E4D), Key SaraAA (U+0E32)
> 	
> 	(A) and (B) are considered as the same word "black".

But the problem is that (A) and (B) are not equal from a computer's point of
view; that is a problem if you want to search for the word "black" in a text
file, for example.
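For instance, a minimal Python sketch (using Python's built-in `tis_620` codec) shows that the two spellings of "black" are different byte strings, so a plain substring search for one misses the other:

```python
# (A) and (B) from the example above, as Unicode strings.
black_a = "\u0e14\u0e33"        # DoDek + SaraAm
black_b = "\u0e14\u0e4d\u0e32"  # DoDek + Nikhahit + SaraAA

# In TIS-620 each Thai character is one byte (U+0Exx -> 0xA0 + xx).
print(black_a.encode("tis_620"))  # b'\xb4\xd3'
print(black_b.encode("tis_620"))  # b'\xb4\xed\xd2'

text = "\u0e2a\u0e35" + black_b   # some text containing spelling (B)
print(black_a in text)            # False: searching for (A) misses it
```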

The solution would be to have the input method catch those cases and
normalize them, so that inputs (A) and (B) produce the same byte string. Now,
my question is: which one should it be, (A) or (B)?
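Whichever answer is chosen, the normalization itself is small. The Python sketch below arbitrarily picks the composed form (A) as canonical; the function name `compose_sara_am` is hypothetical, and the rule that Nikhahit is merged only when it sits directly on its base (not after a tonemark) is an assumption based on the clustering behaviour discussed above. Choosing (B) would mean the opposite rewrite: splitting SaraAm and moving Nikhahit in front of any tonemark.

```python
NIKHAHIT = "\u0e4d"
SARA_AA = "\u0e32"
SARA_AM = "\u0e33"
TONEMARKS = set("\u0e48\u0e49\u0e4a\u0e4b")  # MaiEk, MaiTho, MaiTri, MaiChattawa


def compose_sara_am(s: str) -> str:
    """Hypothetical input-side normalizer choosing form (A).

    Rewrites Nikhahit [+ tonemark] + SaraAA as [tonemark] + SaraAm, but only
    when Nikhahit is not preceded by a tonemark (assumed to render differently).
    """
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == NIKHAHIT and not (out and out[-1] in TONEMARKS):
            j = i + 1
            tone = ""
            if j < len(s) and s[j] in TONEMARKS:
                tone = s[j]  # optional tonemark between Nikhahit and SaraAA
                j += 1
            if j < len(s) and s[j] == SARA_AA:
                if tone:
                    out.append(tone)
                out.append(SARA_AM)
                i = j + 1
                continue
        out.append(ch)
        i += 1
    return "".join(out)


DODEK, NONU, MAITHO = "\u0e14", "\u0e19", "\u0e49"
print(compose_sara_am(DODEK + NIKHAHIT + SARA_AA) == DODEK + SARA_AM)  # True
print(compose_sara_am(NONU + NIKHAHIT + MAITHO + SARA_AA)
      == NONU + MAITHO + SARA_AM)  # True
```

Input already in form (A) passes through unchanged, so the function can be applied to every keystroke sequence.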

>     As I remember saying before, when writing Thai, a vowel will always
>     follow a consonant and a tonemark will always follow a vowel.
>     
>     This is true for vowels like U+0E30 - U+0E39 (U+0E33 SaraAm might be
>     the exception) and tonemarks like U+0E48 - U+0E4B.
>     
>     But there are other Thai vowels which are composed from these vowels.
>     I would call them "compound vowels"; SaraAm is one, and there are many
>     more, for example,
>     
>     	U+0E40 + (consonant) + U+0E32 ====> this forms one compound vowel.

But only that compound (Nikhahit + SaraAA) is encoded as a single character
(SaraAm); all the others are not, and must be typed as their decomposition,
right?

Also, is U+0E40 a vowel that is written before the consonant it is attached
to (even though it is pronounced after it)? (That is a problem linked to
TIS-620 being a glyph-based encoding; it doesn't exist with the Indic
encodings such as Devanagari.)
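Assuming the leading vowels U+0E40 to U+0E44 are stored in visual order (before the consonant), converting to a pronunciation-like, Indic-style order means moving each one after the consonant it belongs to. A deliberately simplified Python sketch follows; `to_phonetic_order` is a hypothetical name, and the sketch only moves the vowel past a single consonant, so consonant clusters are not handled.

```python
# Leading vowels: SaraE, SaraAE, SaraO, SaraAiMaiMuan, SaraAiMaiMalai.
LEADING_VOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")


def is_consonant(ch: str) -> bool:
    # Thai consonants are U+0E01 (KoKai) .. U+0E2E (HoNokHuk).
    return "\u0e01" <= ch <= "\u0e2e"


def to_phonetic_order(s: str) -> str:
    """Hypothetical: move each leading vowel after the consonant that follows it."""
    chars = list(s)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in LEADING_VOWELS and is_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)


# SaraE + KoKai + SaraAA (the U+0E40 + consonant + U+0E32 pattern)
print(to_phonetic_order("\u0e40\u0e01\u0e32") == "\u0e01\u0e40\u0e32")  # True
```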

> SaraAm + Tonemark shouldn't be the same input as Nikhahit + Tonemark + SaraAA

On that, we agree.

> It seems that the way we type/write Thai words is not the same as the way
> we pronounce them. We write/type them in visual order, following the WTT
> rules, but we pronounce them like Devanagari and other Indic scripts...
> I don't know why it's like that...

It is like that because it was easier that way: the only thing needed is
a proportional font, with no need for any other modification to any program
or OS (as long as they can display proportional fonts).
That is possible because Thai script has no complex ligated conjuncts,
only "stacked" ones.
The downside is that typing may be odd at times, and ambiguities exist in
the encoding; the upside is that Thai has been usable on general-purpose
computers for many years, while other languages using Indic scripts could
only dream of it...
 

-- 
May all go well for you,
Pablo Saratxaga

http://www.srtxg.easynet.be/		PGP Key available, key ID: 0x8F0E4975
References:
- SaraAm
  - From: Chookij Vanatham