Re: SaraAm

From: Chookij Vanatham <chookij vanatham eng sun com>
To: gtk-i18n-list gnome org, pablo mandrakesoft com, chookij vanatham eng sun com
Subject: Re: SaraAm
Date: Fri, 10 Nov 2000 18:24:24 -0800 (PST)
] 
] According to wtt cell-clustering rule, here is how to check if MaiTho
] is able to combine with Nikhahit.
] 
] Nikhahit is type AD1. MaiTho is type Tone.
] From wtt table, wtt_table[AD1][Tone] = 'R' which means MaiTho is not able
] to combine with Nikhahit. So, the display will be like this.
] 
]        Nikhahit    MaiTho
]        NoNu                 SaraAA
]        
] This should be read as "water" either.
That was just a typo; it should say: this should NOT be read as "water".
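
The check described above can be sketched in a few lines of Python. This is
only an illustration: the single table entry and the two character classes
are the ones quoted in this message, the names WTT_TABLE, CHAR_CLASS and
can_combine are made up for the example, and the rest of the wtt table is
deliberately left out rather than guessed.

```python
# Only the entry quoted in this thread is filled in; every other cell of
# the real wtt table is intentionally missing here.
WTT_TABLE = {("AD1", "Tone"): "R"}

# Character classes as stated in the message (hypothetical lookup dict).
CHAR_CLASS = {
    "\u0E4D": "AD1",   # Nikhahit
    "\u0E49": "Tone",  # MaiTho
}

def can_combine(prev, nxt):
    """Return False for an 'R' entry, True for any other known entry,
    and None when this partial table has no entry for the pair."""
    key = (CHAR_CLASS.get(prev), CHAR_CLASS.get(nxt))
    action = WTT_TABLE.get(key)
    if action is None:
        return None
    # 'R' is read here as "does not combine", as the message describes it.
    return action != "R"

print(can_combine("\u0E4D", "\u0E49"))  # Nikhahit followed by MaiTho
```

With only that one entry, Nikhahit followed by MaiTho is reported as not
combinable, and every other pair comes back as unknown.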

Chookij V.


] 
] 
] According to Trin's web page about "water", I would say the following.
] 
] If talking about hand-writing (not computer typing),
] 
]    NoNu + Nikhahit + MaiTho + SaraAA should be read as "water"
]    
] If talking about computer typing with wtt rule, the display isn't "water".
] 
] Perhaps K.Trin can tell us why what his web page says differs from the
] wtt rule.
] 
] One thing I would like to point out is that wtt defines Nikhahit as a
] Diacritic; Nikhahit is not considered a vowel. I think Nikhahit probably
] comes from Sanskrit or Pali. Devanagari has "bindu", which is similar to
] Nikhahit, and "bindu" isn't considered a vowel either (not too sure whether
] the two serve the same purpose).
] 
] I guess that's why Nikhahit is not a vowel, and why MaiTho, which is a
] tonemark, can't be combined with Nikhahit. I think the same applies to the
] other diacritic, "Karan (U+0E4C)", under the general rule:
] 
] - Tonemark must follow vowel
] - Diacritic must follow vowel
] 
] The word should have either Tonemark or Diacritic but shouldn't have both,
] I guess.
] 
] 
] 
] ] 
] ] > ] So my question is; should the input standardize
] ] > ] SaraAm -> Nikhahit + SaraAA
] ] > ] and SaraAm + tonemark --> Nikhahit + tonemark + SaraAm ?
] ] > 
] ] > From the input point of view, it seems that SaraAm can be composed as
] ] > Nikhahit and SaraAA. But the example above seems to show that the word
] ] > "water" has only one sequence, the one that uses SaraAm; it is neither
] ] > NoNu + Nikhahit + MaiTho + SaraAA nor NoNu + MaiTho + Nikhahit + SaraAA.
] ] 
] ] Is that because it has been decided that way ?
] 
] Not 100% sure, but Thai sorting might give a clue: if the two are the
] same, both would be found at the same point in the sort order.
] 
] ] Because http://www.inet.co.th/cyberclub/trin/thairef/ shows that both
] ] Nonu + Nikhahit + MaiTho + SaraAA and Nonu + MaiTho + SaraAm would
] ] provide the same visual output (the Nonu + Maitho + Nikhahit + SaraAA
] ] sequence will give a different visual output, thanks to the clustering
] ] rules).
] 
] Like I said above. K.Trin would be able to tell.
] 
] ] 
] ] > 	(1) DoDek (U+0E14) + SaraAm (U+0E33)
] ] > 	(2) DoDek (U+0E14) + Nikhahit (U+0E4D) + SaraAA (U+0E32)
] ] > 	
] ] > 	(1) and (2) are displayed the same but different sequences.
] ] > 	
] ] > 	For the input point of view,
] ] > 	
] ] > 	(A) Key DoDek (U+0E14),  Key SaraAm (U+0E33)
] ] > 	(B) Key DoDek (U+0E14),  Key Nikhahit (U+0E4D), Key SaraAA (U+0E32)
] ] > 	
] ] > 	(A) and (B) are considered as the same word "black".
] ] 
] ] But the problem is that (A) and (B) are not equal (from a computer point of
] ] view); that is a problem if you want to search the word "black" in a text
] ] file, for example.
] ] 
] ] The solution would be to have the input catch those cases and standardize;
] ] that is input (A) and (B) will produce a same byte string;
] 
] Not too sure whether this standardization is agreed across the Thai industry.
] But I guess (not 100% sure) that Thai sorting should catch these 2 cases
] and treat them as the same word. Not too sure about other uses.
] 
] ] now, my question is: which one should it be ? (A) or (B) ?
] 
] As just one Thai person in the computer industry, I can't answer this
] question. My feeling so far is that both are fine and should be the same
] to me. But this might be wrong.
] 
] ] 
] ] >     As I remember saying before, when writing Thai, a vowel will
] ] >     always follow a consonant and a tonemark will always follow a vowel.
] ] >     
] ] >     This is true for vowels like U+0E30 - U+0E39 (U+0E33 SaraAm might be
] ] >     the exception) and tonemark like U+0E48 - U+0E4B.
] ] >     
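
A literal reading of the rule and the two code point ranges just quoted can
be written down as a small Python check. It is only a sketch of that loose
statement, not of the full wtt rules, and the helper names are made up for
the example. It also shows why SaraAm is called a possible exception.

```python
def thai_class(ch):
    """Classify a character using only the two ranges quoted above."""
    cp = ord(ch)
    if 0x0E30 <= cp <= 0x0E39:
        return "vowel"
    if 0x0E48 <= cp <= 0x0E4B:
        return "tone"
    return "other"

def tone_follows_vowel(text):
    """True if every tonemark comes immediately after a vowel."""
    return all(
        i > 0 and thai_class(text[i - 1]) == "vowel"
        for i, ch in enumerate(text)
        if thai_class(ch) == "tone"
    )

# NoNu + SaraII (U+0E35) + MaiTho: the tonemark follows the vowel.
print(tone_follows_vowel("\u0E19\u0E35\u0E49"))
# NoNu + MaiTho + SaraAm ("water"): the tonemark comes before the vowel.
print(tone_follows_vowel("\u0E19\u0E49\u0E33"))
```

The first sequence passes and the second does not, which is the SaraAm
exception mentioned above.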
] ] >     But there are other Thai vowels which are composed from these vowels.
] ] >     I would call them "compound vowels", like SaraAm, and there are a
] ] >     lot more, for example,
] ] >     
] ] >     	U+0E40 + (consonant) + U+0E32 ====> This composes one compound vowel.
] ] 
] ] But only the compound (Nikhahit + SaraAA) is encoded (as SaraAm); all the
] ] others are not, and must be typed by their decomposition, isn't that so?
] 
] I think that in hand-writing they need to be written by their
] decomposition. I don't know about computer typing, where, for sure, both
] (1) Nikhahit + SaraAA and (2) SaraAm are available.
] 
] Just to point out one more thing, see the following sequence...
] 
]       consonant + SaraI (U+0E34) + Nikhahit
]       
]       This "SaraI (U+0E34)" plus Nikhahit might actually mean the same as
]       the vowel "SaraUe (U+0E36)". I can't think of any word which requires
]       typing like this. But that might be because all those words are always
]       typed with SaraUe (U+0E36) alone.
]       
]       If this is true, we never type "SaraUe (U+0E36)" by its decomposition.
]       And if so, then typing SaraAm only by its decomposition might not
]       be right either.
]       
] Hope this helps.      
] 
] ] 
] ] Also, is U+0E40 a vowel that is written before the consonant it is attached
] ] to (even if it is pronounced after it) ?
] 
] Yes, always.
] 
] ] (that is a problem linked to
] ] tis-620 being a glyph-based encoding, that doesn't exist with the indic
] ] encodings (devanagari, etc))
] 
] In Devanagari, there is a similar case with the vowel sign I (U+093F).
] It's pronounced after the consonant but written before it.
] Do you agree with this ?
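
To make the comparison concrete, here is a small Python sketch that lists
the stored code points of one Thai and one Devanagari syllable, each of
which shows its vowel to the left of the consonant. The two sample
syllables (with KoKai and KA) and the helper describe() are picked only for
illustration. In the Thai string the vowel U+0E40 is stored first, matching
the written order; in the Devanagari string U+093F is stored after the
consonant, matching the spoken order.

```python
import unicodedata

def describe(text):
    """List each character as 'U+XXXX NAME' in stored order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

thai = "\u0E40\u0E01"  # vowel U+0E40 typed and stored before the consonant
deva = "\u0915\u093F"  # vowel sign U+093F stored after the consonant

print(describe(thai))
print(describe(deva))
```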
] 
] 
] Hope this helps,
] 
] Chookij V.
] 
] 
] ] 
] ] >SaraAm + Tonemark shouldn't be the same input as
] ] >Nikhahit + Tonemark + SaraAA
] ] 
] ] That, we agree.
] ] 
] ] > It seems that the way we type/write Thai words is not the same as the way
] ] > we pronounce them. We write/type them in visual order following the Wtt
] ] > rules, but we pronounce them as in other scripts like Devanagari....
] ] > Don't know why it's like that...
] ] 
] ] It is like that because it was easier that way: the only thing needed is
] ] a proportional font, with no need to make any other modification to any
] ] program or OS (as long as they can display proportional fonts).
] ] Thai script allows this because it has no complex ligatured conjuncts,
] ] only "stacked" ones.
] ] The bad result is that typing may be odd at times, and ambiguities in
] ] encoding exist; but the good result is that Thai has been usable on
] ] general purpose computers for many years, while other languages using
] ] Indic scripts could only dream of it...
] ]  
] ] 
] ] -- 
] ] Ki ça vos våye bén,
] ] Pablo Saratxaga
] ] 
] ] http://www.srtxg.easynet.be/		PGP Key available, key ID: 0x8F0E4975
] ] 
] ] _______________________________________________
] ] gtk-i18n-list mailing list
] ] gtk-i18n-list gnome org
] ] http://mail.gnome.org/mailman/listinfo/gtk-i18n-list
] 
] 