Re: SaraAm



] 
] Hello!
] 
] On Fri, Nov 10, 2000 at 02:32:28PM -0800, Chookij Vanatham wrote:
] 
] > ] How is the word "water" supposed to be written?
] > 
] > "water" is typed/input with the following keys:
] > 
] > Key "NoNu" (U+0E19) + Key "MaiTho" (U+0E49) + Key "SaraAm" (U+0E33)
] > 
] > If we type "Key Nikhahit (U+0E4D)" and "Key SaraAA (U+0E32)" instead of
] > a single "Key SaraAm (U+0E33)", the word "water" will be displayed like this:
] > 
] > 	MaiTho (U+0E49)     Nikhahit (U+0E4D)
] > 	NoNu (U+0E19)                           SaraAA (U+0E32)
] > 	
] > This won't be read as "water" though. It's not the right word.
] 
] Yes; but if it is typed as:
] 
] "NoNu" (U+0E19) + Nikhahit (U+0E4D) + "MaiTho" (U+0E49) + SaraAA (U+0E32)?

According to the WTT cell-clustering rules, here is how to check whether
MaiTho can combine with Nikhahit.

Nikhahit is of type AD1. MaiTho is of type Tone.
From the WTT table, wtt_table[AD1][Tone] = 'R', which means MaiTho cannot
combine with Nikhahit. So the display will look like this:

       Nikhahit    MaiTho
       NoNu                 SaraAA
       
This would not be read as "water" either.
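The cell check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the real WTT 2.0 implementation: the class names follow this thread, but `WTT_TABLE` is a hypothetical fragment holding only the pairs needed here (AD1 followed by Tone is 'R' as quoted above; a consonant followed by a tone mark or by Nikhahit stacks; Tone followed by AD1 is assumed to be 'R', consistent with the displays drawn in this thread). Any pair missing from the fragment is simply treated as starting a new cell, which is a simplification.

```python
"""Hypothetical sketch of WTT-style cell clustering (partial table only)."""

NONU = "\u0e19"      # NoNu, consonant
DODEK = "\u0e14"     # DoDek, consonant
SARA_AA = "\u0e32"   # SaraAA, following vowel
SARA_AM = "\u0e33"   # SaraAm, following vowel
MAI_THO = "\u0e49"   # MaiTho, tone mark
NIKHAHIT = "\u0e4d"  # Nikhahit, diacritic

# WTT class of each character used in the examples (assumed subset).
WTT_CLASS = {
    NONU: "CONS", DODEK: "CONS",
    SARA_AA: "FV1", SARA_AM: "FV1",
    MAI_THO: "TONE", NIKHAHIT: "AD1",
}

# Partial (previous class, next class) -> action table.
# 'C' = stack onto the previous character, 'R' = cannot stack.
WTT_TABLE = {
    ("CONS", "TONE"): "C",
    ("CONS", "AD1"): "C",
    ("AD1", "TONE"): "R",  # wtt_table[AD1][Tone] = 'R'
    ("TONE", "AD1"): "R",  # assumed, see lead-in
}


def can_compose(prev: str, nxt: str) -> bool:
    """True if `nxt` may stack onto `prev` within one display cell."""
    return WTT_TABLE.get((WTT_CLASS[prev], WTT_CLASS[nxt])) == "C"


def cells(text: str) -> list[str]:
    """Split text into display cells; pairs not marked 'C' start a new cell."""
    result: list[str] = []
    for ch in text:
        if result and can_compose(result[-1][-1], ch):
            result[-1] += ch
        else:
            result.append(ch)
    return result


if __name__ == "__main__":
    for name, seq in [
        ("NoNu+MaiTho+SaraAm", NONU + MAI_THO + SARA_AM),
        ("NoNu+Nikhahit+MaiTho+SaraAA", NONU + NIKHAHIT + MAI_THO + SARA_AA),
        ("NoNu+MaiTho+Nikhahit+SaraAA", NONU + MAI_THO + NIKHAHIT + SARA_AA),
    ]:
        print(name, [" ".join(f"U+{ord(c):04X}" for c in cell) for cell in cells(seq)])
```

With this fragment, NoNu + MaiTho + SaraAm gives two cells (NoNu with MaiTho stacked, then SaraAm), while NoNu + Nikhahit + MaiTho + SaraAA gives three, with MaiTho left in a cell of its own, as in the display drawn above.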


Based on K. Trin's web page about "water", I would say the following.

If we are talking about handwriting (not computer typing),

   NoNu + Nikhahit + MaiTho + SaraAA should be read as "water".

If we are talking about computer typing under the WTT rules, the display is not "water".

I guess K. Trin might be able to tell us why what he says on his web page
differs from the WTT rules.

One thing I would like to point out is that WTT defines Nikhahit as a
diacritic; Nikhahit is not considered a vowel. I think Nikhahit probably
comes from Sanskrit or Pali. Devanagari has the "bindu", which is similar to
Nikhahit, and the bindu isn't considered a vowel either (I'm not too sure
whether their purposes are the same).

I guess that's why Nikhahit is not a vowel, and since it isn't one, MaiTho,
which is a tone mark, cannot be combined with it. I think the same applies to
another diacritic, "Karan (U+0E4C)", under the general rule:

- A tone mark must follow a vowel.
- A diacritic must follow a vowel.

A word should have either a tone mark or a diacritic, but not both,
I guess.



] 
] > ] So my question is: should the input standardize SaraAm -> Nikhahit + SaraAA
] > ] and SaraAm + tonemark --> Nikhahit + tonemark + SaraAA?
] > 
] > From the input point of view, even though SaraAm can be composed as
] > Nikhahit + SaraAA, the example above seems to show that the word "water"
] > has only one sequence, the one using SaraAm. It is neither
] > NoNu + Nikhahit + MaiTho + SaraAA nor NoNu + MaiTho + Nikhahit + SaraAA.
] 
] Is that because it has been decided that way?

I'm not 100% sure, but Thai sorting might give a clue: if the two sequences
are the same word, both of them would sort to the same position.

] Because http://www.inet.co.th/cyberclub/trin/thairef/ shows that both
] Nonu + Nikhahit + MaiTho + SaraAA and Nonu + MaiTho + SaraAm would
] provide the same visual output (the Nonu + Maitho + Nikhahit + SaraAA
] sequence will give a different visual output, thanks to the clustering
] rules).

As I said above, K. Trin would be able to tell.

] 
] > 	(1) DoDek (U+0E14) + SaraAm (U+0E33)
] > 	(2) DoDek (U+0E14) + Nikhahit (U+0E4D) + SaraAA (U+0E32)
] > 	
] > 	(1) and (2) are displayed the same but are different sequences.
] > 	
] > 	From the input point of view,
] > 	
] > 	(A) Key DoDek (U+0E14),  Key SaraAm (U+0E33)
] > 	(B) Key DoDek (U+0E14),  Key Nikhahit (U+0E4D), Key SaraAA (U+0E32)
] > 	
] > 	(A) and (B) are considered the same word "black".
] 
] But the problem is that (A) and (B) are not equal (from a computer point of
] view); that is a problem if you want to search the word "black" in a text
] file, for example.
] 
] The solution would be to have the input catch those cases and standardize;
] that is, inputs (A) and (B) would produce the same byte string.

I'm not sure this standardization is agreed on across the Thai industry.
But I guess (though I'm not 100% sure) that Thai sorting should catch these
two cases and treat them as the same word. I'm not sure about other uses.

] Now, my question is: which one should it be? (A) or (B)?

Speaking as just one Thai person in the computer industry, I can't answer
this question. My feeling so far is that both are fine and should be treated
as the same. But I might be wrong.
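Whichever form is chosen, the goal raised above (inputs (A) and (B) giving the same byte string, e.g. for searching) could be met by normalizing text to one canonical form before storing or comparing it. A minimal Python sketch, assuming plain string replacement is acceptable; `canonicalize`, `same_word` and the `decompose` flag are hypothetical names, and nothing here says which form the Thai industry actually agreed on. It only rewrites an adjacent Nikhahit + SaraAA pair, so a sequence like Nikhahit + tonemark + SaraAA, which this thread treats as different input, is left alone.

```python
"""Hypothetical canonicalizer for SaraAm vs. Nikhahit + SaraAA."""

SARA_AM = "\u0e33"
NIKHAHIT = "\u0e4d"
SARA_AA = "\u0e32"
DODEK = "\u0e14"


def canonicalize(text: str, decompose: bool = False) -> str:
    """Map both spellings to one form.

    decompose=False -> form (A): Nikhahit + SaraAA becomes SaraAm.
    decompose=True  -> form (B): SaraAm becomes Nikhahit + SaraAA.
    Only an adjacent Nikhahit + SaraAA pair is touched.
    """
    if decompose:
        return text.replace(SARA_AM, NIKHAHIT + SARA_AA)
    return text.replace(NIKHAHIT + SARA_AA, SARA_AM)


def same_word(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to form (A)."""
    return canonicalize(a) == canonicalize(b)


if __name__ == "__main__":
    a = DODEK + SARA_AM             # input (A)
    b = DODEK + NIKHAHIT + SARA_AA  # input (B)
    print(a == b)                   # raw strings differ: False
    print(same_word(a, b))          # equal after normalization: True
```

One caveat on choosing form (B): blindly decomposing would turn NoNu + MaiTho + SaraAm into NoNu + MaiTho + Nikhahit + SaraAA, which the clustering rules display differently, so a real (B) normalizer would need extra care around tone marks.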

] 
] >     As I remember saying before, when writing Thai, a vowel always
] >     follows a consonant and a tone mark always follows a vowel.
] >     
] >     This is true for vowels like U+0E30 - U+0E39 (U+0E33 SaraAm might be
] >     the exception) and tone marks like U+0E48 - U+0E4B.
] >     
] >     But there are other Thai vowels that are composed from these vowels.
] >     I would call them "compound vowels", like SaraAm, and there are many
] >     more, for example:
] >     
] >     	U+0E40 + (consonant) + U+0E32 ====> This composes one compound vowel.
] 
] But only one compound (Nikhahit + SaraAA) is encoded (as SaraAm); all others
] are not, and must be typed by their decomposition, right?

I think that in handwriting they have to be written by their decomposition.
I don't know about computer typing, but there we certainly have both
(1) Nikhahit + SaraAA and (2) SaraAm.
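In computer text the two forms really are different byte strings. A small Python sketch using the standard library's `tis_620` codec shows that SaraAm is one TIS-620 byte while Nikhahit + SaraAA is two, and that a compound vowel such as U+0E40 + consonant + U+0E32 (here with KhoKhai U+0E02, picked only as an illustration) is stored as three separate bytes with no precomposed code.

```python
"""Show how SaraAm, its decomposition, and a compound vowel look in TIS-620."""

SAMPLES = {
    "SaraAm (U+0E33)": "\u0e33",
    "Nikhahit + SaraAA": "\u0e4d\u0e32",
    "U+0E40 + KhoKhai + U+0E32": "\u0e40\u0e02\u0e32",
}

for label, text in SAMPLES.items():
    raw = text.encode("tis_620")  # Python's built-in TIS-620 codec
    print(f"{label:28} -> {raw.hex(' ')}")
```

The Thai block of TIS-620 maps U+0E01..U+0E5B to single bytes at a fixed offset, so each code point becomes exactly one byte; the printed hex makes the one-byte versus two-byte difference for SaraAm visible.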

Let me point out one more thing; see the following sequence:

      consonant + SaraI (U+0E34) + Nikhahit
      
      This "SaraI (U+0E34)" plus Nikhahit might actually mean the same as
      the vowel "SaraUe (U+0E36)". I can't think of any word that requires
      typing it like this, but that might be because all such words are
      always typed with SaraUe (U+0E36) alone.
      
      If this is true, we never type "SaraUe (U+0E36)" as its decomposition.
      If so, then requiring SaraAm to be typed only as its decomposition
      might not be right either.
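The same asymmetry is visible in the Unicode Character Database. A short check with Python's standard `unicodedata` module shows that SaraAm (U+0E33) carries only a compatibility decomposition to Nikhahit + SaraAA, so NFD leaves it alone while NFKD splits it, whereas U+0E36 (Unicode name SARA UE) has no decomposition at all. This only reflects Unicode's data; it does not settle which form Thai input should produce.

```python
import unicodedata

SARA_AM = "\u0e33"
SARA_UE = "\u0e36"

for ch in (SARA_AM, SARA_UE):
    decomp = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {decomp}")

# Canonical decomposition (NFD) keeps SaraAm as one code point ...
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", SARA_AM)])
# ... while compatibility decomposition (NFKD) yields Nikhahit + SaraAA.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKD", SARA_AM)])
```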
      
Hope this helps.      

] 
] Also, is U+0E40 a vowel that is written before the consoun it is attached
] to (even if it is pronounced after it) ?

Yes, always.

] (that is a problem linked to
] tis-620 being a glyph-based encoding, that doesn't exist with the indic
] encodings (devanagari, etc))

In Devanagari, there is a similar case: the vowel sign I (U+093F) is
pronounced after the consonant but written before it.
Do you agree with this?
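The difference between Thai visual-order storage and Devanagari logical-order storage shows up when listing code points in memory order. A small Python sketch with `unicodedata`; the two sample syllables (Thai SaraE + DoDek, Devanagari KA + vowel sign I) and the helper name `storage_order` are illustrations chosen for this note. In both scripts the vowel is written before the consonant, but Thai stores U+0E40 first as an ordinary letter (category Lo), while Devanagari stores U+093F after the consonant as a spacing combining mark (category Mc) that the renderer has to reorder.

```python
import unicodedata


def storage_order(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, name, general category) in memory order."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.category(c))
        for c in text
    ]


thai = "\u0e40\u0e14"  # SaraE typed first, then DoDek (visual order)
deva = "\u0915\u093f"  # KA first, then vowel sign I (logical order)

for label, text in (("Thai", thai), ("Devanagari", deva)):
    for cp, name, cat in storage_order(text):
        print(f"{label:10} {cp} {cat} {name}")
```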


Hope this helps,

Chookij V.


] 
] > SaraAm + tonemark shouldn't be the same input as Nikhahit + tonemark + SaraAA
] 
] That, we agree.
] 
] > It seems that the way we type/write Thai words is not the same as the way
] > we pronounce them. We write/type them in visual order following the WTT
] > rules, but we pronounce them like Devanagari and other Indic scripts...
] > I don't know why it's like that...
] 
] It is like that because it was easier that way: the only thing needed is
] a proportional font, with no need to make any other modification to any
] program or OS (as long as they can display proportional fonts).
] Thai script allows that because it has no complex ligatured conjuncts,
] only "stacked" ones.
] The bad result is that typing may be odd at times, and ambiguities in
] encoding exist; but the good result is that Thai has been usable on
] general-purpose computers for many years, while other languages using
] Indic scripts could only dream of it...
]  
] 
] -- 
] Ki ça vos våye bén,
] Pablo Saratxaga
] 
] http://www.srtxg.easynet.be/		PGP Key available, key ID: 0x8F0E4975
] 
] _______________________________________________
] gtk-i18n-list mailing list
] gtk-i18n-list gnome org
] http://mail.gnome.org/mailman/listinfo/gtk-i18n-list
