Re: SaraAm
- From: Chookij Vanatham <chookij vanatham eng sun com>
- To: gtk-i18n-list gnome org, pablo mandrakesoft com, chookij vanatham eng sun com
- Subject: Re: SaraAm
- Date: Fri, 10 Nov 2000 18:24:24 -0800 (PST)
]
] According to the wtt cell-clustering rule, here is how to check whether MaiTho
] is able to combine with Nikhahit.
]
] Nikhahit is type AD1. MaiTho is type Tone.
] From the wtt table, wtt_table[AD1][Tone] = 'R', which means MaiTho is not able
] to combine with Nikhahit. So the display will be like this.
]
] Nikhahit MaiTho
] NoNu SaraAA
]
] This should be read as "water" either.
Just a typo: this should say "This shouldn't be read as "water" either."
Chookij V.
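
The table lookup described in the quoted text can be sketched in Python as follows. Only the single entry wtt_table[AD1][Tone] = 'R' comes from this thread; the dictionary layout and the names CHAR_CLASS, WTT_TABLE and can_combine are assumptions made for illustration, and the real wtt table has many more classes and entries that are deliberately not filled in here.

```python
# Character classes as stated in the thread (only the two needed here).
CHAR_CLASS = {
    "\u0e4d": "AD1",   # Nikhahit
    "\u0e49": "Tone",  # MaiTho
}

# Partial, hypothetical table: only the entry quoted in the thread is present.
WTT_TABLE = {
    ("AD1", "Tone"): "R",  # 'R': the following character cannot combine
}


def can_combine(lead, follow):
    """Return False when the table says 'R' for (lead, follow).

    Returns None when the pair is missing from this partial table,
    so that unknown pairs are never reported as allowed.
    """
    pair = (CHAR_CLASS.get(lead), CHAR_CLASS.get(follow))
    action = WTT_TABLE.get(pair)
    if action is None:
        return None
    return action != "R"


# Nikhahit followed by MaiTho: rejected, as described above.
print(can_combine("\u0e4d", "\u0e49"))
```

Returning None for pairs outside the partial table keeps the sketch from claiming anything about entries the thread does not quote.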
]
]
] According to Trin's web page about "water", I would say the following.
]
] If talking about hand-writing (not computer typing),
]
] NoNu + Nikhahit + MaiTho + SaraAA should be read as "water"
]
] If talking about computer typing with wtt rule, the display isn't "water".
]
] I guess that K.Trin might be able to tell us why what he says on his web
] page is not the same as the wtt rule.
]
] One thing I'd like to point out is that wtt defines Nikhahit as a Diacritic.
] Nikhahit is not considered a vowel. I think Nikhahit probably comes from
] Sanskrit or Pali. Devanagari has "bindu", which is similar to Nikhahit,
] and "bindu" isn't considered a vowel either (not too sure if their purposes
] are the same).
]
] I guess that's why Nikhahit is not a vowel, and that's why MaiTho, which is
] a tonemark, can't be combined with Nikhahit. The same applies to another
] diacritic, "Karan (U+0E4C)", under the general rule, I think.
]
] - Tonemark must follow vowel
] - Diacritic must follow vowel
]
] The word should have either Tonemark or Diacritic but shouldn't have both,
] I guess.
]
]
]
] ]
] ] > ] So my question is; should the input standardize SaraAm -> Nikhahit + SaraAA
] ] > ] and SaraAm + tonemark --> Nikhahit + tonemark + SaraAA ?
] ] >
] ] > From the input point of view, even though it seems that SaraAm can be
] ] > composed from Nikhahit and SaraAA, the example above seems to show that
] ] > the word "water" has only one sequence, which uses SaraAm, and that it is
] ] > neither NoNu + Nikhahit + MaiTho + SaraAA nor
] ] > NoNu + MaiTho + Nikhahit + SaraAA.
] ]
] ] Is that because it has been decided that way ?
]
] Not 100% sure, but Thai sorting might give some clue: if they are
] the same, both of them would sort to the same position.
]
] ] Because http://www.inet.co.th/cyberclub/trin/thairef/ shows that both
] ] Nonu + Nikhahit + MaiTho + SaraAA and Nonu + MaiTho + SaraAm would
] ] provide the same visual output (the Nonu + Maitho + Nikhahit + SaraAA
] ] sequence will give a different visual output, thanks to the clustering
] ] rules).
]
] Like I said above. K.Trin would be able to tell.
]
] ]
] ] > (1) DoDek (U+0E14) + SaraAm (U+0E33)
] ] > (2) DoDek (U+0E14) + Nikhahit (U+0E4D) + SaraAA (U+0E32)
] ] >
] ] > (1) and (2) are displayed the same but are different sequences.
] ] >
] ] > From the input point of view,
] ] >
] ] > (A) Key DoDek (U+0E14), Key SaraAm (U+0E33)
] ] > (B) Key DoDek (U+0E14), Key Nikhahit (U+0E4D), Key SaraAA (U+0E32)
] ] >
] ] > (A) and (B) are considered as the same word "black".
] ]
] ] But the problem is that (A) and (B) are not equal (from a computer point of
] ] view); that is a problem if you want to search the word "black" in a text
] ] file, for example.
] ]
] ] The solution would be to have the input catch those cases and standardize;
] ] that is input (A) and (B) will produce a same byte string;
]
] Not too sure whether this standardization is agreed on across the Thai industry.
] But I guess (not 100% sure) that Thai sorting should catch these 2 cases
] and treat them as the same word. Not too sure about other uses.
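
As an illustration of how (A) and (B) could be made to compare equal for searching, here is a small Python sketch. It relies on Unicode compatibility normalization (NFKD), which nobody in this thread proposed; it is used here only as an assumption, because the Unicode character database gives SaraAm (U+0E33) a compatibility decomposition into Nikhahit + SaraAA. The helper name fold is hypothetical.

```python
import unicodedata

DO_DEK, NO_NU = "\u0e14", "\u0e19"
SARA_AA, SARA_AM = "\u0e32", "\u0e33"
MAI_THO, NIKHAHIT = "\u0e49", "\u0e4d"


def fold(text):
    """Compatibility-decompose text, so SaraAm becomes Nikhahit + SaraAA."""
    return unicodedata.normalize("NFKD", text)


seq_a = DO_DEK + SARA_AM               # (A) DoDek, SaraAm
seq_b = DO_DEK + NIKHAHIT + SARA_AA    # (B) DoDek, Nikhahit, SaraAA

# The raw strings differ, but their folded forms can be compared.
print(seq_a == seq_b)
print(fold(seq_a) == fold(seq_b))

# With a tonemark, the decomposition leaves the tonemark before Nikhahit.
water = NO_NU + MAI_THO + SARA_AM
print(["U+%04X" % ord(c) for c in fold(water)])
```

Note that for NoNu + MaiTho + SaraAm the folded form is NoNu + MaiTho + Nikhahit + SaraAA, which is the order that the wtt clustering discussion above says displays differently. So a fold like this seems better suited to comparing or searching text than to deciding which sequence the input method should store.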
]
] ] now, my question is: which one should it be ? (A) or (B) ?
]
] Speaking as just one Thai person in the computer industry, I can't answer
] this question. My feeling so far is that both are fine and should be the same
] for me. But this might be wrong.
]
] ]
] ] > As I remember saying before, when writing Thai, a vowel
] ] > will always follow a consonant and a tonemark will always follow a vowel.
] ] >
] ] > This is true for vowels like U+0E30 - U+0E39 (U+0E33 SaraAm might be
] ] > the exception) and tonemarks like U+0E48 - U+0E4B.
] ] >
] ] > But there are other Thai vowels which are composed from these vowels.
] ] > I would call them "compound vowels", like SaraAm,
] ] > and a lot more, for example,
] ] >
] ] > U+0E40 + (consonant) + U+0E32 ====> This composes one compound vowel.
] ]
] ] But only the compound (Nikhahit + SaraAA) is encoded (as SaraAm); all others
] ] are not, and must be typed by their decomposition, mustn't they?
]
] I think that, in hand-writing, they have to be written by
] their decomposition. I don't know about computer typing, where, for sure,
] both (1) Nikhahit + SaraAA and (2) SaraAm are available.
]
] Just to point out one more thing, see the following sequence...
]
] consonant + SaraI (U+0E34) + Nikhahit
]
] This "SaraI (U+0E34)" plus Nikhahit might actually mean the same
] as the vowel "SaraUe (U+0E36)". I can't think of any words which require
] typing like this. But that might be because all those words are always
] typed with only SaraUe (U+0E36).
]
] If this is true, we never type "SaraUe (U+0E36)" by its decomposition.
] If this is true, then typing SaraAm only by its decomposition might not
] be right either.
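
One way to see how the two cases differ at the encoding level is to ask the Unicode character database, here through Python's standard unicodedata module. This is only a sketch of what Unicode records, not a statement about wtt or about Thai orthography; the helper name decomposition_of is made up for this example.

```python
import unicodedata


def decomposition_of(ch):
    """Return the decomposition mapping Unicode records for ch ('' if none)."""
    return unicodedata.decomposition(ch)


# SaraAm (U+0E33) carries a compatibility decomposition: Nikhahit + SaraAA.
print(repr(decomposition_of("\u0e33")))

# U+0E36 carries no decomposition, so Unicode does not treat it as
# SaraI (U+0E34) + Nikhahit (U+0E4D).
print(unicodedata.name("\u0e36"), repr(decomposition_of("\u0e36")))
```

So, as far as Unicode is concerned, SaraAm is related to Nikhahit + SaraAA while U+0E36 is an atomic character, which fits the observation that nobody types U+0E36 by a decomposition.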
]
] Hope this helps.
]
] ]
] ] Also, is U+0E40 a vowel that is written before the consonant it is attached
] ] to (even if it is pronounced after it) ?
]
] Yes, always.
]
] ] (that is a problem linked to
] ] tis-620 being a glyph-based encoding, that doesn't exist with the indic
] ] encodings (devanagari, etc))
]
] In devanagari, there is a similar case for the vowel sign I (U+093F).
] It's pronounced after the consonant but it's written before the consonant.
] Do you agree with this ?
]
]
] Hope this helps,
]
] Chookij V.
]
]
] ]
] ] >SaraAm + Tonemark shouldn't be the same input as Nikhahit + Tonemark + SaraAA
] ]
] ] That, we agree.
] ]
] ] > It seems that the way we type/write Thai words is not the same as the way we
] ] > pronounce them. We write/type them in visual order with rules from Wtt but
] ] > we pronounce them like other Devanagari scripts....
] ] > Don't know why it's like that...
] ]
] ] It is like that because it was easier like that: the only thing needed is
] ] a proportional font, no need to make any other modification to any program
] ] or OS (as long as they can display proportional fonts).
] ] That is because Thai script allows that by not having complex ligatured
] ] conjuncts, but only "stacked" ones.
] ] The bad result is that typing may be odd at times, and ambiguities in encoding
] ] exist; but the good result is that use of Thai in general purpose computers
] ] has been possible for many years; while other languages using indic
] ] scripts could only dream of it...
] ]
] ]
] ] --
] ] Ki ça vos våye bén,
] ] Pablo Saratxaga
] ]
] ] http://www.srtxg.easynet.be/ PGP Key available, key ID: 0x8F0E4975
] ]
] ] _______________________________________________
] ] gtk-i18n-list mailing list
] ] gtk-i18n-list gnome org
] ] http://mail.gnome.org/mailman/listinfo/gtk-i18n-list
]
]