Re: SaraAm

From: Theppitak Karoonboonayanan <thep links nectec or th>
To: gtk-i18n-list gnome org
Subject: Re: SaraAm
Date: Mon, 13 Nov 2000 22:03:51 +0700

On Sat, 11 Nov 2000 02:04:09 +0100, Pablo Saratxaga wrote:
> 
> On Fri, Nov 10, 2000 at 02:32:28PM -0800, Chookij Vanatham wrote:
> 
> > ] How is the word "water" supposed to be written ?
> > 
> > "water" will be typed/input by the following keys.
> > 
> > Key "NoNu" (U+0E19) + Key "MaiTho" (U+0E49) + Key "SaraAm" (U+0E33)
> > 
> > If we type "Key Nikhahit (U+0E4D)" and "Key SaraAA (U+0E32)" instead of
> > one "Key SaraAm (U+0E33)", the word "water" will be displayed like this.
> > 
> >       MaiTho (U+0E49)     Nikhahit (U+0E4D)
> >       NoNu (U+0E19)                           SaraAA (U+0E32)
> >       
> > This won't be read as "water" though. It's not the right word.
> 
> Yes; but if it is typed as:
> 
> "NoNu" (U+0E19) + Nikhahit (U+0E4D) + "MaiTho" (U+0E49) + SaraAA (U+0E32) ?

This is a meaningless string for Thai text processing.
Wtt recommends that SaraAm always be used, so that there is no ambiguity
in this case.
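
As an illustration only (a sketch, not part of the Wtt specification), the
three sequences discussed above can be compared in Python. The helper name
has_decomposed_sara_am is my own, and the check assumes that only the four
tone marks U+0E48 - U+0E4B can sit between Nikhahit and SaraAA.

```python
import re

NO_NU, MAI_THO = "\u0e19", "\u0e49"
SARA_AA, SARA_AM, NIKHAHIT = "\u0e32", "\u0e33", "\u0e4d"

# Nikhahit, an optional tone mark (U+0E48 - U+0E4B), then SaraAA:
# the decomposed spelling that Wtt recommends against.
DECOMPOSED_SARA_AM = re.compile("\u0e4d[\u0e48-\u0e4b]?\u0e32")

def has_decomposed_sara_am(text: str) -> bool:
    """Return True if text spells SaraAm as Nikhahit (+ tone mark) + SaraAA."""
    return DECOMPOSED_SARA_AM.search(text) is not None

recommended = NO_NU + MAI_THO + SARA_AM             # the word "water"
tone_first = NO_NU + MAI_THO + NIKHAHIT + SARA_AA   # Chookij's example
tone_inside = NO_NU + NIKHAHIT + MAI_THO + SARA_AA  # Pablo's example

# Print the code points of each sequence and whether it would be flagged.
for s in (recommended, tone_first, tone_inside):
    print([f"U+{ord(c):04X}" for c in s], has_decomposed_sara_am(s))
```

Only the first sequence passes the check; the other two are flagged as
spellings an input method could reject or correct.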

> >       (1) DoDek (U+0E14) + SaraAm (U+0E33)
> >       (2) DoDek (U+0E14) + Nikhahit (U+0E4D) + SaraAA (U+0E32)
> >       
> >       (1) and (2) are displayed the same but are different sequences.
> >       
> >       From the input point of view,
> >       
> >       (A) Key DoDek (U+0E14),  Key SaraAm (U+0E33)
> >       (B) Key DoDek (U+0E14),  Key Nikhahit (U+0E4D), Key SaraAA (U+0E32)
> >       
> >       (A) and (B) are considered as the same word "black".
> 
> But the problem is that (A) and (B) are not equal (from a computer point of
> view); that is a problem if you want to search the word "black" in a text
> file, for example.
> 
> The solution would be to have the input catch those cases and standardize;
> that is input (A) and (B) will produce a same byte string; now, my question
> is: which one should it be ? (A) or (B) ?

(A) is always the encoded form, never (B).
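
For the search problem, one possible approach (an assumption of mine, not
something Wtt prescribes) is to fold any stray (B) sequence into (A) before
comparing. The function name fold_to_sara_am is hypothetical.

```python
DO_DEK = "\u0e14"
NIKHAHIT, SARA_AA, SARA_AM = "\u0e4d", "\u0e32", "\u0e33"

def fold_to_sara_am(text: str) -> str:
    """Replace each Nikhahit + SaraAA pair with the single SaraAm character."""
    return text.replace(NIKHAHIT + SARA_AA, SARA_AM)

black_a = DO_DEK + SARA_AM              # sequence (A)
black_b = DO_DEK + NIKHAHIT + SARA_AA   # sequence (B)

print(black_a == black_b)                                    # False
print(fold_to_sara_am(black_a) == fold_to_sara_am(black_b))  # True
```

Applying the same fold to both the search key and the text being searched
makes the two spellings of "black" compare equal.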

> >     As I remember saying before, when writing in Thai, a vowel always
> >     follows a consonant and a tone mark always follows a vowel.
> >     
> >     This is true for vowels like U+0E30 - U+0E39 (U+0E33 SaraAm might be
> >     the exception) and tonemark like U+0E48 - U+0E4B.
> >     
> >     But, there are other Thai vowels which are composed from these vowels.
> >     I would say that they would be called "compound vowel", like SaraAm,
> >     and a lot more, for example,
> >     
> >       U+0E40 + (consonant) + U+0E32  ====> This is composing one compound
> >                                            vowel.
> 
> But only the compound (Nikhahit + SaraAA) is encoded (as SaraAm); all others
> are not, and must be typed as their decomposition, aren't they?

By that reasoning, Sara II (U+0E35), Sara UE (U+0E36), Sara UEE (U+0E37) are
also compound vowels, according to ancient Thai grammar.

In fact, the little circle in both Sara UE (U+0E36) and Sara Am (U+0E33) is
Nikhahit (U+0E4D). Nikhahit is one of the symbols used for composing new
vowels in the Thai writing system. But this function has faded from common
awareness by now, and in modern Thai printing we instead have composite
vowels that people think of as primitive.

The remaining use of Nikhahit, however, is in transliteration of
Pali/Sanskrit, where it is used to represent -ng or -m sounds.
(I'm quite sure there must be an equivalent character for it in Devanagari.)

I think that's why Sara Am is still preserved in TIS-620, and why Nikhahit
is also allocated a code point for its own function.
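
To illustrate, both characters have their own byte in TIS-620. The sketch
below uses Python's built-in tis_620 codec; the byte values in the comment
follow from TIS-620 placing the Thai block at an offset of 0xA0 from the
low byte of the Unicode code point.

```python
# Each character encodes to exactly one TIS-620 byte.
for name, ch in [("SaraAA", "\u0e32"), ("SaraAm", "\u0e33"), ("Nikhahit", "\u0e4d")]:
    byte = ch.encode("tis_620")
    print(name, f"U+{ord(ch):04X}", byte.hex())

# Expected bytes: SaraAA d2, SaraAm d3, Nikhahit ed
```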

> Also, is U+0E40 a vowel that is written before the consonant it is attached
> to (even if it is pronounced after it)? (That is a problem linked to
> tis-620 being a glyph-based encoding, which doesn't exist with the indic
> encodings (devanagari, etc).)

The problem emerges only when we deal with text processing, such as word
boundary analysis or letter-to-sound conversion. But there is no ambiguity
in encoding.
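
As a rough sketch of the kind of processing step meant here, a
letter-to-sound converter might first move each leading vowel
(U+0E40 - U+0E44) behind the consonant it precedes. The function name is
hypothetical, and the rule is deliberately naive: it assumes the vowel
belongs to the single consonant that follows it and ignores consonant
clusters.

```python
import re

# A leading vowel (U+0E40 - U+0E44) followed by one consonant (U+0E01 - U+0E2E).
LEADING_VOWEL = re.compile("([\u0e40-\u0e44])([\u0e01-\u0e2e])")

def to_spoken_order(text: str) -> str:
    """Swap each leading vowel with the consonant that follows it."""
    return LEADING_VOWEL.sub(r"\2\1", text)

stored = "\u0e40\u0e01"   # Sara E + KoKai, as typed and encoded
print([f"U+{ord(c):04X}" for c in to_spoken_order(stored)])  # ['U+0E01', 'U+0E40']
```

The stored text is left untouched; the reordering is only an internal view
for the analysis step.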

> > It seems that the way we type/write Thai words is not the same as the way
> > we pronounce them. We write/type them in visual order with rules from Wtt
> > but we pronounce them like other Devanagari scripts....
> > Don't know why it's like that...
> 
> It is like that because it was easier like that: the only thing needed is
> a proportional font, with no need to make any other modification to any
> program or OS (as long as they can display proportional fonts).
> That is because Thai script allows that by not having complex ligatured
> conjuncts, but only "stacked" ones.
> The bad result is that typing may be odd at times, and ambiguities in encoding
> exist; but the good result is that use of Thai in general purpose computers
> has been possible for many years, while other languages using indic
> scripts could only dream of it...

To me, the typing is not odd at all, because it is consistent with the way
we write. We write from left to right, one character at a time, although the
way we read is not necessarily the same. Just imagine how awkward it would be
to reserve space for Sara E (U+0E40) first, then write a consonant, and then
move the pen back to write Sara E before moving on. That would not be
natural at all.

And so we are used to writing from left to right. It would be odd instead
to learn a new way of spelling when turning to the computer. I don't
think Thai people would accept the way Indic scripts do it.

And, as I have said, I don't agree that there are ambiguities in encoding,
provided that the Wtt specification is strictly followed. But I agree that we
have problems with text processing.

-Theppitak.