SaraAm



Hi Pablo,

] Date: Fri, 10 Nov 2000 22:38:01 +0100
] From: Pablo Saratxaga <pablo mandrakesoft com>
] 
] I wasn't talking about shaping or clustering; but about input and text
] encoding standardization.
] How is the word "water" supposed to be written ?

"water" will be typed/input by the following keys.

Key "NoNu" (U+0E19) + Key "MaiTho" (U+0E49) + Key "SaraAm" (U+0E33)

If we type "Key Nikhahit (U+0E4D)" and "Key SaraAA (U+0E32)" instead of
the single "Key SaraAm (U+0E33)", the word "water" will be displayed like this:

	MaiTho (U+0E49)     Nikhahit (U+0E4D)
	NoNu (U+0E19)                           SaraAA (U+0E32)
	
This won't be read as "water" though. It's not the right word.
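
The difference between the two typings can be made concrete by building both
code point sequences. This is only a minimal Python sketch: the constant and
function names are illustrative, and it says nothing about how a renderer
draws either sequence.

```python
# Code points named as in the message above.
NO_NU = "\u0E19"
MAI_THO = "\u0E49"
SARA_AM = "\u0E33"
NIKHAHIT = "\u0E4D"
SARA_AA = "\u0E32"

def code_points(text):
    """Return the text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

# "water" typed with the single SaraAm key.
water = NO_NU + MAI_THO + SARA_AM
# The same keys with SaraAm replaced by Nikhahit + SaraAA.
water_split = NO_NU + MAI_THO + NIKHAHIT + SARA_AA

print(code_points(water))        # ['U+0E19', 'U+0E49', 'U+0E33']
print(code_points(water_split))  # ['U+0E19', 'U+0E49', 'U+0E4D', 'U+0E32']
print(water == water_split)      # False
```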

] 
] > In the Thai computer industry, for those window applications which use
] > "I-BEAM" type of cursor, like desktop appls, the way they treat SaraAm,
] > they treat it as shown below.
] > 
] > 	(1) Consonant + SaraAm             ----> 1 cluster (2 columns)
] > 	(2) Consonant + Tonemark + SaraAm  ----> 1 cluster (2 columns)
] > 	
] > The SaraAm will be displayed as 2 pieces which are Nikhahit (U+0E4D) and
] > SaraAA (U+0E32) if it's following the above sequences, (1) and (2).
] 
] Which makes sense.
] 
] So my question is; should the input standardize SaraAm -> Nikhahit + SaraAA
] and SaraAm + tonemark --> Nikhahit + tonemark + SaraAA ?

From the input point of view, it might seem that SaraAm can always be
composed from Nikhahit and SaraAA. But as the example above shows, the word
"water" has only one valid sequence, the one that uses SaraAm. It is neither
NoNu + Nikhahit + MaiTho + SaraAA nor NoNu + MaiTho + Nikhahit + SaraAA.
The following sample, however, would probably have the same meaning
either way.

The word "black",


	(1) DoDek (U+0E14) + SaraAm (U+0E33)
	(2) DoDek (U+0E14) + Nikhahit (U+0E4D) + SaraAA (U+0E32)
	
	(1) and (2) are displayed the same, but they are different sequences.
	
	From the input point of view,
	
	(A) Key DoDek (U+0E14),  Key SaraAm (U+0E33)
	(B) Key DoDek (U+0E14),  Key Nikhahit (U+0E4D), Key SaraAA (U+0E32)
	
	(A) and (B) are considered as the same word "black".
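
One way to treat (A) and (B) as the same word when comparing text is to fold
SaraAm before comparing. The sketch below assumes Unicode compatibility
decomposition (NFKD), under which U+0E33 decomposes to U+0E4D + U+0E32. That
choice, and the helper name `fold_sara_am`, are assumptions for illustration
and not something WTT or the message above prescribes.

```python
import unicodedata

DO_DEK = "\u0E14"
SARA_AM = "\u0E33"
NIKHAHIT = "\u0E4D"
SARA_AA = "\u0E32"

def fold_sara_am(text):
    """Hypothetical helper: compatibility-decompose text so that
    SaraAm (U+0E33) becomes Nikhahit (U+0E4D) + SaraAA (U+0E32)."""
    return unicodedata.normalize("NFKD", text)

black_a = DO_DEK + SARA_AM             # sequence (A)
black_b = DO_DEK + NIKHAHIT + SARA_AA  # sequence (B)

print(black_a == black_b)                              # False
print(fold_sara_am(black_a) == fold_sara_am(black_b))  # True
```

Folding like this suits comparison or search only. Applied to "water" it
yields NoNu + MaiTho + Nikhahit + SaraAA, which is the sequence described
above as not displaying the right word.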
	
To answer your questions:

(A) Standardizing input as SaraAm --> Nikhahit + SaraAA

    This would not work in all cases, for example the word "water" above.
    
(B) Standardizing input as SaraAm + Tonemark --> Nikhahit + Tonemark + SaraAA

    Here is my guess at what you are thinking.
    
    As I remember saying before, when writing Thai, a vowel always
    follows the consonant and a tonemark always follows the vowel.
    
    This is true for vowels like U+0E30 - U+0E39 (U+0E33 SaraAm might be
    the exception) and tonemarks like U+0E48 - U+0E4B.
    
    But there are other Thai vowels which are composed from these vowels.
    I would call them "compound vowels", like SaraAm, and there are
    a lot more, for example,
    
    	U+0E40 + (consonant) + U+0E32 ====> Together these compose one
    					     compound vowel.
    					     
    Tonemarks can be placed inside these compound vowels. For example,
    
    the word "nine" uses the above compound vowel with the consonant
    "KoKai" and the tonemark "MaiTho" (U+0E49), so here is what it looks like...
    
    logical buffer:  U+0E40, Kokai (U+0E01), MaiTho (U+0E49), U+0E32
                                             ===============
                                             Tonemark
                                                     
    display:
                        MaiTho
    		U+0E40  Kokai  U+0E32
    		
    As you can see, for a compound vowel like this one, the tonemark is
    not at the end of the sequence but right after the consonant.
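
The position of the tonemark in the logical buffer for "nine" can be checked
directly. A small Python sketch, assuming the tonemark range U+0E48 - U+0E4B
mentioned earlier; the function name is illustrative only.

```python
# Tonemarks U+0E48 - U+0E4B, as listed earlier in this message.
TONE_MARKS = {chr(cp) for cp in range(0x0E48, 0x0E4C)}

def tone_mark_positions(buffer):
    """Return the indexes in the logical buffer that hold a tonemark."""
    return [i for i, ch in enumerate(buffer) if ch in TONE_MARKS]

# Logical buffer for "nine": U+0E40, KoKai, MaiTho, U+0E32
nine = "\u0E40\u0E01\u0E49\u0E32"

print(tone_mark_positions(nine))  # [2]
print(len(nine) - 1)              # 3, the index of the last code point
```

The tonemark sits at index 2, right after KoKai, while the last code point of
the sequence is U+0E32.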

    Now to your question, SaraAm + Tonemark --> Nikhahit + Tonemark + SaraAA
    
    Maybe SaraAm should be considered to have this form when it is used
    with a tonemark:
    
    		(consonant) Tonemark SaraAm
    		
    SaraAm seems to look like a compound vowel. I don't really know
    whether it is one or not, but the above is the pattern.
    
    Your question has it the other way around, SaraAm + Tonemark. In that
    case WTT says the tonemark needs to be displayed in the next cell. So...
    
    SaraAm + Tonemark shouldn't be the same input as Nikhahit + Tonemark + SaraAA
    
    Don't know if this can help or not....
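
The pattern "(consonant) Tonemark SaraAm" can be written down as a small
check. This Python sketch is a simplification and not the full WTT rules: the
consonant range U+0E01 - U+0E2E is an assumption taken from the Thai block
layout, and the names are illustrative.

```python
import re

CONSONANT = "[\u0E01-\u0E2E]"  # assumed consonant range, KoKai onwards
TONE_MARK = "[\u0E48-\u0E4B]"  # tonemarks as listed earlier
SARA_AM = "\u0E33"

# (consonant) Tonemark SaraAm, with the tonemark optional.
SARA_AM_CLUSTER = re.compile(f"{CONSONANT}{TONE_MARK}?{SARA_AM}")

def is_sara_am_cluster(text):
    """True if text is exactly consonant + optional tonemark + SaraAm."""
    return SARA_AM_CLUSTER.fullmatch(text) is not None

print(is_sara_am_cluster("\u0E19\u0E49\u0E33"))  # True:  NoNu, MaiTho, SaraAm
print(is_sara_am_cluster("\u0E19\u0E33\u0E49"))  # False: NoNu, SaraAm, MaiTho
```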
    
It seems that the way we type/write Thai words is not the same as the way we
pronounce them. We write/type them in visual order, following the rules from
WTT, but we pronounce them like other Indic scripts such as Devanagari...
I don't know why it's like that. I might need to learn about the history of
the language, I guess...

Chookij V.

   	 
    	 
] 
] > When I was a kid, here is what they taught me how to write SaraAm.
] 
] I was not referring to handwriting, but to computer writing; for
] a computer, different bytes don't match, even if the visual output is the same.
] 
] > Don't know who invented SaraAm. Not quite sure if it was from computer
] > person who tried to have Thai support in the computer. Then,
] 
] From what I read, it was on old mechanical typewriters; then when computers
] started to be used, the keyboard was copied from mechanical typewriters
] and, a SaraAm key being there, it needed a code in the charset encoding.
] It would have been better to avoid it, imho.
] 
] -- 
] Ki ça vos våye bén,
] Pablo Saratxaga
] 
] http://www.srtxg.easynet.be/		PGP Key available, key ID: 0x8F0E4975
] 
] _______________________________________________
] gtk-i18n-list mailing list
] gtk-i18n-list gnome org
] http://mail.gnome.org/mailman/listinfo/gtk-i18n-list




