Re: Industry Thai Cell-Clustering Rules



Hi Pablo,

] > 
] > This is the point.
] 
] It was my impression, and the message from Theppitak Karoonboonyanan
] confirmed it, that it is not font-specific at all.
] The font differences only affect the aesthetic aspect of the rendering
] (e.g. nicely placed or "floating" tone marks) but not the cell cluster order
] (unless some fonts have real precombined glyphs and use glyph substitution;
] but Pango doesn't have OpenType support yet, and when it does, the
] font substitutions should override anything, so it isn't really
] a problem, if I understand correctly).

This is correct.
From now on, I will say that Cell-Clustering is not font-specific.


] 
] > Here is the piece of the Thai Pango engine code that determines
] > the Thai cell cluster.
] ... 
]  
] > This piece of code determines Cell-Clustering for Thai and, of course,
] > it doesn't use the Wtt2.0 Cell-clustering logic.
] 
] It should be modified to implement Wtt2.0

This is exactly why I have been bringing up Wtt2.0 Cell-clustering:
I would like to have it implemented in the Thai Pango engine.


] 
] > That's why we put Cell-clustering into the XLFD name,
] 
] as it isn't font dependent, it doesn't make sense to put that into the font
] name.

I agree on this.

] 
] > It's not really clear-cut to say that Cell-clustering is not specific to
] > the font. Unfortunately, in the industry, there is more than one
] > cell-clustering rule.
] 
] but only Wtt2.0 is the official standard, so we should support only that.
] The fact that it makes it possible to distinguish right and wrong typing
] order is an important feature that well outweighs the small extra complexity
] in the renderer.

That's exactly the point: this is the most important piece of Wtt2.0
cell-clustering, and I have been trying to point it out.


] (note: Latin typing with dead keys is a bit similar in that respect:
] dead_circumflex + e will give ê, but e + dead_circumflex will give e^.
] It is not exactly the same, as ê is one char while e^ is two chars; for
] Thai the same chars are involved. But the idea of visually seeing a wrong
] sequence is important, as only the use of the right sequences allows computer
] treatment of text (search, sorting, spell checking, etc.))

Again, that's the whole point, and that's why I have been asking
to have Wtt2.0 cell-clustering in the Thai Pango engine.

] 
] > That's right. We should let users choose which one they want.
] 
] I don't think that anymore; I think that only the official one should be
] implemented; the existence of the others seems to be only the result of
] the inability or unwillingness to implement Wtt2.0.
] If, however, some other scheme is of real interest and users want it, it
] should be added; but there doesn't seem to be any evidence that this is
] the case.

That's correct.
I would say the standard, Wtt2.0, should be in the Thai Pango engine. As for
the others, I don't mind, as long as there is a way to add them.

] 
] Yes.
] That is a problem indeed.
] I think the cell clusters must really be considered as a block, and
] insertion or deletion inside one should not be allowed. That is, in the
] byte sequence:
] 
]  x x x x x A B C y y y y y
] 
] insertion should be possible only before A or after C.
] selection should include A B C or none of them, but never only one of them.
] deleting with backspace or del should start by deleting first C then B
] then A.
] 
] Is that a logical approach for a Thai person ?

Yes, it is and it has been like that.
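
To make the block behaviour above concrete, here is a minimal Python sketch.
It is only an illustration, not Pango code: the function names cursor_stops
and backspace are made up for this message, and it assumes the text has
already been split into clusters (each list item is one cluster).

```python
def cursor_stops(clusters):
    """Return the only offsets where the cursor may sit: cluster boundaries."""
    stops = [0]
    for cluster in clusters:
        stops.append(stops[-1] + len(cluster))
    return stops


def backspace(clusters, boundary):
    """Delete one element before cluster boundary number `boundary`.

    The last element of the cluster is removed first (C, then B, then A),
    and the cursor stays on a cluster boundary the whole time.
    Returns the new cluster list and the new boundary number.
    """
    if boundary == 0:
        return clusters, 0          # nothing before the cursor
    shortened = clusters[boundary - 1][:-1]
    if shortened:
        new = clusters[:boundary - 1] + [shortened] + clusters[boundary:]
        return new, boundary
    # The cluster is now empty: drop it and move the cursor one boundary left.
    return clusters[:boundary - 1] + clusters[boundary:], boundary - 1


text = ["x", "ABC", "y"]            # x x A-B-C y y, shortened
print(cursor_stops(text))           # offsets inside A-B-C are never offered
print(backspace(text, 2))           # cursor after C: removes C first
```

A real implementation would presumably recompute the clusters after every
edit; the sketch skips that step.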

] 
] 
] Note that this problem is not limited to Thai, but is also the same for
] all Indic scripts, Lao, Khmer, etc. and also Arabic and Hebrew.
] (Maybe also Korean hangeul? I think the way it works in Korean is that once
] the conjunct is done, it has its own value and is selected/deleted simply
] as a single char (same thing with Latin/Cyrillic/Greek scripts;
] e.g. ê (e circumflex) is a single char; you cannot select/delete only the 'e'
] or only the '^').)

This is so true.


] 
] So the same approach is needed.
] 
] Note that the only difference here between Wtt2.0 and other cell clusterings
] would be for wrong sequences: other cell clusterings make them a single
] cluster, so you treat them as normal clusters; with Wtt2.0, if you have a
] wrong sequence A C B (instead of the right A B C), it will display as (I'll
] use '-' to link together cluster elements):   A-C B   so you can put the
] cursor between C and B, and insert things, or delete B (with Delete) or
] delete C (with BackSpace). So to correct ACB into ABC you must do:
] 
] you have: A-C B
] place cursor (x): A-C x B
] press BackSpace: A x B
] Type B: A-B x B
] then C: A-B-C x B
] delete (with Delete) the extra B: A-B-C

This is correct.
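
For illustration, the A-C B behaviour can be sketched in a few lines of
Python. This is a simplified model, not the real Wtt2.0 table: the RANK
values and the names split_clusters and show are assumptions made up for
this message, with A (and anything else) as a base, B as a mark that must
come first, and C as a mark that must come after B.

```python
# Hypothetical ranks for the abstract symbols used in this thread.
# 0 = base (starts a new cluster); a higher rank must come later in a cluster.
RANK = {"B": 1, "C": 2}


def split_clusters(text):
    """Split text into clusters; an out-of-order mark starts its own cluster."""
    clusters = []
    last_rank = 0
    for ch in text:
        rank = RANK.get(ch, 0)
        if clusters and rank > last_rank:
            clusters[-1] += ch      # legal continuation: joins the cluster
        else:
            clusters.append(ch)     # base, or a mark in the wrong order
        last_rank = rank
    return clusters


def show(text):
    """Render clusters the way this thread writes them, e.g. 'A-C B'."""
    return " ".join("-".join(cluster) for cluster in split_clusters(text))


print(show("ABC"))   # right order: one cluster
print(show("ACB"))   # wrong order: B is shown on its own, so the error is visible
```

A non-Wtt2.0 clustering would correspond to joining every mark to the
preceding base regardless of rank, which is why the wrong order stays hidden.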


] 
] while when using a non-Wtt2.0, first you won't see that there is an error :-(

Just to emphasize: the bad point of non-Wtt2.0 clustering is that users don't
see the ERROR. That's why I have been pointing out that the software needs
to display an ERROR for an illegal sequence. It shouldn't display such a
sequence in a way that makes users think it is correct. That is very confusing.


] then to correct it you must do:
] 
] you have: A-C-B
] place cursor: A-C-B x  (or x A-C-B)
] delete B (with BackSpace): A-C x (or with Delete: x A-C)
] delete C (with BackSpace): A x (or with Delete: x A)
] type B: A-B x  (if you were at left of A, first press "right arrow") 
] type C: A-B-C x
] 
]
] 
] Note that Indic scripts (Devanagari, etc.) (and Khmer I think) are inherently
] like Wtt2.0, as only the correct order can produce the conjuncts (what you
] type is letters, not shapes).
] Lao, on the other hand, is like Thai it seems (but I would need confirmation),
] and would then need a similar clustering (that is, the clustering engine
] should not be part of the Thai module, but of Pango; then the Thai/Lao modules
] call it and pass it a set of data with which the engine will be able to
] calculate the clusters).
] For Arabic and Hebrew, I think it would be better to treat them the same
] way as Latin script: once the conjunct is created, consider it as a single
] char (that is made easier by the fact that the composed conjuncts have
] Unicode values).
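
The idea of a shared engine that the Thai and Lao modules feed with data
could look roughly like the Python sketch below. Everything in it is an
assumption for illustration: make_clusterer, THAI_RANKS and LAO_RANKS are
made-up names, the tables cover only one vowel and one tone mark per script,
and whether Lao really follows the same ordering still needs confirmation.

```python
# Hypothetical per-script data: character -> rank inside a cluster.
# Characters not listed are treated as bases (rank 0).
THAI_RANKS = {"\u0e34": 1, "\u0e48": 2}   # SARA I (above vowel), MAI EK (tone)
LAO_RANKS = {"\u0eb4": 1, "\u0ec8": 2}    # LAO VOWEL SIGN I, LAO TONE MAI EK


def make_clusterer(ranks):
    """Build a cluster splitter from a script module's rank table."""
    def split(text):
        clusters = []
        last_rank = 0
        for ch in text:
            rank = ranks.get(ch, 0)
            if clusters and rank > last_rank:
                clusters[-1] += ch      # mark in legal order joins the cluster
            else:
                clusters.append(ch)     # base, or mark in the wrong order
            last_rank = rank
        return clusters
    return split


thai_split = make_clusterer(THAI_RANKS)
lao_split = make_clusterer(LAO_RANKS)

# KO KAI + SARA I + MAI EK is one cluster; tone before vowel is split.
print(len(thai_split("\u0e01\u0e34\u0e48")), len(thai_split("\u0e01\u0e48\u0e34")))
```

The engine itself contains nothing script-specific; only the tables differ.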
] 
] > Hope this would help.
] > 
] > 
] > Chookij V.
] 

Chookij V.


] -- 
] Ki ça vos våye bén,
] Pablo Saratxaga
] 
] http://www.srtxg.easynet.be/		PGP Key available, key ID: 0x8F0E4975
] 
] _______________________________________________
] gtk-i18n-list mailing list
] gtk-i18n-list gnome org
] http://mail.gnome.org/mailman/listinfo/gtk-i18n-list




