Re: XIM String Conversion Callback



Theppitak Karoonboonayanan wrote:

> On Fri, Jan 05, 2001 at 12:35:07AM +0800, Steve Underwood wrote:
> [...]

> > In experimenting with intelligent Chinese entry, I never really
> > found a way to make this kind of thing work well. Context derived directly
> > from the typist's actions seems the only sort that really works well.
> > Deriving context from globs of text in languages without formal
> > punctuation is tough! Then again, maybe my methods were just not
> > intelligent enough.
>
> The input method should know well what it needs. And I think the problem
> is how to fulfill all the "reconversion" methods allowed by X.
> I have so little knowledge about Chinese or other languages.
> But for Thai, the only problem is the retrieval by word. And I think
> Pango has already defined an interface for this. Could it be used here?

The complexity in Chinese (and I believe Japanese and Korean, though I only
understand Chinese) is that there is really no punctuation, nor any reliable
delimiter. Text is a stream of characters, without any breaks. Modern Chinese is
often written with adapted European punctuation, but more often it isn't. A
Chinese character represents a single syllable, but there is no space, or other
delimiter, between the groups of Hanzi which form a polysyllabic word. "Retrieval
by word" is not realistically implementable in an application. Much research has
gone into the automatic location of word boundaries in Chinese. Methods exist
which are very useful in intelligent IMEs for analysing a stream of keystrokes,
but they aren't sufficiently reliable to be used to return a word if the IME
says to an application "give me a word". Without formal punctuation even "give
me a sentence" is tough to do.

In Chinese, punctuation is spoken. Tonal changes, which represent punctuation in
spoken English, actually change the word in Chinese (e.g. in English a rising
tone at the end of a sentence poses a question, a falling tone makes a
statement, and a strongly falling tone says "I'm finished, your turn to speak").
Chinese, therefore, uses words to represent punctuation. However, searching for
these "punctuation" characters in a text stream, and trying to treat them as
solid hooks on the sentence boundaries, won't really work. Things are a bit too
grey for that.
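
The word-boundary ambiguity described above can be illustrated with a small
sketch. It is not taken from any IME: the toy lexicon, the sample phrase, and
the function names are all assumptions made for illustration. It runs greedy
forward and backward maximum matching, two simple dictionary-based
segmentation heuristics, over the same unbroken run of Hanzi and shows that
they can place the word boundaries differently.

```python
# Toy lexicon, assumed purely for illustration; a real IME would use a far
# larger dictionary and usually frequency or context information as well.
LEXICON = {"研究", "研究生", "生命", "起源"}


def forward_max_match(text, lexicon, max_len=3):
    """Greedy left-to-right segmentation: take the longest known word."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            # Fall back to a single character when nothing longer matches.
            if n == 1 or text[i:i + n] in lexicon:
                words.append(text[i:i + n])
                i += n
                break
    return words


def backward_max_match(text, lexicon, max_len=3):
    """Greedy right-to-left segmentation: take the longest known word."""
    words, end = [], len(text)
    while end > 0:
        for n in range(min(max_len, end), 0, -1):
            if n == 1 or text[end - n:end] in lexicon:
                words.append(text[end - n:end])
                end -= n
                break
    words.reverse()
    return words


phrase = "研究生命起源"
print(forward_max_match(phrase, LEXICON))   # ['研究生', '命', '起源']
print(backward_max_match(phrase, LEXICON))  # ['研究', '生命', '起源']
```

Both results are built from dictionary words, yet they disagree, so an
application asked to "give me a word" at a given position has no single
reliable answer from the text alone.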

If the IME just says "give me the last X characters and the next Y characters"
to the application, the IME will have a tough time making sense of what it gets.
Figuring out context from a stream of Chinese is difficult. Figuring it out from
a stream of entered keystrokes is much more reasonable. I have only succeeded in
getting effective context from the keystrokes. That's fine during text entry,
but rather limited when performing minor edits. Retrieved context from the app
would certainly be most useful, if you could make reliable sense of it.
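
A rough sketch of such a "last X, next Y characters" request follows. The
helper name `context_window` and its parameters are hypothetical; this is not
the XIM interface, only an illustration of what a fixed-size window around the
cursor hands back.

```python
def context_window(text, cursor, before, after):
    """Return up to `before` characters left of the cursor and up to
    `after` characters right of it (hypothetical helper, not an XIM call)."""
    left = text[max(0, cursor - before):cursor]
    right = text[cursor:cursor + after]
    return left, right


# Cursor placed after the fifth character of an unpunctuated run of Hanzi.
left, right = context_window("我在研究生命起源", cursor=5, before=2, after=2)
print(left, right)  # 究生 命起
```

The window edges fall wherever the character counts put them, not on word
boundaries, so the IME still has to work out where words begin and end in what
it receives.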

Mixed-language text probably makes the task even more interesting. Multiple
active IMEs for the various languages, endlessly switching, probably make
context a total nightmare. I have yet to see this problem tackled, except for
the most common case of English + one other language. English entry is pretty
trivial, so you don't get too much complexity from that mixture.

In general, adaptive IMEs are an abysmal failure. They sound really neat, but
they do not permit touch-typing. Beginners find them nice, then get really
annoyed by them as they become experienced and cannot predict the precise
behaviour of the system when they look away from the screen. I think in the
context of performing minor edits this is not an issue. You never touch-type
those things, anyway.

I'm still not clear if this kind of access to data for reconversion is available
in XIM.

Regards,
Steve




