Re: Concerning Keyboard Status Menu

From: Mike Qin <mikeandmore gmail com>
To: Debarshi Ray <rishi is lostca se>
Cc: desktop-devel-list gnome org
Subject: Re: Concerning Keyboard Status Menu
Date: Sat, 24 Nov 2012 19:48:49 -0500

On 24/11/12 06:40 PM, Debarshi Ray wrote:

The thing about Chinese input methods is that few of them do a good job.
The styling of Chinese, its dialects, and modern Chinese cultural idioms
all *vary*. Even the big commercial input methods fail to do a good job
on every aspect mentioned above. That's why you see several commercial
input methods installed even on a single-user desktop. This is why input
methods tend to be inconsistent.

The default pinyin input method GNOME whitelisted is ibus-pinyin. It's a
very basic input engine that does a relatively poor job on almost every
aspect I mentioned above. And I don't mean to offend those developers;
Sunpinyin is no better than that.

Developing a Chinese IME is *extremely* hard and it has commercial
barriers. Big search engine companies have much more complete training
datasets than any open source organization, and commercial dictionaries
from Chinese internet media companies cover every aspect of Chinese
culture: ancient poetry, modern words, idioms... Companies like Microsoft
and Google have much more sophisticated machine learning research
groups than any open source organization...


The question is, if it is so hard to develop a Chinese IME, then why not
join together to improve it instead of having lots of half-finished ones?
If we are so low on resources then we should try to avoid fragmentation,
shouldn't we?

Good question! As a native Chinese speaker of 20 years, I would say that's impossible. This has never happened in the commercial input method world, and it is never going to happen in the open source world either.

The situation of the Chinese language, and of input methods, is extremely complex. The workload of a complete universal input engine is incredibly huge!

First, no one really knows how to speak "Chinese". There are too many dialects. For instance, my girlfriend is from Zhejiang, and there you will find a new dialect every 10 km. Yes, these are distinct dialects: people who speak different dialects *cannot* understand each other. Some of these dialects have their own characters, say Cantonese, and some cannot even be fully expressed in Han characters. (That's why the Han character standard has been extended several times.)

Second, the ways of inputting Chinese are very different. Pinyin is one; it basically encodes the way Chinese is read. Besides Pinyin, I (who always failed my Chinese exams) know of at least Wubi, Shuangpin, Erbi, and Zhengma. Each of them is complex enough to need an individual engine.

Third, just to pick Mandarin Pinyin as an example: because Han characters are not letter-based, the input method problem is basically the same as speech recognition. Several sub-problems of this topic are highly open, for instance natural language segmentation, dictionary mining, context inference... These problems are so open that no engine developer is sure which way is the best. In fact, we all encourage each other to try new approaches, because the current UX of open source input methods is still way behind the commercial ones that we use on Windows.
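
To make the segmentation problem concrete, here is a minimal sketch that is not taken from any of the engines discussed in this thread. It enumerates every way to split a run of typed letters into pinyin syllables. The function name `segmentations` and the three-entry syllable set are assumptions made for illustration; a real engine would use the full syllable table and rank the results instead of listing them all.

```python
def segmentations(letters, syllables):
    """Return every way to split `letters` into known pinyin syllables."""
    if not letters:
        return [[]]  # one way to split the empty string: no syllables
    results = []
    for i in range(1, len(letters) + 1):
        head = letters[:i]
        if head in syllables:
            # Keep this syllable and split whatever is left over.
            for rest in segmentations(letters[i:], syllables):
                results.append([head] + rest)
    return results


# Toy syllable table (assumption): just enough to show the ambiguity.
SYLLABLES = {"xi", "an", "xian"}

print(segmentations("xian", SYLLABLES))
```

The same four letters can be read as one syllable ("xian") or two ("xi" + "an"), and the engine has to guess which one the user meant before it can even start looking up characters.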

Fourth, the patent issue. As I mentioned in the first email, patents discourage open source input methods from using commercial dictionaries, because these dictionaries are either collected manually or mined with sophisticated machine learning techniques from massive datasets that we don't have.

As a result, there is no "universal" input engine for Chinese. But each of the engines has its own uniqueness. Take Mandarin Pinyin as an example:

* ibus-pinyin tends to be simple to hack, but provides poor UX since it does not consider language context. It's under the GPL license.

* sunpinyin is more sophisticated; it uses 3-grams to overcome the non-Markov property of Chinese. But the dictionaries and the datasets are still a problem. And the LGPL license and its history of originating from Sun Microsystems scared a lot of package maintainers away.

* libpinyin is considered the successor of sunpinyin, but it is under heavy development. It's still considered unstable for now.

* rime sounds different; they seem to target people who really appreciate the beauty of ancient Chinese. (Correct me if I'm wrong, of course.)
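
As a rough illustration of what "considering language context" with 3-grams can mean, the sketch below ranks candidate character sequences for a pinyin input by summing smoothed log trigram counts. This is not sunpinyin's actual code or data: the candidate table, the counts, the smoothing constant `alpha`, and the function names are all made up for the example, and a real engine would use a proper search instead of brute force.

```python
import itertools
import math

# Hypothetical candidate characters per syllable (assumption).
CANDIDATES = {"wo": ["我", "握"], "ai": ["爱", "矮"], "ni": ["你", "泥"]}

# Hypothetical trigram counts; "<s>" pads the start of a sentence.
TRIGRAM_COUNTS = {
    ("<s>", "<s>", "我"): 3,
    ("<s>", "我", "爱"): 4,
    ("我", "爱", "你"): 5,
}


def score(chars, counts, alpha=0.1):
    """Sum of log(count + alpha) over every trigram in the sequence."""
    padded = ["<s>", "<s>"] + list(chars)
    total = 0.0
    for i in range(2, len(padded)):
        trigram = tuple(padded[i - 2:i + 1])
        total += math.log(counts.get(trigram, 0) + alpha)
    return total


def best_conversion(syllables, candidates, counts):
    """Try every combination of candidates and keep the best-scoring one."""
    options = [candidates[s] for s in syllables]
    return max(itertools.product(*options), key=lambda cs: score(cs, counts))


print("".join(best_conversion(["wo", "ai", "ni"], CANDIDATES, TRIGRAM_COUNTS)))
```

An engine without context would have to pick each character independently. With trigram counts, the choice for "ni" depends on the two characters before it, which is also why the quality of the dataset behind the counts matters so much.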

As I said, each approach is a complete approach. They're *not fragmented*. We're not sure which one is the good idea; we're still trying to see which one is better. It feels pretty much like research: we all know every current approach sucks, and we're exploring different ways to make it better. If you focus on only one of them, we lose the whole opportunity to make it better.


Cheers,
Debarshi


--

Thanks
Mike

