Re: [Tracker] Additions to Tracker Ontology for Chinese Use Cases



Here are some examples showing "name in Chinese" -> "name in pinyin" -> "T9 representation":

åé  ->  Liu Jing  ->  5485464
çå  ->  Luo Xuan  ->  5869826
éèè  ->  Ma Fang Fang  -> 6232643264
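To make the mapping concrete, here is a minimal Python sketch that derives the T9 column from the pinyin
column. It assumes toneless ASCII pinyin and the standard ITU-T E.161 keypad letter groups (2 = ABC ...
9 = WXYZ); `KEYPAD` and `pinyin_to_t9` are illustrative names, not part of Tracker.

```python
# Letter groups of the standard phone keypad (ITU-T E.161 layout).
# Assumption: the handsets use this plain layout with no custom mapping.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Invert the table once so each lookup is letter -> digit.
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def pinyin_to_t9(pinyin: str) -> str:
    """Convert a toneless pinyin name such as "Liu Jing" to T9 digits.

    Characters without a key (spaces, apostrophes, digits) are skipped,
    which matches the examples above where the space is dropped.
    """
    return "".join(
        LETTER_TO_DIGIT[ch] for ch in pinyin.lower() if ch in LETTER_TO_DIGIT
    )


for name in ("Liu Jing", "Luo Xuan", "Ma Fang Fang"):
    print(f"{name} -> {pinyin_to_t9(name)}")
```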

The first use case is that phone users in China apparently very often look up names in their contact
list using the T9 dial pad. So with the data above, if the user presses '5' on the dial pad, a Tracker
search should return "Liu Jing" and "Luo Xuan". When the user then presses '8', Tracker is searched again
and should return only "Luo Xuan".
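A sketch of the incremental dial-pad lookup, assuming each contact's T9 string is computed once and stored
(in Tracker this would presumably be the kind of property this thread proposes adding to the ontology; the
in-memory dict here merely stands in for it). Each keypress then becomes a simple prefix match on that
string. The function names are hypothetical.

```python
# Standard ITU-T E.161 keypad letter groups (assumed layout).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {l: d for d, ls in KEYPAD.items() for l in ls}


def pinyin_to_t9(pinyin: str) -> str:
    """Toneless pinyin -> T9 digits; non-letters such as spaces are skipped."""
    return "".join(LETTER_TO_DIGIT[c] for c in pinyin.lower() if c in LETTER_TO_DIGIT)


def build_t9_index(contacts: list[str]) -> dict[str, str]:
    """Precompute each contact's T9 string once, instead of on every keypress."""
    return {name: pinyin_to_t9(name) for name in contacts}


def t9_search(index: dict[str, str], typed_digits: str) -> list[str]:
    """Return the contacts whose T9 string starts with the digits typed so far."""
    return [name for name, t9 in index.items() if t9.startswith(typed_digits)]


index = build_t9_index(["Liu Jing", "Luo Xuan", "Ma Fang Fang"])
print(t9_search(index, "5"))   # user pressed '5'
print(t9_search(index, "58"))  # user then pressed '8'
```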

My customer also wants to support searching for Chinese contacts by their pinyin names. So if the user
types "L" on the phone's virtual or physical keyboard, again both "Liu Jing" and "Luo Xuan" should be
returned. If the user then types "i", another Tracker search is performed, which should return only
"Liu Jing".
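The pinyin lookup is the same idea applied to letters instead of digits. A minimal sketch, assuming the
pinyin names are available as plain strings; `pinyin_prefix_search` is a hypothetical name, and ignoring
spaces while matching is an assumption rather than part of the stated requirement.

```python
def pinyin_prefix_search(contacts: list[str], typed: str) -> list[str]:
    """Return contacts whose pinyin name starts with what the user has typed.

    Matching is case-insensitive and ignores spaces, so "L", "li" and
    "LiuJ" all match "Liu Jing" (ignoring spaces is an assumption).
    """
    prefix = typed.replace(" ", "").casefold()
    return [
        name for name in contacts
        if name.replace(" ", "").casefold().startswith(prefix)
    ]


contacts = ["Liu Jing", "Luo Xuan", "Ma Fang Fang"]
print(pinyin_prefix_search(contacts, "L"))   # user typed "L"
print(pinyin_prefix_search(contacts, "Li"))  # user then typed "i"
```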

The other important use of the pinyin name is sorting the contact list. My customer wants Chinese names
to be sorted among English names using their pinyin representation. So if the English names "Matt
Compton" and "John Doe" are added to the list of Chinese names above, the list should be sorted
alphabetically as follows:

Please note that if Tracker uses libicu for Unicode support, it will be
possible to set "pinyin" as the collation specifier in the locale used
by Tracker. In theory (I didn't test it), this should allow ordering
query results by the pinyin representation of the input strings. It
should also allow looking up Chinese contacts by the pinyin
representation of the original Chinese name. However, I'm not sure how
the T9 representation would fit in here. You could give this a try,
since it may be enough for your needs (I might be wrong, of course).
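A minimal sketch of the pinyin-based ordering, assuming the pinyin is already stored next to each
contact (the `pinyin` field below is hypothetical) and that the whole name as written is the sort key; a
contacts app might instead sort by family name. It compares case-folded strings by code point, not with
ICU's pinyin collation, so it only illustrates the ordering rule; a real implementation would more likely
rely on a proper collator such as the ICU one suggested above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    display_name: str             # what the contact list shows
    pinyin: Optional[str] = None  # hypothetical field, set only for Chinese names

    def sort_key(self) -> str:
        # Chinese contacts sort by their pinyin, everyone else by their own name.
        return (self.pinyin or self.display_name).casefold()


contacts = [
    Contact("Matt Compton"),
    Contact("John Doe"),
    # Placeholder display names; in the real list these are Chinese characters.
    Contact("<Liu Jing>", pinyin="Liu Jing"),
    Contact("<Luo Xuan>", pinyin="Luo Xuan"),
    Contact("<Ma Fang Fang>", pinyin="Ma Fang Fang"),
]

for contact in sorted(contacts, key=Contact.sort_key):
    print(contact.display_name)
```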


Oh, I forgot to paste the link to the ICU locale documentation:
http://userguide.icu-project.org/locale

And UTS #35, which lists the possible collation specifiers defined
by Unicode:
http://www.unicode.org/reports/tr35


-- 
Aleksander



