Re: [gtk-i18n-list] Determine the best encoding/script for a given text



Hi,

Although there are some characters that are used only in
the PRC, or only in Taiwan, etc., a single codepoint is not
enough to determine whether text is Traditional Chinese,
Simplified Chinese, or Vietnamese. Usually a string of some
length has to be examined, and even then the result is not
definitive. I suppose the GLib designers don't want GLib to
include non-definitive script-guessing algorithms, so I
think using other libraries might be better.
Fontconfig's script detection algorithm might be
informative, although I often see posts from Chinese
users who complain about unexpected results.
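A minimal sketch of the string-based voting idea above, in C. The function
guess_han_variant, the HanVariant type and the two lookup tables are
hypothetical, made up for illustration; they are not part of GLib or
fontconfig, and the five character pairs are a tiny sample rather than a real
dataset. The input is assumed to be already decoded into Unicode codepoints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef enum {
    VARIANT_UNDECIDED = 0,
    VARIANT_SIMPLIFIED,
    VARIANT_TRADITIONAL
} HanVariant;

/* Tiny illustrative tables: entry i of simplified_only is the simplified
 * form of entry i of traditional_only. A usable detector would need far
 * more entries; these few pairs are only for demonstration. */
static const uint32_t simplified_only[] = {
    0x56FD, /* 国 */
    0x4E1C, /* 东 */
    0x8BED, /* 语 */
    0x95E8, /* 门 */
    0x8FD9, /* 这 */
};
static const uint32_t traditional_only[] = {
    0x570B, /* 國 */
    0x6771, /* 東 */
    0x8A9E, /* 語 */
    0x9580, /* 門 */
    0x9019, /* 這 */
};

/* Linear search; fine for a handful of entries. */
static int in_table(uint32_t cp, const uint32_t *table, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (table[i] == cp)
            return 1;
    return 0;
}

/* Count votes over the whole string instead of trusting one codepoint.
 * Characters shared by both variants (e.g. 中, U+4E2D) cast no vote, so
 * short or ambiguous input stays VARIANT_UNDECIDED. */
HanVariant guess_han_variant(const uint32_t *text, size_t len)
{
    assert(text != NULL || len == 0);

    size_t simp = 0, trad = 0;
    for (size_t i = 0; i < len; i++) {
        if (in_table(text[i], simplified_only,
                     sizeof simplified_only / sizeof simplified_only[0]))
            simp++;
        else if (in_table(text[i], traditional_only,
                          sizeof traditional_only / sizeof traditional_only[0]))
            trad++;
    }
    if (simp > trad)
        return VARIANT_SIMPLIFIED;
    if (trad > simp)
        return VARIANT_TRADITIONAL;
    return VARIANT_UNDECIDED;
}
```

For example, the codepoints of 中国 would vote "simplified" and those of 中國
"traditional", while 中 alone stays undecided. As said above, even this is not
definitive: a "traditional" vote cannot separate Taiwan or Hong Kong Chinese
from Japanese or Vietnamese Chu-Han text, which also use forms such as 東 and 語.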

BTW, for Vietnamese script (Chu-Han & Chu-Nom), I'm
not sure whether pre-Unicode encodings are in popular use.
For example, TCVN 5773:1993 looks like a character set
covering the intersection of Chu-Han and the (existing) CJK
Unified Ideographs in the BMP, so I think it is not a good
character set to use as a character encoding for Vietnamese
script.


Regards,
mpsuzuki

Gaurav Jain wrote:
Hi,

I need to find out the script code for a given Unicode string.  I
found the API g_unichar_get_script() available in GLib 2.10, which does
this, but it doesn't seem to have support for Chinese script.  For
example, is it possible to find out whether a given character falls in
the Traditional Chinese or Simplified Chinese code range?  Similarly for
Vietnamese?

Is there any other API available in GLib that I can use to determine
the best encoding/script for a given text?

Thanks,
Gaurav
_______________________________________________
gtk-i18n-list mailing list
gtk-i18n-list gnome org
http://mail.gnome.org/mailman/listinfo/gtk-i18n-list
