Re: One key stroke --> two code-points
- From: "Anousak Souphavanh" <anousak gmail com>
- To: "Simos Xenitellis" <simos lists googlemail com>
- Cc: Javier SOLA <javier khmeros info>, Bart Geesink <bart geesink org>, gtk-i18n-list gnome org, Jens Herden <jens khmeros info>, gnome-i18n gnome org
- Subject: Re: One key stroke --> two code-points
- Date: Tue, 10 Jun 2008 02:48:49 -0000
Thanks, Simos, for your kindness and your time.
Many thanks to Javier as well for bringing up a good solution.
The Lao input method needs a similar solution. Javier, please post your
solution (where and how to define a new table for Khmer) so that I can
define these code points for Lao.
Thanks,
Anousak
The Lao team
On Tue, Jun 10, 2008 at 1:58 AM, Simos Xenitellis
<simos lists googlemail com> wrote:
> Javier SOLA wrote:
>>
>> Thanks Simos !!
>>
>> Actually, we have had these additions for a while in X11.
>
> Hi Javier,
>
> The Compose file at
> http://gitweb.freedesktop.org/?p=xorg/lib/libX11.git;a=tree;f=nls/en_US.UTF-8
> does not show these lines at the end. It is possible that these compose
> sequences were added as a patch in the distribution package.
>>
>> We will file an issue for GTK+, and use the environment variable in the meantime.
>>
>> What file is it in GTK+? I have not been able to find it.
>
> In GTK+ (HEAD), the relevant file is
> http://svn.gnome.org/viewvc/gtk%2B/trunk/gtk/gtkimcontextsimple.c?view=markup
>
> However, your compose sequences differ from the existing compose
> sequences, which result in a single codepoint (you need to produce
> two codepoints).
>
> Therefore, the type of support you are looking for is similar to compose
> sequences that result in letter+diacritic mark. Several languages have
> characters for which no pre-composed letters exist, so the compose sequence
> produces letter+diacritic marks (more than one codepoint). Such support is
> missing, and there are already bug reports for it.
>
> Bug 341341 – Compose mechanism in simple input method doesn't support
> decomposed forms
> http://bugzilla.gnome.org/show_bug.cgi?id=341341
>
> Bug 345254 – dead accents should at least produce combining characters
> http://bugzilla.gnome.org/show_bug.cgi?id=345254
>
> There is a shortcut for solving the above cases of compose
> sequences, so I expect their solution to be different from the one for
> the Khmer compose sequences.
> Specifically, for the Latin compose sequences, such as (it's a made-up
> example)
>
> <dead_acute> <t> : "t́" # LETTER T WITH ACUTE
>
> one could convert it to something like [ dead_acute, 't', 0].
> We would put 0 for the resulting codepoint because, for this category of
> compose sequences, we can deduce that the actual codepoints are 't' and
> 'acute' (the resulting codepoints match the body of the compose sequence).
>
> However, in the case of Khmer, the compose sequences look independent of
> the resulting code points. Therefore, a new table would be required.
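
To illustrate the distinction drawn above, here is a minimal Python sketch. It is not GTK+ code: the table layouts and the names COMBINING_MARK, SEQUENCES, MULTI_CODEPOINT, resolve_sequence and resolve_keysym are all hypothetical, chosen only to show how a result of 0 could mean "deduce the output from the sequence itself", while the Khmer case needs a separate table because its output cannot be deduced that way.

```python
# Hypothetical mapping from dead-key names to combining marks.
COMBINING_MARK = {"dead_acute": "\u0301"}  # U+0301 COMBINING ACUTE ACCENT

# Entries in the spirit of [ dead_acute, 't', 0 ]: the last element is the
# resulting codepoint, and 0 means "deduce the result from the sequence".
SEQUENCES = [("dead_acute", "t", 0)]

# Separate, hypothetical table for sequences whose output is independent of
# the keys pressed (first Khmer entry from the Compose file quoted below).
MULTI_CODEPOINT = {0x17FB: "ុះ"}


def resolve_sequence(entry):
    """Return the text produced by a (dead_key, base, result) entry."""
    *keys, result = entry
    if result != 0:
        # Ordinary case: one precomposed codepoint.
        return chr(result)
    # Shortcut case: base letter followed by the combining mark.
    dead, base = keys
    return base + COMBINING_MARK[dead]


def resolve_keysym(keysym):
    """Return the multi-codepoint string for a keysym, or None if absent."""
    return MULTI_CODEPOINT.get(keysym)


for entry in SEQUENCES:
    text = resolve_sequence(entry)
    print(" ".join("U+%04X" % ord(c) for c in text))
```

The lookup for Khmer has to carry its output string explicitly, which is why a single "result codepoint" column is not enough there.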
>
> To cut a long story short, I have filed a bug report for this:
> Bug 537457 – Support compose sequences that produce two+ codepoints
> http://bugzilla.gnome.org/show_bug.cgi?id=537457
>
> Simos
>
>>
>> Thanks,
>>
>> Javier
>>
>> Simos Xenitellis wrote:
>>>
>>> Javier SOLA wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am working on Khmer localization (KhmerOS project).
>>>>
>>>> In Khmer, some of the basic vowels (which we include in the keyboard)
>>>> require two code-points, so one keystroke must generate two code points.
>>>>
>>>> It used to be that we could do the conversion in XKB by generating a
>>>> fictitious code-point (Pablo Saratxaga explained this to us a few years ago),
>>>> which was later translated to two real code-points by putting the conversion
>>>> in the en-US locale file. It did work at the time.
>>>>
>>>> But now this seems to have stopped working. Does anybody know how we
>>>> can fix this?
>>>
>>> These additions (pressing a single key and producing two codepoints) are
>>> located at
>>> /usr/share/X11/locale/en_US.UTF-8/Compose
>>> The specific lines appear to be
>>>
>>> # Khmer digraphs
>>> # A keystroke has to generate several characters, so they are defined
>>> # in this file
>>>
>>> <U17fb> : "ុះ"
>>> <U17fc> : "ុំ"
>>> <U17fd> : "េះ"
>>> <U17fe> : "ោះ"
>>> <U17ff> : "ាំ"
>>>
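
As a rough illustration of what these Compose lines encode, the following Python sketch reads them into a table from keysym codepoint to output string. It is an assumption-laden sketch, not the X.Org or GTK+ parser: the regular expression only handles single `<Uxxxx>` sequences of the form shown above, and the names COMPOSE_LINE, parse_compose_lines and khmer_table are hypothetical.

```python
import re

# Matches lines of the form:  <U17fb> : "text"
# Only a single <Uxxxx> keysym on the left-hand side is handled here.
COMPOSE_LINE = re.compile(r'^\s*<U([0-9A-Fa-f]{4,6})>\s*:\s*"([^"]*)"')


def parse_compose_lines(text):
    """Map each <Uxxxx> keysym codepoint to the string it produces."""
    table = {}
    for line in text.splitlines():
        match = COMPOSE_LINE.match(line)
        if match:  # comment lines and blank lines simply do not match
            table[int(match.group(1), 16)] = match.group(2)
    return table


# The Khmer digraph lines quoted above.
KHMER_LINES = '''
# Khmer digraphs
<U17fb> : "ុះ"
<U17fc> : "ុំ"
<U17fd> : "េះ"
<U17fe> : "ោះ"
<U17ff> : "ាំ"
'''

khmer_table = parse_compose_lines(KHMER_LINES)

# Show each placeholder keysym and the real codepoints it expands to.
for keysym, result in sorted(khmer_table.items()):
    codepoints = " ".join("U+%04X" % ord(ch) for ch in result)
    print("U+%04X -> %s" % (keysym, codepoints))
```

Each right-hand side holds two codepoints, which is exactly what a single-result compose table cannot express.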
>>> GTK+ based applications duplicate the Compose file in the gtk+ library,
>>> and currently the version of the Compose file that exists in gtk+ does not
>>> include those specific compose sequences.
>>> I think these are a recent addition.
>>> Technically, it is possible for gtk+ to include compose sequences that
>>> produce more than one code point (this requires a small change in the code);
>>> however, these recent Khmer digraphs are the only compose sequences using the
>>> facility now.
>>>
>>> To cut a long story short, for now you can bypass the GTK+ version of
>>> the Compose file and use the Compose file that comes with X.Org (shown
>>> above) by setting the environment variable GTK_IM_MODULE to "xim".
>>> This should not have any adverse effect on the OLPC software.
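
One way to apply this workaround to a single application, rather than to the whole session, is sketched below in Python. Only the variable name GTK_IM_MODULE and the value "xim" come from the message above; the helper names xim_environment and launch_with_xim are hypothetical, and the program to launch is left to the caller.

```python
import os
import subprocess


def xim_environment(base=None):
    """Return a copy of the environment with GTK_IM_MODULE set to "xim"."""
    env = dict(os.environ if base is None else base)
    env["GTK_IM_MODULE"] = "xim"
    return env


def launch_with_xim(command):
    """Start a GTK+ application (a list of arguments) using the XIM module."""
    return subprocess.Popen(command, env=xim_environment())
```

For example, launch_with_xim(["some-gtk-app"]) would start the (placeholder) program some-gtk-app with the X.Org Compose file in effect, leaving other applications untouched.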
>>>
>>> It is important that, if other keyboard layouts (such as Serbian) also
>>> require compose sequences that produce two or more codepoints, these
>>> are added to the X.Org Compose file. In the next update of GTK+, all
>>> these compose sequences can make it in.
>>>
>>> Simos
>>>
>>>
>>
>>
>
> _______________________________________________
> gnome-i18n mailing list
> gnome-i18n gnome org
> http://mail.gnome.org/mailman/listinfo/gnome-i18n
>
--
Anousak (Anthony) Souphavanh
"Small can make a big impact"