Re: One key stroke --> two code-points
- From: "Anousak Souphavanh" <anousak gmail com>
- To: "Simos Xenitellis" <simos lists googlemail com>
- Cc: Javier SOLA <javier khmeros info>, Bart Geesink <bart geesink org>, gtk-i18n-list gnome org, Jens Herden <jens khmeros info>, gnome-i18n gnome org
- Subject: Re: One key stroke --> two code-points
- Date: Tue, 10 Jun 2008 12:05:53 +0700
Thanks, Simos, for your kindness and your time.
Many thanks to Javier as well for bringing a good solution.
The Lao input method needs a similar solution. Javier, please post your
solution (where and how to define a new table for Khmer) so I can
define these code points for Lao.
Thanks,
Anousak
The Lao team
> On Tue, Jun 10, 2008 at 1:58 AM, Simos Xenitellis
> <simos lists googlemail com> wrote:
>> Javier SOLA wrote:
>>>
>>> Thanks Simos !!
>>>
>>> Actually, we have had these additions for a while in X11.
>>
>> Hi Javier,
>>
>> Checking at
>> http://gitweb.freedesktop.org/?p=xorg/lib/libX11.git;a=tree;f=nls/en_US.UTF-8
>> does not show these lines at the end. It is possible that these compose
>> sequences were added as a patch to the distribution package.
>>>
>>> We will do an issue for GTK+, and use the variable meanwhile.
>>>
>>> What file is it in GTK+? I have not been able to find it.
>>
>> In GTK+ (HEAD), the relevant file is
>> http://svn.gnome.org/viewvc/gtk%2B/trunk/gtk/gtkimcontextsimple.c?view=markup
>>
>> However, your compose sequences differ from the existing ones, which
>> produce a single codepoint (you need to produce two codepoints).
>>
>> The support you are looking for is therefore similar to compose
>> sequences that produce letter+diacritic mark. Several languages have
>> characters for which no pre-composed letters exist, so the compose sequence
>> produces letter+diacritic marks (more than one codepoint). Such support is
>> missing, and there are already bug reports for it.
>>
>> Bug 341341 – Compose mechanism in simple input method doesn't support
>> decomposed forms
>> http://bugzilla.gnome.org/show_bug.cgi?id=341341
>>
>> Bug 345254 – dead accents should at least produce combining characters
>> http://bugzilla.gnome.org/show_bug.cgi?id=345254
>>
>> There is a shortcut available for the above cases of compose
>> sequences, so I expect their solution to differ from the one for the
>> Khmer compose sequences.
>> Specifically, for the Latin compose sequences, such as (it's a made up
>> example)
>>
>> <dead_acute> <t> : "t́" # LETTER T WITH ACUTE
>>
>> one could convert to something like [ dead_acute, 't', 0].
>> We would put 0 for the resulting codepoint because we can deduce for this
>> category of compose sequences that the actual codepoints are 't' and 'acute'
>> (the resulting codepoints match the body of the compose sequence).
>>
>> However, in the case of Khmer, the compose sequences look independent of
>> the resulting code points. Therefore, a new table would be required.
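
The difference between the two cases can be illustrated with a minimal
Python sketch. It is not the gtkimcontextsimple.c data structure; the
names TABLE, DERIVED, DEAD_TO_COMBINING and resolve are hypothetical and
exist only for this illustration. A 0 result means "derive the output
from the sequence itself", while the Khmer entry has to store its output
explicitly.

```python
# Hypothetical illustration only; not the actual GTK+ compose table format.

# Assumed mapping from a dead key to its combining character.
DEAD_TO_COMBINING = {"dead_acute": "\u0301"}  # COMBINING ACUTE ACCENT

DERIVED = 0  # placeholder result: output is deduced from the sequence

TABLE = {
    # Latin case: the result can be deduced, so the table stores 0.
    ("dead_acute", "t"): DERIVED,
    # Khmer case: the result is unrelated to the keysym, so it is stored.
    ("U17fb",): "ុះ",
}


def resolve(sequence):
    """Return the output string for a key sequence, or None if unknown."""
    sequence = tuple(sequence)
    if sequence not in TABLE:
        return None
    result = TABLE[sequence]
    if result == DERIVED:
        dead, base = sequence
        # Base letter first, then the combining mark.
        return base + DEAD_TO_COMBINING[dead]
    return result


print([hex(ord(c)) for c in resolve(["dead_acute", "t"])])
print([hex(ord(c)) for c in resolve(["U17fb"])])
```

In this sketch both lookups return more than one codepoint, but only the
Khmer entry needs extra storage in the table.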
>>
>> To cut a long story short, I have filed a bug report for this,
>> Bug 537457 – Support compose sequences that produce two+ codepoints
>> http://bugzilla.gnome.org/show_bug.cgi?id=537457
>>
>> Simos
>>
>>>
>>> Thanks,
>>>
>>> Javier
>>>
>>> Simos Xenitellis wrote
>>>>
>>>> Javier SOLA wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am working on Khmer localization (KhmerOS project).
>>>>>
>>>>> In Khmer, some of the basic vowels (which we include in the keyboard)
>>>>> require two code-points, so one keystroke must generate two code points.
>>>>>
>>>>> It used to be that we could do the conversion in XKB by generating a
>>>>> fictitious code-point (Pablo Saratxaga explained this to us a few years ago),
>>>>> which was later translated to two real code-points by putting the conversion
>>>>> in the en-US locale file. It did work at the time.
>>>>>
>>>>> But now this seems to have stopped working. Does anybody know how we
>>>>> can fix this?
>>>>
>>>> These additions (pressing a single key and producing two codepoints), are
>>>> located at
>>>> /usr/share/X11/locale/en_US.UTF-8/Compose
>>>> The specific lines appear to be
>>>>
>>>> # Khmer digraphs
>>>> # A keystroke has to generate several characters, so they are defined
>>>> # in this file
>>>>
>>>> <U17fb> : "ុះ"
>>>> <U17fc> : "ុំ"
>>>> <U17fd> : "េះ"
>>>> <U17fe> : "ោះ"
>>>> <U17ff> : "ាំ"
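
As a rough illustration of what these entries encode, here is a minimal
Python sketch that reads lines of this shape into a lookup table. The
names parse_compose and ENTRY are hypothetical, and the regular
expression only handles the simple `<keysym> : "string"` form shown
above, not the full Compose file syntax.

```python
import re

# Entries copied from the Compose excerpt quoted above.
COMPOSE_LINES = '''
# Khmer digraphs
<U17fb> : "ុះ"
<U17fc> : "ុំ"
<U17fd> : "េះ"
<U17fe> : "ោះ"
<U17ff> : "ាំ"
'''

# One or more <keysym> tokens, a colon, then a double-quoted result string.
ENTRY = re.compile(r'^((?:<\w+>\s*)+):\s*"([^"]*)"')


def parse_compose(text):
    """Map each keysym sequence (as a tuple) to its output string."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = ENTRY.match(line)
        if match:
            keys = tuple(re.findall(r"<(\w+)>", match.group(1)))
            table[keys] = match.group(2)
    return table


table = parse_compose(COMPOSE_LINES)
for keys, result in table.items():
    # Show each output as a list of codepoints.
    print(keys, [f"U+{ord(c):04X}" for c in result])
```

Each single keysym maps to a string of more than one codepoint, which is
the case the simple input method would have to support.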
>>>>
>>>> GTK+ based applications duplicate the Compose file in the gtk+ library,
>>>> and the version of the Compose file that currently exists in gtk+ does
>>>> not include those specific compose sequences.
>>>> I think these are a recent addition.
>>>> Technically, it is possible for gtk+ to include compose sequences that
>>>> produce more than one code point (it requires a small change in the code).
>>>> However, these recent Khmer digraphs are the only compose sequences using
>>>> the facility now.
>>>>
>>>> To cut a long story short, for now you can bypass the GTK+ version of
>>>> the Compose file and use the Compose file that comes with X.Org (shown
>>>> above) by setting the environment variable GTK_IM_MODULE to "xim".
>>>> This should not have any adverse effect on the OLPC software.
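
For a single application, the workaround could be applied along these
lines. This is only a sketch: xim_environment and launch_with_xim are
hypothetical helper names, and the command passed in is whatever GTK+
application you want to start. The only thing taken from the message
above is the variable GTK_IM_MODULE and its value "xim".

```python
import os
import subprocess


def xim_environment(base=None):
    """Return a copy of the environment with GTK_IM_MODULE set to "xim"."""
    env = dict(os.environ if base is None else base)
    env["GTK_IM_MODULE"] = "xim"
    return env


def launch_with_xim(command):
    """Start a program (given as an argument list) with the XIM input module."""
    return subprocess.Popen(command, env=xim_environment())
```

Calling launch_with_xim with an argument list starts only that program
with the variable set; exporting the variable in the session startup
files would affect every GTK+ application instead.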
>>>>
>>>> If other keyboard layouts (such as Serbian) also require compose
>>>> sequences that produce two or more codepoints, it is important to add
>>>> them to the X.Org Compose file. Then, in the next update of GTK+, all
>>>> these compose sequences can make it in.
>>>>
>>>> Simos
>>>>
>>>>
>>>
>>>
>>
--
Anousak (Anthony) Souphavanh
"Small can make a big impact"