Re: [g-a-devel]Re: gok 0.9.3 released - keyboards now i18n-ready



Hi,
sorry for the late response.

Bill Haneman <Bill Haneman Sun COM> writes:
> To follow up, for the time being there are some things you can help
> with, regarding GOK's serbian translation (in addition to just
> translating the strings from the po files):
>
> * we need a Serbian wordlist in UTF-8; one word per line with
>   no additional text is a legal format, but you could optionally
>   put a frequency value as the second token in each line.
>   In most cases we can read ispell files, but we assume ISO-Latin-1
>   encoding for non-UTF-8 files, which may not work for Serbian; UTF-8
>   is better.
>
>   If you are using an existing 'free' wordlist it might be better to 
>   document the location in gok's README and use gconftool-2 to set
>   /apps/gok/aux_dictionaries to include that file.
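To make the encoding point above concrete, here is a minimal Python sketch of reading a list in the format described (a word, then an optional frequency), decoding as UTF-8 with an ISO-8859-1 fallback. `load_wordlist` is a hypothetical helper, not GOK's actual loader. It shows why Latin-1 is a problem: Serbian Latin letters such as č, ć, đ, š and ž, and all of Cyrillic, lie outside ISO-8859-1, so a file saved in, say, ISO-8859-2 is silently misread rather than rejected.

```python
def load_wordlist(data: bytes) -> dict[str, int]:
    """Parse 'word [frequency]' lines; hypothetical helper, not GOK's code."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Non-UTF-8 input is assumed to be ISO-8859-1 (Latin-1).
        text = data.decode("iso-8859-1")
    words: dict[str, int] = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        # The frequency token is optional; default to 1 if missing or non-numeric.
        has_freq = len(tokens) > 1 and tokens[1].isdecimal()
        words[tokens[0]] = int(tokens[1]) if has_freq else 1
    return words


if __name__ == "__main__":
    # UTF-8 input decodes correctly: {'čitam': 12, 'pričam': 1}
    print(load_wordlist("čitam 12\npričam\n".encode("utf-8")))
    # The same word saved as ISO-8859-2 is not valid UTF-8, so it falls
    # back to Latin-1 and "č" (byte 0xE8) is misread as "è": {'èitam': 12}
    print(load_wordlist("čitam 12\n".encode("iso-8859-2")))
```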

I'll have to take some time to work on it, since no list of Serbian
words exists yet.

There are many Serbian-language texts around the web, so I'll see if
I can make use of them (copyright claims should not be a problem here,
since this qualifies as a "language corpus", I believe). This has the
added benefit of letting me compute word frequencies as well.

Since I've never done this (I'm neither a linguist nor do I know much
about a11y), any tips on how to create such a list from long texts, and
on what should be taken into account, are welcome.

> * at-spi needs to be patched to allow use of word completion for
> Serbian.  You can help by building at-spi with 'DEBUG', running it in an
> xterm, and helping figure out which X keysyms are missing; GOK's word
> completion relies on at-spi's ability to synthesize whole UTF-8 strings,
> and at-spi's string synthesis routine is currently a messy hack that
> doesn't include all the necessary keysyms.  
>
> You can help us identify the keysyms that are relevant and currently
> omitted, and I can help patch at-spi so that word completion works for
> Serbian.

Yes, I'll look into this later today. Since I've already developed an
XKB keyboard map for Serbian, this shouldn't be much of a problem.
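For the keysym hunt, a checklist of the characters Serbian needs may help. This minimal Python sketch prints each Serbian Cyrillic letter and each non-ASCII Serbian Latin letter with its Unicode name and its X11 Unicode-range keysym (0x01000000 plus the code point, the convention X11 uses for characters from U+0100 up). It does not query at-spi or the X server. `unicode_keysym` and `keysym_table` are hypothetical helpers, and since keysymdef.h also defines legacy named keysyms for many of these letters (e.g. Serbian_dje), a synthesis routine might expect either form.

```python
import unicodedata

# Serbian Cyrillic alphabet, built from Unicode names to avoid
# confusing Cyrillic letters with look-alike Latin ones.
_CYRILLIC_NAMES = (
    "A", "BE", "VE", "GHE", "DE", "DJE", "IE", "ZHE", "ZE", "I",
    "JE", "KA", "EL", "LJE", "EM", "EN", "NJE", "O", "PE", "ER",
    "ES", "TE", "TSHE", "U", "EF", "HA", "TSE", "CHE", "DZHE", "SHA",
)
# Non-ASCII letters of Serbian Latin (č, ć, đ, š, ž).
_LATIN_NAMES = ("C WITH CARON", "C WITH ACUTE", "D WITH STROKE",
                "S WITH CARON", "Z WITH CARON")

SERBIAN_LETTERS = "".join(
    [unicodedata.lookup(f"CYRILLIC SMALL LETTER {n}") for n in _CYRILLIC_NAMES]
    + [unicodedata.lookup(f"LATIN SMALL LETTER {n}") for n in _LATIN_NAMES]
)


def unicode_keysym(ch: str) -> int:
    """X11 Unicode-range keysym: 0x01000000 + code point (U+0100 and up)."""
    cp = ord(ch)
    if cp < 0x100:
        raise ValueError("characters below U+0100 use their Latin-1 keysyms")
    return 0x01000000 + cp


def keysym_table(letters: str) -> list[tuple[str, str, str]]:
    """Rows of (character, keysym in hex, Unicode name), lowercase then uppercase."""
    return [
        (ch, f"0x{unicode_keysym(ch):08x}", unicodedata.name(ch))
        for ch in letters + letters.upper()
    ]


if __name__ == "__main__":
    for ch, keysym, name in keysym_table(SERBIAN_LETTERS):
        print(ch, keysym, name)
```

Comparing this list against the keysyms the DEBUG build reports as unmapped would narrow down which ones the string synthesis routine is missing.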

Also, could you give me some tips on how to perform these tests? Which
settings do I need to turn on (I guess accessibility support has to be
enabled :)? I'm planning to test this in a separate, clean environment.

> _IF_ you don't need word completion, and can get by with only one
> keymap, then I think GOK should work OK for Serbian if you provide the
> necessary translation strings.  I realize this leaves some holes from
> the user perspective, but we'll try to fill them in as time permits.

I think a word list would be important. But since Serbian is a highly
inflected language, I already see one big problem with word
completion.

Words in Serbian change form depending on their usage, and the change
usually happens at the end of the word. So, for instance, I might say
  Ja pricam (I talk)
  Ti pricas (You talk)
          ^ -- in this example, only the last letter is changed, but
               there are many more complicated examples

So I wonder: what is the current practice for producing word lists
for other languages with similar constructions? Do they include all
word forms, or only the most common form of each word?
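To illustrate the concern, here is a minimal Python sketch of plain prefix completion over a frequency list. It is not GOK's actual completion algorithm; `complete` and the frequencies are made up for illustration. With every inflected form in the list, typing "pric" can offer the exact form needed; with only the dictionary form, the only suggestion is the infinitive, whose ending the user would then have to correct by hand.

```python
def complete(prefix: str, wordlist: dict[str, int], limit: int = 3) -> list[str]:
    """Return up to `limit` words starting with `prefix`, most frequent first."""
    matches = [w for w in wordlist if w.startswith(prefix)]
    # Highest frequency first; ties broken alphabetically so output is stable.
    matches.sort(key=lambda w: (-wordlist[w], w))
    return matches[:limit]


# Hypothetical frequencies, for illustration only.
ALL_FORMS = {"prica": 60, "pricam": 40, "pricas": 25, "pricali": 10}
LEMMAS_ONLY = {"pricati": 135}

if __name__ == "__main__":
    print(complete("pric", ALL_FORMS))    # ['prica', 'pricam', 'pricas']
    print(complete("pric", LEMMAS_ONLY))  # ['pricati']
```

The cost of listing every form is size: a single Serbian verb or noun can have many forms, so the list grows several times larger than a list of base forms.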

I'll reply to the other message as well.

Cheers,
Danilo


