Re: "Automatic" UI translations showing word translations along with original



Tim Foster wrote:

Hey all,

Promise, this will be the last I'll write on this for a while - but
people suggested I look at bi-gram, tri-gram and up to 5-gram
distributions of words found in GNOME software messages. That is, given
the input message :

"This is a long sentence so there."

We'd get the tri-grams :

This is a
is a long
a long sentence
long sentence so
sentence so there

- usually people mess about with this sort of thing when trying to build
terminology lists, but I suspect my sample set is a bit small to be
interesting.
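
For anyone who wants to reproduce the splitting step, here is a minimal sketch in Python. It assumes plain whitespace tokenisation with surrounding punctuation stripped from each word. The function name ngrams is made up for illustration, and this is not the script that produced the results linked below.

```python
import string


def ngrams(message, n):
    """Return the word n-grams of a message as a list of strings."""
    # Split on whitespace and drop punctuation stuck to either end of a word.
    words = [w.strip(string.punctuation) for w in message.split()]
    words = [w for w in words if w]  # discard tokens that were only punctuation
    # Slide a window of n words across the message.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


if __name__ == "__main__":
    for gram in ngrams("This is a long sentence so there.", 3):
        print(gram)
```

Calling the same function with n set to 2, 4 or 5 gives the other distributions mentioned above.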

Regardless, I've got results at http://blogs.sun.com/roller/page/timf?entry=more_word_bagging


I hope there is more to this work. I believe the localisation process could use this kind of aid.
I went through the initial wordlist, starting with the 1200 most common words, and removed those words
that are either elemental (articles, connective words, adverbs) or left untranslated (FTP and the like).
The resulting list has 1000 words and the file is available at
http://www.isg.rhul.ac.uk/~simos/misc/gnome-2.10-words--reduced.sxc
The first sheet is the list by Tim, the second one is the reduced one.
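
The reduction was done by hand in the spreadsheet, but the same idea could be roughly automated. What follows is only a sketch under assumptions: the names reduced_wordlist and STOPWORDS are hypothetical, and the tiny stop list stands in for the real set of articles, connective words, adverbs and untranslated terms, which would have to be compiled by a person.

```python
import re
from collections import Counter

# Hypothetical stop list, far smaller than what a real reduction would need.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "ftp"}


def reduced_wordlist(messages, top_n, stopwords=STOPWORDS):
    """Count words across messages, keep the top_n most common,
    then drop the ones that appear in the stop list."""
    counts = Counter()
    for message in messages:
        # Lower-case so "File" and "file" are counted together.
        counts.update(re.findall(r"[a-z]+", message.lower()))
    return [(word, count)
            for word, count in counts.most_common(top_n)
            if word not in stopwords]


if __name__ == "__main__":
    sample = ["Open the file", "Save the file", "Rename the image file"]
    for word, count in reduced_wordlist(sample, 2):
        print(word, count)
```

Filtering after taking the top_n words mirrors the manual procedure, which is why the reduced list ends up shorter than the starting one.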


Some stats on the 1000 word list:
- The most common term is "file", with 1200+ occurrences.
- The top five words are:
file
image
name
color
window

- "right" and "left" have the same number of occurrences (184), good!
- In the list of the 1000 most popular words, the last one is "suspend", with 18 occurrences.
- The full list has 10,000 words from a total of 34,000 messages in GNOME 2.12.


Hope this helps,
Simos

