open translations database



Dear Stefan & All

My name is Aoife Dunne and I am the project manager responsible 
for the GNOME Localisation at Sun.  

I am writing in the hope that I can take Stefan's suggestions one
step further and help the open source community provide localised
versions of GNOME and, later, of similar open source products.  I
work for Sun Microsystems, which is planning to ship GNOME with
the next marketing release of Solaris, so I am writing this mail
with the GNOME project in mind.  However, we want any solution to
be for the general benefit of free and open source software, and I
would be very interested in offering our team's assistance across
all localised open source software.

How Sun can help:
Stefan mentioned it would be nice to have a web-accessible
"database" (or just a simple file) which would contain one or more
sets of standard English words/terms and their associated
translations.  Developers and translators of software and
documentation could use the terminology listings as a reference.

Terming Tool 
------------
We have a script which extracts terms from the English software
files, providing suitable terms for the initial database/file.  A
term is defined as no more than one or two words.  The script
extracts terms from the strings, removes duplicates, and ignores
items such as "the", "is", numbers, etc.  It is not possible to
extract the associated translated terms, so translators would need
to provide them.  Once this is done, the terminology listings can
be posted to a web site, where they can be updated/modified as
development of the applications progresses.  It is preferable that
the suite of applications within a product use the same
terminology, ensuring consistency; however, by recording the
application alongside each term, it is possible to use different
terms when appropriate.

Sample
English Term    English Definition    Translated    Application
Print
Save
Save-To

Initially it may not be possible for me to supply the source of
the terming tool due to licensing problems.  However, I can help
immediately by supplying a simple text file with the English
terms.  Would this be of help?
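
As a rough illustration of the extraction step described above, a
minimal Python sketch might look like the following.  It is not
Sun's script: the stop-word list, the function name extract_terms
and the tokenising rule are all assumptions made for the example.

```python
import re

# Hypothetical stop list; the real tool's list is not given in this mail.
STOP_WORDS = {"the", "is", "a", "an", "of", "to", "and", "or", "in"}


def is_term_word(word):
    """True for alphabetic words (hyphens allowed) that are not stop words."""
    return word not in STOP_WORDS and word.replace("-", "").isalpha()


def extract_terms(strings):
    """Return sorted, de-duplicated one- and two-word candidate terms."""
    terms = set()  # a set removes duplicates automatically
    for s in strings:
        tokens = re.findall(r"[A-Za-z0-9-]+", s.lower())
        for i, word in enumerate(tokens):
            if not is_term_word(word):
                continue  # skip stop words and numbers
            terms.add(word)
            # Two-word terms are taken only from directly adjacent words.
            if i + 1 < len(tokens) and is_term_word(tokens[i + 1]):
                terms.add(word + " " + tokens[i + 1])
    return sorted(terms)


if __name__ == "__main__":
    for term in extract_terms(["Save the file", "Print 2 copies", "Save"]):
        print(term)
```

The output is only the English column of the listing; the
translated column would still have to be filled in by translators.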


Translation Memory 
------------------
We are currently developing a translation memory (TM) system which
runs on Unix.

How it works:  Basically, TM is all about recycling your previous
translations in order to retain quality and consistency and to
save time and money.  TM is based on whole strings, whereas
terming is based on terms consisting of maybe one or two words.
When a translation has been completed, the English and the
associated translated .po files are run through a .po file parser
which splits the files up into strings.  The files are then run
through an alignment tool which generates files containing string
pairs.  Each string pair consists of an English string and its
corresponding translated string.  These aligned files are then
imported into a database or translation memory (we are using
Oracle for this).

When an updated version of the .po files comes along, the English
files are run against the database using a translation memory
tool.  The TM tool searches the database for a match for each
English string.  If it finds an exact match (it always compares
English against English), it inserts the corresponding translation
into the file, leaving the translator to translate only what's new
or what has changed.  However, the translator still has the
freedom to overwrite or correct a database translation if he/she
so wishes.

Generally the database is kept at a central location and is
populated with new translations once the translation of a new
application/product is complete.  Obviously, it is necessary to
monitor the quality of what goes into the database.  Otherwise
it's "garbage in, garbage out".
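
To make the exact-match idea concrete, here is a small Python
sketch.  It is an illustration only, not the system under
development: a plain dictionary stands in for the database, the
parser handles only single-line msgid/msgstr entries, and the
names parse_po, build_memory and pretranslate are made up for the
example.  It also skips the separate alignment step by relying on
the fact that a gettext .po entry already pairs each msgid with
its msgstr.

```python
import re

# Assumption: every entry is a single-line msgid directly followed by a
# single-line msgstr. Real .po files also allow multi-line strings,
# comments and plural forms, which this sketch ignores.
ENTRY = re.compile(r'msgid "(.*)"\nmsgstr "(.*)"')


def parse_po(text):
    """Split .po text into (English string, translated string) pairs."""
    return ENTRY.findall(text)


def build_memory(translated_po_text):
    """Build the memory from a finished translation (dict, not a database)."""
    return {src: tgt for src, tgt in parse_po(translated_po_text) if src and tgt}


def pretranslate(new_po_text, memory):
    """Reuse exact English-to-English matches; list what is still to do."""
    reused, todo = {}, []
    for src, _ in parse_po(new_po_text):
        if not src:
            continue  # the empty msgid is the .po header entry
        if src in memory:
            reused[src] = memory[src]
        else:
            todo.append(src)
    return reused, todo


if __name__ == "__main__":
    old = 'msgid "File"\nmsgstr "Bestand"\n\nmsgid "Edit"\nmsgstr "Bewerken"\n'
    new = 'msgid "File"\nmsgstr ""\n\nmsgid "Print"\nmsgstr ""\n'
    reused, todo = pretranslate(new, build_memory(old))
    print("reused:", reused)
    print("left for the translator:", todo)
```

Only strings in the "left for the translator" list need new work;
the reused ones can still be reviewed and overridden by hand.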

The TM system is still in development but is coming close to 
completion. We may be able to help by providing you with a .po 
file parser.  However, we would need to look into possible 
licensing issues.


Style Guides 
------------
We have localised versions of our style guidelines.  These
guidelines are used to aid the translators; for example, the guide
for France covers how date and time formats should be localised.
In many countries such data can correctly be written in several
formats; the style guide settles on one preferred format for the
sake of consistency.  Our style guides could be used as a
reference and updated to create a GNOME-specific style guide for
all languages.  Let me know if you are interested and I will send
you a copy of our country-specific style guides.


How else can Sun help:

* possibly act as the host for the translation memory database,
populating it as products are newly translated.

* provide linguistic quality assurance feedback and implement
linguistic changes if necessary, checking for grammar, spelling,
inconsistencies etc.

If any of the above suggestions would be of help, or if you have
any other suggestions on what I can bring to the table, please let
me know.  I look forward to any feedback.

Best Regards
Aoife

> X-Unix-From: StefanRieken@SoftHome.net  Thu Oct 12 18:56:35 2000
> Delivered-To: gnome-i18n@gnome.org
> Subject: open translations database
> From: Stefan Rieken <StefanRieken@SoftHome.net>
> To: whampton@staffnet.com, gnome-i18n@gnome.org
> Date: 12 Oct 2000 16:55:26 -0100
> Mime-Version: 1.0
> X-BeenThere: gnome-i18n@gnome.org
> X-Loop: gnome-i18n@gnome.org
> X-Mailman-Version: 2.0beta5
> List-Id: Internationalization (I18N) of GNOME <gnome-i18n.gnome.org>
> 
> To the folks at openstandards.org and the gnome-i18n mailing list.
> 
> Hello,
> 
> This mail was sent out to give space to an idea that I developed only
> today. This idea is rough, unimplemented and untested. Nevertheless, I
> hope that it is of interest for you. This mail was sent to the addresses
> mentioned above, just because I didn't know any better place to start.
> If you believe I shouldn't have sent it to you or your list, I
> apologise. If you believe I missed someone out, you are free to forward
> this. (But I must warn you in advance that this idea is too young for me
> to know if it will survive my busy schedule.)
> 
> Problem:
> 
> The current translation of open source software suffers from a lack of
> manpower. This usually doesn't result in a lack of translations, but in
> bad translations. Half of the time translation engines such as Babelfish
> are being used. These engines often can't produce correct translations
> of small strings because of a lack of context (e.g.: the title of the
> window I am writing this message in says, directly translated back to
> English: "is composing a new message" instead of "Compose a new
> message"). They also don't care about the size of the translated string,
> which can be important when used in a program. Translation by
> individuals can often also cause errors. These vary from inconsistencies
> to overlooking spelling caveats common for the target language.
> 
> It would be helpful to have one or more sets of standard translations
> for standard words and strings. Translators of software would benefit
> from this, but also translators of larger documents that contain
> standard words and strings (such as "radio button"; you'll be surprised
> to know how hard it is in some languages to come up with a good default
> translation for it).
> 
> Context:
> 
> I am writing this with the GNOME project in mind, because I am known
> with it. However, I want my solution to be for the general benefit of
> free and open source software.
> 
> There are a lot of standard strings in applications. Many GUI standards
> define which ones you can use. Desktop projects such as GNOME often have
> a set of these standard strings, and their translations, included. They
> can, however, not provide translations for less commonly used strings.
> Another problem arises when standard strings are part of bigger strings
> (e.g. when "show toolbar" is standard, and a string like "show main
> toolbar" is being used). Most open source projects don't really care
> about documenting their use of standard strings, as the implementation
> should be clear enough.
> 
> In the past, I have done some minor translation work for ATO. This is an
> international organisation of translators of Amiga software (the Amiga
> Translation Organisation). They were pretty well organised (but being an
> Internet development newbie, it took me some time to get known with the
> organisation). One of the best parts of the organisation (of the Dutch
> division anyway), was a document that described the translation process,
> and also contained a list of common Amiga terms and their translations.
> 
> Because I want my solution to be global, and not e.g. Amiga-specific, I
> think it is not a good idea to provide a procedure for the translation
> process. Different projects may have different standards. I also don't
> think that a small list of common terms will do the trick. Again, these
> terms may vary slightly from one project to another, and if we are going
> to sum up only a few general words, the result wouldn't be really
> useful.
> 
> Solution:
> 
> I was thinking that it would be nice to have a web-accessible database
> being set up to tackle this problem. The "database" (or just a simple
> file) would initially be empty, but it would be available for
> modification through a CGI script. This service should be neutral, so
> that we wouldn't get duplicate attempts to solve this global problem.
> (E.g. hosting it at gnome.org wouldn't make it very neutral to KDE folks
> ;-).
> 
> The interesting part is how the database should look and behave. I only
> have given this part little attention as of yet. There are, however, a
> few schemes one could follow, and I imagine that one of these schemes
> would be more or less ideal.
> 
> The Economy Scheme:
> Simply feed the database a list of words and their translations, per
> language. This would be the scheme of preference if it turns out that my
> time, help and knowledge are really low.
> 
> The Business Scheme:
> Same as above, but now with even more features! ;-), including:
> 
> - an argument-based history of the translation. Example:
>  
>   "English: 'file', Dutch: 'bestand'
>    Previous translation 'bestant' is wrong because of a misspelling
>    Previous translation 'document' is inaccurate"
> 
> - a project-specific translation. Example:
> 
>    "English: 'edit', Dutch:
>   'Bewerken' (KDE standard)
>   'Bewerk' (GNOME standard)"
> 
> - per-project tips and guidelines. Example:
> 
>   "English: 'Are you sure you want to ...',
>    KDE tip: doubting the user is not friendly. Please use 'Please
> confirm ...' instead."
> 
> - per-language (and per-project?) tips. Example:
> 
>   "English: edit, Dutch: bewerk
>   Dutch language tip (GNOME): always use infinitive[*]"
> 
> - automatic parsing of your .po files??
> - automatic updating of a few registered .po files??
> 
> So this is my plan for a "translation bazaar". As said, the idea is that
> it is empty at start, and then maybe someone would dump a few GNOME and
> KDE .po files into this database, and the initial revision process can
> kick off. But the real idea is that folks supply their own strings they
> want to have translated, and the database would slowly get filled, while
> translations grow to be more accurate over time because of revisions.
> 
> But actually I've no idea if this would become a success. I know that I
> myself have only little time and resources, so I'd be happy already if I
> only managed to get the Economy scheme. I also never worked with .po
> files and stuff. But I did do some CGI and Perl stuff recently, then
> again I can't say that I have a good cgi-bin place to put this. It would
> be really cool if folks could just file their (not too specific) .po or
> similar files into the system, and that the system automatically keeps
> these files translated and up to date. But as said, I don't know
> anything of this .po stuff, so that really is beyond my potential. But
> if someone thinks "yeah, this is a really neat idea, and I can do it!",
> I would be delighted to form some kind of team, of course. It may also
> take some not-me expertise to support languages with different
> alphabets.
> 
> So in fact, it will kind of depend on what you guys think of this idea.
> Can it succeed? Will it be popular? Will this system become a standard
> part of e.g. the rules for GNOME translation, if it works? Do you feel
> like working on it? Do you have a good CGI space?
> 
> I must say, I don't know if this is a good idea, or if it is only a nice
> theory with no practical value. So I really look forward to any
> feedback.
> 
> Greets,
> 
> Stefan
> 
> [*] I'm not sure if this is the correct term because it's been a while
> since I had to learn it. But the problem Dutch translators have to face
> is that in English, in "I edit", the word "edit" is the same as in "to
> edit" and "you edit", while in Dutch it is not. So when translating to
> Dutch, you need to know which one to choose.
> 
> 

Aoife Dunne
Program Manager
European Localisation Centre
Sun Microsystems Ireland Ltd
Hamilton House
East Point Business Park
Dublin 3
Ireland
Tel.:  	+353-1-8199-266
Fax:	+353-1-8199-261
Email:	aoife.dunne@Ireland.Sun.COM





