Re: open translations database

From: Pablo Saratxaga <pablo mandrakesoft com>
To: gnome-i18n gnome org
Subject: Re: open translations database
Date: Wed, 1 Nov 2000 18:28:48 +0100
Kaixo!

> > From: Stefan Rieken <StefanRieken@SoftHome.net>
> > Date: 12 Oct 2000 16:55:26 -0100
 
>> The current translation of open source software suffers from a 
>> lack of manpower. This usually doesn't result in a lack of translations, 
>> but in bad translations. Half of the time translation engines such as 
>> Babelfish are being used.

Are you sure of that?

If a translator is found guilty of such a horror, in total disrespect
for the users and the language he is supposed to help, then he must be
put on a blacklist and his contributions refused forever.

I simply cannot believe that someone could do such a thing!


>> of small strings because of a lack of context (e.g.: the title 
>> of the window I am writing this message in says, directly translated 
>> back to English: "is composing a new message" instead of "Compose a new
>> message"). They also don't care about the size of the translated 
>> string, which can be important when used in a program.

That is not due to the use of Babelfish or similar tools, but to two things:
- lack of knowledge/understanding on the translator's part about the context
  or the meaning of a given string
- English!
  Using English as the source is one of the worst choices, as it has so many
  homographs, its conjugation is almost nonexistent, etc.
  For example, what about "second"? Is it the time unit or "2nd"? In English
  it is the same word; in most other languages there are different words...

Possible solutions to these problems are:
- educating programmers about the problem, so that they avoid strings that
  are too short (always hard to translate), avoid ambiguous wordings, and
  put /* comments */ to help translators grok the context and meaning (or
  any other special requirement, eg: a limit on the string length).
- helping translators by providing them with a standardized set of
  "stylesheets" that all or most programs follow, so that the way things are
  translated is consistent (eg: use the infinitive for verbs in menus and
  buttons in Spanish); providing lexicons with context, that is, not only
  how to translate a word, but how to translate it depending on context (eg
  some languages may find it desirable, for various reasons, to use different
  translations for "Edit" depending on the thing to edit; or for "Help"
  depending on whether it is in a menu or on a button, etc).
  Another important piece of data for coherence is, for each language,
  the standard hot keys to use (for menus, buttons, etc); those should be
  chosen so that they are logical for the words they are shortcuts for, but
  also so that the same letter is not used on two different entries
  that can appear in one menu (that is, the list of possible entries that
  can appear together in a menu has to be provided).
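
The hot key rule above (no letter used twice within one menu) is easy to
check mechanically. What follows is only a minimal sketch: it assumes that
hot keys are marked GTK-style with an underscore before the letter (as in
"_File"), that "__" stands for a literal underscore, and that hot keys are
plain ASCII letters. The function names are made up for the illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return the hot key of a menu label: the character following the first
 * single '_' (assumed mnemonic marker), lowercased, or 0 if the label has
 * none. "__" is treated as an escaped literal underscore.
 * Only single-byte (ASCII) hot keys are handled in this sketch. */
static int label_hotkey(const char *label)
{
    assert(label != NULL);
    for (const char *p = label; *p != '\0'; p++) {
        if (*p != '_')
            continue;
        if (p[1] == '_') {      /* escaped underscore, skip both */
            p++;
            continue;
        }
        if (p[1] == '\0')       /* trailing marker, no hot key */
            return 0;
        return tolower((unsigned char)p[1]);
    }
    return 0;
}

/* Return the index of the first label whose hot key is already used by an
 * earlier label of the same menu, or -1 if there is no clash. */
int find_hotkey_conflict(const char *const labels[], size_t count)
{
    unsigned char seen[256] = {0};
    for (size_t i = 0; i < count; i++) {
        int key = label_hotkey(labels[i]);
        if (key == 0)
            continue;
        if (seen[key])
            return (int)i;
        seen[key] = 1;
    }
    return -1;
}
```

Given a hypothetical Spanish menu with "_Abrir", "_Guardar", "Guardar _como"
and "_Ayuda", the function reports the fourth entry, since "a" is already
taken by the first one. A real check would need the per-menu lists of entries
mentioned above, and would have to cope with non-ASCII letters.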

>> Translation by
>> individuals can often also cause errors. These vary from 
>> inconsistencies to overlooking spelling caveats common
>> for the target language.

Providing ispell dictionaries and such will also help (there is "pospell",
a front-end to ispell (or other spell checkers) that spell checks only
the msgstr and not the msgid; a very useful tool, I use it all the
time now).

>> It would be helpful to have one or more sets of standard translations
>> for standard words and strings. Translators of software would 
>> benefit from this, but also translators of larger documents that contain
>> standard words and strings (such as "radio button"; you'll be 
>> surprised to know how hard it is in some languages to come up with a good 
>> default translation for it).

Several languages already do similar things; what would be a great help 
would be to provide a single "start page" that links, for every language,
to all the existing resources, and provides in-depth tutorials (translated
into that language if possible) and various tools (eg: pospell, 
ispell dictionary, gnopo, emacs po mode, vim po mode, etc.)

>> Because I want my solution to be global, and not e.g. Amiga-specific, I
>> think it is not a good idea to provide a procedure for the 
>> translation process. Different projects may have different standards.

And different O.S., and sometimes even different projects, also have different
terminology. It is not possible to be completely global and independent of
the project, much less of the O.S. (or family of OS at least; I think almost
all of the Unices can be viewed as a single OS as far as terminology is
concerned).

>> Solution:
>> 
>> I was thinking that it would be nice to have a web-accessible database
>> being set up to tackle this problem. The "database" (or just a simple
>> file) would initially be empty, but it would be available for
>> modification through a CGI script. This service should be neutral, so
>> that we wouldn't get duplicate attempts to solve this global problem.
>> (E.g. hosting it at gnome.org wouldn't make it very neutral to KDE folks
>> ;-).

The domain linuxi18n.org can be used for that.
In fact the intention, when registering it, was to provide something very
similar to what you describe, plus a sort of "translations market place"
where programmers/project managers and translators/translation groups
could meet each other. One of the main problems of free software translation
is that most lesser-known programs are hardly translated, simply because
translators don't know about them or how to get in touch with the programming
team, and the programmers don't know how to reach translators. Until now
only very big projects (GNU, KDE, Gnome and Linux distributors for their
own tools) have been able to provide a large set of languages with good
translation coverage; all other projects are almost nowhere. That is an
infrastructure problem that could be solved if a "translations market place"
were created, with some people in charge of handling the exchanges and
doing all the "administrative work" (updating and merging po files, telling
translators when updates are needed, putting new translations in CVS or
sending them to project managers). Programmers would then only need to know
one address where to ask for translations and send their *.pot files,
and translators would only need to know one address where to look for
untranslated stuff and send back their work.

If that place can also provide material and docs to teach people how to do
things better (for programmers: how to add i18n support, the common errors to
avoid, etc.; for translators: the database of terms you talk about, how
po files work, what to write when you need to change the order of
%-formatters, what the time formatters (%a, %B, etc) are about, ...), then
it will only be more valuable, and increase the quality of the overall result.
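
As an illustration of the %-formatter point: when a translation needs the
arguments in another order, the msgstr can refer to them by position
(%1$s, %2$s). This is a POSIX extension to printf, not ISO C, so the sketch
below assumes a POSIX C library such as glibc. The function name and the
sample messages are invented for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Format a "who changed what" message. `fmt` stands in for the string a
 * translator would supply in msgstr; it must refer to the arguments by
 * position: %1$s is the user, %2$s is the file. */
int format_change(char *buf, size_t size, const char *fmt,
                  const char *user, const char *file)
{
    assert(buf != NULL && fmt != NULL);
    return snprintf(buf, size, fmt, user, file);
}
```

With the format "%1$s changed %2$s" and the arguments "pablo" and "gnome.po"
the buffer holds "pablo changed gnome.po"; with "%2$s was changed by %1$s"
the same call yields "gnome.po was changed by pablo", without touching the
program code. Note that a format must use numbered arguments for all its
conversions or for none of them.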

>> The Economy Scheme:
>> Simply feed the database a list of words and their translations, 
>> per language. This would be the scheme of preference if it turns out 
>> that my time, help and knowledge are really low.
>> 
>> The Business Scheme:
>> Same as above, but now with even more features! ;-), including:
>> 
>> - an argument-based history of the translation. Example:
>>  
>>   "English: 'file', Dutch: 'bestand'
>>    Previous translation 'bestant' is wrong because of a misspelling
>>    Previous translation 'document' is inaccurate"

Not with "previous translation"; but the database must be able to handle
one-to-many relations, as one English word may require *several* different
translations. And here is one big thing the database can help with: detecting
such cases, and creating a smaller database, targeted at programmers, that
lists the problematic cases and words, so they can work around the problem,
for example by prefixing the strings with some chars that won't be displayed:

 /* note: leave the leading "t_" untranslated.
  * here "seconds" is the time unit
  */
 printf("%s", _("t_seconds") + 2);  /* + 2 skips the "t_" prefix */

...

 /* note: leave the leading "p_" untranslated.
  * here "seconds" is a group of things in 2nd position
  */
 printf("%s", _("p_seconds") + 2);  /* + 2 skips the "p_" prefix */
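
One weakness of a bare "+2" is that it eats two real characters whenever a
translator forgets to keep the prefix. A slightly more defensive variant
could look like the sketch below; the helper name strip_context is
hypothetical, and it assumes the prefix is passed explicitly by the caller.

```c
#include <assert.h>
#include <string.h>

/* Return the display text of a translated string that carries a context
 * prefix such as "t_" or "p_". If the translator dropped the prefix by
 * mistake, the string is returned unchanged instead of losing its first
 * two characters. */
const char *strip_context(const char *translated, const char *prefix)
{
    assert(translated != NULL && prefix != NULL);
    size_t len = strlen(prefix);
    if (strncmp(translated, prefix, len) == 0)
        return translated + len;
    return translated;
}
```

It would be called as printf("%s", strip_context(_("t_seconds"), "t_")),
with _() being the usual gettext macro as in the snippet above.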

 

Also, for the database to be really useful, a comment field must be present
for each entry, to give some context about the translation.

>> - a project-specific translation. Example:
>> 
>>    "English: 'edit', Dutch:
>>   'Bewerken' (KDE standard)
>>   'Bewerk' (GNOME standard)"

The one-to-many + comment field will be able to handle such cases without
need for any extra work.
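
To make the one-to-many + comment idea concrete, here is a rough sketch of
what an entry and a lookup could look like. The struct layout, the field
names and the "nl" language code are assumptions made for the illustration;
the sample rows are the Dutch examples quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the term database: several rows may share the same
 * (source, lang) pair, each with its own comment. */
struct term {
    const char *source;       /* English key */
    const char *lang;         /* target language code */
    const char *translation;
    const char *comment;      /* context; may be "" */
};

/* Sample rows taken from the examples in this thread. */
const struct term terms[] = {
    {"file", "nl", "bestand",  ""},
    {"edit", "nl", "Bewerken", "KDE standard"},
    {"edit", "nl", "Bewerk",   "GNOME standard"},
};

/* Store up to `max` matching rows in `out`; return how many rows match. */
size_t find_terms(const struct term *db, size_t count,
                  const char *source, const char *lang,
                  const struct term **out, size_t max)
{
    assert(db != NULL && source != NULL && lang != NULL);
    size_t found = 0;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(db[i].source, source) == 0 &&
            strcmp(db[i].lang, lang) == 0) {
            if (out != NULL && found < max)
                out[found] = &db[i];
            found++;
        }
    }
    return found;
}
```

Looking up "edit" for "nl" returns both rows, and the comment field tells the
translator which one belongs to which project. An entry with an empty comment
behaves exactly like a plain one-to-one entry.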

>> - per-project tips and guidelines. Example:
>> 
>>   "English: 'Are you sure you want to ...',
>>    KDE tip: doubting the user is not friendly. Please use 
>>   'Please confirm ...' instead."

That doesn't have to go in the database; it should be provided by the
different projects. Of course the central site (linuxi18n.org ?) could (and
should) link to them.

>> - per-language (and per-project?) tips. Example:
>> 
>>   "English: edit, Dutch: bewerk
>>   Dutch language tip (GNOME): always use infinitive[*]"

Project-neutral but language-specific tips have their place on the central
site; not in the database, however, but in the docs and resources for 
each particular language.

>> - automatic parsing of your .po files??
>> - automatic updating of a few registered .po files??

In my experience that doesn't give results worth the effort...
What is useful, however, is parsing for syntax errors and such (eg: msgfmt -c),
or re-ordering the entries so that untranslated and fuzzy entries go at
the top of the file (which makes things easier for translators who use quite
poor editing tools).
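
The re-ordering could be sketched as follows, assuming the po file has
already been parsed into an array of entries and that the header entry (the
one with the empty msgid) is kept aside and written back first. The struct
and the function names are hypothetical, and putting untranslated entries
before fuzzy ones is just one possible choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct po_entry {
    const char *msgid;
    const char *msgstr;   /* "" when untranslated */
    int fuzzy;            /* nonzero when flagged "#, fuzzy" */
    size_t position;      /* original position, set by the sort */
};

/* 0 = untranslated, 1 = fuzzy, 2 = done */
static int entry_rank(const struct po_entry *e)
{
    if (e->msgstr == NULL || e->msgstr[0] == '\0')
        return 0;
    return e->fuzzy ? 1 : 2;
}

static int compare_entries(const void *a, const void *b)
{
    const struct po_entry *x = a, *y = b;
    int rx = entry_rank(x), ry = entry_rank(y);
    if (rx != ry)
        return rx - ry;
    /* qsort is not stable, so fall back to the original position */
    return (x->position > y->position) - (x->position < y->position);
}

/* Move untranslated entries to the top, then fuzzy ones, then the rest,
 * keeping the original order inside each group. */
void sort_for_translator(struct po_entry *entries, size_t count)
{
    assert(entries != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        entries[i].position = i;
    if (count > 1)
        qsort(entries, count, sizeof entries[0], compare_entries);
}
```

Keeping the original order inside each group matters, because neighbouring
strings usually come from the same source file and give each other context.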

>> But actually I've no idea if this would become a success. I know 
>> that I myself have only little time and resources, so I'd be happy 
>> already if I only managed to get the Economy scheme.

It requires the same amount of work to build a one-to-one database,
with key and translation only, as a one-to-many database with
key, translation(s) and comment(s); so it is better to start with the right
one from the beginning. (You can leave the comment field empty and it is
the same as your "Economy" scheme; but it allows it to be improved by others
who will fill in the comment field.)

>> I also never worked with .po
>> files and stuff. But I did do some CGI and Perl stuff recently, 
>> then again I can't say that I have a good cgi-bin place to put this. 
>> It would be really cool if folks could just file their (not too specific) 
>> .po or similar files into the system, and that the system automatically 
>> keeps these files translated and up to date.

You mean, translators having a secured session, then having an interface
similar to gnopo/gtranslator/ktranslator, only done through the web?
That could be useful, indeed.
I had thought about that already, when I saw that some translators did their
work on Windows (because of connectivity constraints, they got the po files
at work and edited them there... of course, edit.exe doesn't provide
much power, nor spell checking capabilities, etc). It would be quite useful
and convenient for those people to be able to edit their files through a web
interface, with the server performing the search for untranslated/fuzzy
strings, syntax checking, and spell checking.

>> I would be delighted to form some kind of team, of course. It 
>> may also take some not-me expertise to support languages with different
>> alphabets.

UTF-8 should be used from the start.
With the possibility (and requirement for various projects) to convert
to the charset given in the charset= entry of the po header when downloading
or sending the file, etc.
That in turn requires that the header of po files be checked for correctness,
and that its entries absolutely be correct: the charset, but also the module
name and version (to know to which project/programmer to send it) and
the last-translator entry (to know who to contact for updates).
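
A first step of such a header check could be to pull the charset out of the
Content-Type line and refuse files where it is missing or still the
"CHARSET" placeholder left over from the *.pot template. This is only a
sketch under those assumptions; the function name is invented, and the match
on "charset=" is case sensitive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the charset named in a po header's Content-Type line into `out`.
 * Returns 1 on success, 0 if no charset is declared, if it is still the
 * "CHARSET" placeholder, or if it does not fit in `out`. */
int po_header_charset(const char *header, char *out, size_t size)
{
    assert(header != NULL && out != NULL);
    static const char key[] = "charset=";
    const char *start = strstr(header, key);
    if (start == NULL)
        return 0;
    start += sizeof key - 1;
    /* the value ends at whitespace, a backslash escape, a quote, a
     * semicolon or the end of the string */
    size_t len = strcspn(start, " \t\r\n\\\";");
    if (len == 0 || len >= size)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return strcmp(out, "CHARSET") != 0;
}
```

The returned name can then be handed to whatever does the actual conversion
from and to UTF-8 (iconv, for instance). The same kind of check would be
needed for the module name/version and last-translator entries.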

>> So in fact, it will kind of depend on what you guys think of 
>> this idea. Can it succeed? Will it be popular?

I hope so.
I have been dreaming of it for some time already...

>> Will this system become a  standard
>> part of e.g. the rules for GNOME translation, if it works?

It must not be made mandatory to go through it; it must be a means to help
people by taking from them all the burden not related to what they really
want to do (program/hack and translate, respectively), nothing more.

>> Do you feel like working on it? Do you have a good CGI space?

I have a domain name (and there is some web space for it too, however
I don't really know how much).
The mailing list and the site (http://www.linuxi18n.org/) are quite
empty and stalled now; but it would be a good place to start (after all,
it was the initial idea behind linuxi18n.org).

Thanks for reading

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/		PGP Key available, key ID: 0x8F0E4975
Follow-Ups:
- Re: open translations database
  - From: Stefan Rieken
References:
- open translations database
  - From: Aoife Dunne - Sunsoft ELC