Re: minutes from GUADEC2 translation BOF

From: Keld Jørn Simonsen <keld dkuug dk>
To: Gediminas Paulauskas <menesis delfi lt>
Cc: Keld Jørn Simonsen <keld dkuug dk>,gnome-i18n gnome org
Subject: Re: minutes from GUADEC2 translation BOF
Date: Sun, 22 Apr 2001 18:13:59 +0200

On Sat, Apr 21, 2001 at 11:14:46PM +0200, Gediminas Paulauskas wrote:
> More comments from me
> 
> On 21 Apr 2001 11:54:18 +0200, Keld Jørn Simonsen wrote:
> > We agreed that the following was a good idea, and recommend
> > it for inclusion in the gettext tools:
> > 
> > In the .po file for each message, list     
> > - when spellchecked
> > - when and whom reviewed
> > - each line verified in run, when and by whom
> > 
> > Implement it as .po comment lines:
> > #, spellchecked 2001-04-05 keld@dkuug.dk
> > #, reviewed 2001-04-06 keld@dkuug.dk
> > #, verified date email
> > 
> > The gettext tools should then remove these lines when
> > the mesgid is changed.
> > 
> > We would like a windows manager for verifying the executed apps.
> > Running the program and clicking a text or a whole
> > page should automatically mark it as ok or wrong in the .po file.
> > We thought that it would be next to impossible to have people
> > do this by hand, so we need assistance from a general 
> > windows manager.
> 
> It seems like adding a huge overhead to translation files and tools.
> 
> #, spellchecked 2001-04-13 jcastro@vialink.com.br
> #, reviewed 2001-04-14 jcastro@vialink.com.br
> #, verified 2001-04-15 another@person.linux.br
> #: libgnomeui/gnome-stock.c:840 libgnomeui/stock_demo.c:144
> msgid "Close"
> msgstr "Uzdaryti"
> 
> Looks way too verbose!
> 
> [menesis@~/cvs/gnumeric/po]$ du -h --total *.po | grep total
> 15M total
> 
> I do not feel like wanting to add any extra info to po files.

15 MB is not much. I did the same for my Danish .po files, and
I have about 5 MB of da.po files. Times 50 languages, that is 250 MB
of .po files in the GNOME CVS. I have about 35,000 messages in those
5 MB of data, which is about 150 bytes per message. Your example, which
I agree could be typical, would add about 120 bytes to that.
Is this worth the price? In terms of space in the GNOME CVS, this
is maybe 500 MB more, which is negligible. In terms of translators'
space it is some more space, relatively, but these days 500 MB
is really not much. A 30 GB disk is around USD 150, so 500 MB is
around USD 2.50. Times 300 translators, that is USD 750 of
hardware for QA. This is close to the worst case, assuming that
everything is translated and QA-ed for all 50 languages.
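
As a rough check of the estimate above, here is a small Python sketch
that computes the numbers directly from the figures quoted (5 MB,
35,000 messages, 120 extra bytes, 50 languages, a 30 GB disk at USD 150,
300 translators). The variable names and the decimal megabyte are my
own assumptions; the prose rounds up generously toward a worst case,
so the computed values come out on the low side.

```python
# Back-of-the-envelope estimate, using only the figures quoted in the text.
MB = 1_000_000  # assumption: decimal megabytes, for simplicity

po_bytes_per_language = 5 * MB   # size of the da.po files
messages = 35_000                # messages in those files
languages = 50
qa_bytes_per_message = 120       # extra QA comment lines per message
disk_gb, disk_price_usd = 30, 150
translators = 300

# Average size of one message today.
bytes_per_message = po_bytes_per_language / messages

# Total .po data in CVS today, and the extra data the QA lines would add.
base_total_mb = po_bytes_per_language * languages / MB
qa_total_mb = qa_bytes_per_message * messages * languages / MB

# Disk cost of 500 MB, per translator and for all translators.
cost_500mb = disk_price_usd * 500 / (disk_gb * 1000)
cost_all = cost_500mb * translators

print(f"bytes per message:      {bytes_per_message:.0f}")
print(f".po data, all langs:    {base_total_mb:.0f} MB")
print(f"QA overhead, all langs: {qa_total_mb:.0f} MB")
print(f"500 MB of disk:         USD {cost_500mb:.2f}")
print(f"for all translators:    USD {cost_all:.0f}")
```

Either way the conclusion is the same: the disk cost is small.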

Furthermore we have things in place so that you need not
have all sources and .po files on your own machine to do
translations, you can just pick up the latest merged version
of the specific .po files you want to translate, reducing 
transmission time and disk space usage, plus processing time
drastically for translators.

> Adding spellchecked/reviewed date to header comment is acceptable. Then
> after some time the review "expires". Don't know if this is enough for
> good QA. But that proposed scheme is too expensive for public po files.
> Source archives then will grow twice!

I agree with your estimates, as noted above. We could shorten the
above comment lines, throw away the dates, and shorten the keywords
to e.g. vrfy, rew, spck. Then the QA overhead would be maybe 40 chars
in your case above, which would be a 33 % overhead in a full-scale
scenario.

Is it worth it? I think QA is a very important aspect of our work.
Some people here in Denmark are saying that our translations are ridiculous,
although the translation team thinks we are doing a good job.
It is very hard to add that extra quality, and I believe the remedies that
were discussed above go a long way to help the QA. I think we do need
QA on each individual message. The review will quickly deteriorate
when new messages come in. Spellchecking could probably be covered
by one header line, since I normally spellcheck the whole .po file at once;
but the verifying really needs to be done per message, so you can also
spot which messages you never reached.

An idea I got for the spellchecking: There are often technical words,
names and abbreviations in a .po file that are not in the general
spellchecking dictionaries (and should not be there). Maybe they
could be in a special section of the .po file. Or maybe they should
be in a separate file, but it is easier to maintain things
if they are all in one place.
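
To make the idea concrete, here is a minimal Python sketch of a
spellcheck pass that accepts a per-project word list in addition to
the general dictionary. The function name, the way the word list is
passed in, and the sample words are all assumptions for illustration;
nothing like this exists in the gettext tools as far as I know.

```python
import re

def unknown_words(msgstrs, dictionary, project_words=()):
    """Return words in the translations that are in neither word list."""
    known = {w.lower() for w in dictionary} | {w.lower() for w in project_words}
    found = set()
    for text in msgstrs:
        # [^\W\d_]+ matches runs of letters, including non-ASCII ones.
        for word in re.findall(r"[^\W\d_]+", text):
            if word.lower() not in known:
                found.add(word)
    return sorted(found)

# Hypothetical sample data: two Danish translations and a tiny dictionary.
translations = ["Luk GNOME-vinduet", "Gem filen"]
general_dictionary = {"luk", "gem", "filen", "vinduet"}

print(unknown_words(translations, general_dictionary))
print(unknown_words(translations, general_dictionary, project_words={"GNOME"}))
```

The first call reports the technical name as unknown; the second, with
the project word list, does not.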



> > If the error is one in translation, then the message should
> > be sent to the translator, or the translator team.
> > The email addres to report translation errors should be listed
> > in the "about" box for all programs, and integrated in the
> > bug reporting system, bug-buddy.
> 
> What is a problem with that? I remember this was discussed twice, but no
> patches submitted, and still there is no info about translator in about
> boxes :(
>
Yes, we just need to do it. Is there more to it than adding a
translators box and a native-language error report box to each program,
and then asking the developers to put that in?

> > It was proposed to give every message a message number.
> > I dont think we agreed on that.
> 
> Every message can be given an ID, not a simple number, but something
> like md5sum or a function used in string hashes. If message changes, the
> ID does not match the translation. So that message needs review. In
> *separate* file, which is not distributed, but is used only in language
> group, IDs are mapped to spellchecked/reviewed info.


Yes, that could be done. What I think was not agreed on, was to display the
error number to the user. I remember IBM doing that on mainframes.
But it would look too technical to a normal user, and frighten them, IMHO.
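
The ID scheme from the quoted text could be sketched like this in
Python. The MD5 choice follows the suggestion above; the function names
and the layout of the separate review log are my assumptions, just to
show how a changed message would fall out of the reviewed set.

```python
import hashlib

def message_id(msgid: str) -> str:
    """Derive an ID from the message text, so a changed text gets a new ID."""
    return hashlib.md5(msgid.encode("utf-8")).hexdigest()

def needs_review(msgid: str, review_log: dict) -> bool:
    """A message needs review if its current ID is not in the review log."""
    return message_id(msgid) not in review_log

# Hypothetical review log, kept in a separate file within the language
# group and not distributed with the .po files.
review_log = {
    message_id("Close"): {"reviewed": "2001-04-14",
                          "by": "jcastro@vialink.com.br"},
}

print(needs_review("Close", review_log))         # unchanged message
print(needs_review("Close window", review_log))  # message text changed
```

The ID never has to be shown to the user; it only links the message
to the QA information.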

> > 3. Use of translation memory
> > 
> > maybe do it automatically on CVS derived files?
> > KDE has a base of text that is translated, such as 
> > "yes, no, cancel". We could centrally build total
> > .po files of messages. Some of the sofware here is
> > real s-l-o-w tho.
> > 
> > 4. Use of machine translation software
> > 
> > There are some machine translation software around, we heard
> > of one person (Antonio?) from the gnome i18n crowd that was working on
> > a translation system with esparanto as an intermediate
> > language. Please tell us more.
> 
> Remember someone complained that GNOME translations are so bad, someone
> should be using babelfish to translate stuff? It sounds bad. Transaltion
> should be machine-assisted, but still needs human interaction to review
> everything. Translators of course use tools fr that, at least msgmerge
> to get fuzzy translations. This is not enough of course. But cron job on
> server doing translations is not acceptable.

Yes, I would always think that we need a human touch on translations.
But many times, when I sit in the wee hours and translate, I feel
like a little machine. My brain is almost off, and much of the stuff
could as well be done by machine translation. All MT messages
should of course be marked fuzzy. The cron job on a server would
also only be an option; translators could choose not to use it.
But as I see it, it could be a good help to translators.
This is vaporware, though.
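
Just to show what I mean by marking MT output fuzzy, here is a small
Python sketch that writes a .po entry with the standard "#, fuzzy"
flag. The machine_translate function is a hypothetical stand-in with a
one-word lookup table; no real MT system is implied.

```python
def quote(s: str) -> str:
    """Quote a string the way .po files do (backslash, quote, newline)."""
    s = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + s + '"'

def po_entry(msgid: str, msgstr: str, fuzzy: bool = False) -> str:
    """Format one .po entry, optionally flagged as fuzzy."""
    lines = []
    if fuzzy:
        lines.append("#, fuzzy")  # tells the translator to review this entry
    lines.append("msgid " + quote(msgid))
    lines.append("msgstr " + quote(msgstr))
    return "\n".join(lines) + "\n"

def machine_translate(msgid: str) -> str:
    """Hypothetical stand-in for an MT system: a tiny lookup table."""
    return {"Close": "Luk"}.get(msgid, "")

suggestion = machine_translate("Close")
print(po_entry("Close", suggestion, fuzzy=True))
```

A fuzzy entry is not used at run time until a translator has reviewed
it and removed the flag, so the human stays in the loop.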

Keld
References:
- minutes from GUADEC2 translation BOF
  - From: Keld Jørn Simonsen
- Re: minutes from GUADEC2 translation BOF
  - From: Gediminas Paulauskas