Re: PO-based Documentation Translation

From: Tim Foster <Tim Foster Sun COM>
To: Keld Jørn Simonsen <keld dkuug dk>
Cc: Ismael Olea <ismael olea org>, Danilo Segan <dsegan gmx net>,"gnome-i18n gnome org" <gnome-i18n gnome org>
Subject: Re: PO-based Documentation Translation
Date: Wed, 01 Oct 2003 16:16:55 +0100

Hi Keld,

On Wed, 2003-10-01 at 15:37, Keld Jørn Simonsen wrote:
> On Wed, Oct 01, 2003 at 02:44:16PM +0100, Tim Foster wrote:
[snip]
>  What's your point ?
> 
> My point is that GTP is the "real world" - while Sun systems
> are niche products at least in the workstation market AFAICT

Right, whatever...

>  I was just provoked
> by somebody indicating that GTP was not the "real world". GTP performs -
> in myriads of languages every day - on tens of millions of computers all
> over the world.

Not at all, as I've mentioned before, I'm amazed at the number of
languages GNOME is translated into - you guys are doing an excellent
job, I'm trying to help out with some new technology that in my view is
streets ahead of what you're using now. We're not in the translation
business, so we have no reason to keep this stuff to ourselves.

> I have also asked for refinement for QA for GTP and asked if some
> industry people like Sun could explain to us the techniques they
> employ.

I think it would probably boil down to three things for us :

a) hard cash, to buy -
b) - trained linguists who randomly review translations on projects

- if a vendor consistently delivers bad translations, then we don't use
them on future projects. In an environment where you pay a company to
translate software, it's in their best interest to do a good job. I
guess things are different in open source, so my next point, something
we're starting to get now with automatic translation, would probably be
more useful to you :

c) a centralised translation memory system that can help provide
consistency of translation across products.

> > Sun, Oracle, Novell, Microsoft, IBM and several translation companies
> > saw the need for XLIFF and they wouldn't have invested in it if any
> > pre-existing format was going to cut it.
> 
> And what are your experiences? Which products have used these
> technologies, and to what extent? How many messages translated,
> how many projects? What is the track record?

With legacy formats, my past experiences were poor. I used to work
in localisation engineering before working purely on tools, and we found
a lot of the time that translators would translate things they weren't
meant to translate, screw up the po file format, get printf escapes
wrong (causing dragons, hair loss and core dumps), screw up the codeset
of files being sent out for translation, occasionally get versions
confused, and a myriad of other nasty things.
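
The printf problem in particular lends itself to an automated check. A
minimal sketch in Python, assuming a simplified regular expression for
C-style conversions; the names printf_specs and specs_match are
hypothetical and not part of gettext or any Sun tool:

```python
import re

# Simplified pattern for C printf conversions (%s, %5.2f, %1$s, ...).
# "%%" is matched first so that a literal percent sign is never
# mistaken for the start of a conversion.
PRINTF_SPEC = re.compile(
    r"%%|%(?:\d+\$)?[-+ #0]*(?:\d+|\*)?(?:\.(?:\d+|\*))?"
    r"(?:hh|h|ll|l|L|z|j|t)?[diouxXeEfgGcspn]"
)


def printf_specs(text):
    """Return the conversions found in text, ignoring literal '%%'."""
    return [m for m in PRINTF_SPEC.findall(text) if m != "%%"]


def specs_match(msgid, msgstr):
    """True if the translation uses the same conversions as the source.

    Positional conversions (%1$s) may be reordered by the translator;
    plain ones (%s) must appear in the same order.
    """
    src, dst = printf_specs(msgid), printf_specs(msgstr)
    if all("$" in spec for spec in src + dst):
        return sorted(src) == sorted(dst)
    return src == dst


print(specs_match("Copied %d files to %s", "%d Dateien nach %s kopiert"))
print(specs_match("Copied %d files to %s", "%s: %d Dateien"))
```

The second call reports a mismatch, since passing an int where %s is
expected is exactly the kind of error that ends in a core dump.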

XLIFF at least makes it easier to validate some of these things and
provides structure where there was none before. It doesn't solve all the
problems, but it can help.
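
As an illustration of what that structure buys you, here is a rough
sketch that finds untranslated units with a stock XML parser. The sample
document is a hand-written minimal fragment modelled on XLIFF 1.1, and
untranslated_units is a hypothetical helper, not part of any XLIFF
toolkit:

```python
import xml.etree.ElementTree as ET

# Namespace assumed for XLIFF 1.1 documents.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.1"}

# Hand-written sample: unit 1 is translated, unit 2 has no <target>.
SAMPLE = """<?xml version="1.0"?>
<xliff version="1.1" xmlns="urn:oasis:names:tc:xliff:document:1.1">
  <file original="hello.txt" source-language="en" target-language="de"
        datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Open file</source>
        <target>Datei öffnen</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Close window</source>
      </trans-unit>
    </body>
  </file>
</xliff>
"""


def untranslated_units(xliff_text):
    """Return the ids of trans-units with a missing or empty target."""
    root = ET.fromstring(xliff_text)
    missing = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        if target is None or not (target.text or "").strip():
            missing.append(unit.get("id"))
    return missing


print(untranslated_units(SAMPLE))
```

A file that a translator has damaged structurally fails at the parsing
step, before any of the content is looked at.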

So far, my experiences with XLIFF have been nothing but positive. Since
it's a relatively new standard, there aren't many products out there
that support it, and my guess is that none of them are open source.

We're using XLIFF for all our html, plaintext and sgml translation -
we're adding xml, software message format, jsp and possibly open-office
support at the moment. Lots of messages, lots of projects.

> Sounds like what we have for GTP and also for kde?
> msgfmt validates .po files

and someone has to then fix the mistakes

> Text to be left untranslated is simply not marked for translation.

how about certain strings mid-message that shouldn't be translated ?
There isn't really a way of protecting these, apart from a
human-readable comment above saying "don't translate bla".
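
Without format support, about the best you can do is check after the
fact. A rough sketch, assuming you keep your own list of
do-not-translate terms (the function name and the term list are
hypothetical, this is not a feature of gettext or of XLIFF):

```python
def changed_protected_terms(source, target, protected):
    """Return the protected terms whose occurrence count differs
    between the source string and its translation."""
    return [term for term in protected
            if source.count(term) != target.count(term)]


protected = ["gnome-terminal"]
source = "Run gnome-terminal to open a shell"

# The term survived translation untouched, so nothing is reported.
print(changed_protected_terms(
    source, "Starten Sie gnome-terminal, um eine Shell zu öffnen", protected))

# The term was altered (capitalised), so it is reported.
print(changed_protected_terms(
    source, "Starten Sie GNOME-Terminal, um eine Shell zu öffnen", protected))
```

This only catches the damage once it's done; it doesn't stop the
translator from editing the term in the first place.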

> Fuzzy match is standard in msgmerge.

can it match against a database of a few million source segments, or
just against one other file ?
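
To make the question concrete, here is a toy sketch of fuzzy lookup
against a translation memory using Python's difflib. The memory is a
plain in-memory dict and best_tm_matches is a hypothetical name; it is
not how msgmerge or our system works, and a linear scan like this would
not scale to millions of segments without some kind of index:

```python
import difflib


def best_tm_matches(segment, memory, limit=3, cutoff=0.7):
    """Return (source, translation, score) tuples for the stored source
    segments most similar to `segment`, best match first."""
    candidates = difflib.get_close_matches(
        segment, list(memory), n=limit, cutoff=cutoff)
    return [
        (src, memory[src],
         difflib.SequenceMatcher(None, segment, src).ratio())
        for src in candidates
    ]


# Toy translation memory: source segment -> stored translation.
memory = {
    "Open the file": "Datei öffnen",
    "Close the window": "Fenster schließen",
}

for src, translation, score in best_tm_matches("Open the files", memory):
    print(f"{score:.2f}  {src!r} -> {translation!r}")
```

The cutoff plays the role of a fuzzy-match threshold: anything below it
is treated as no match at all.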

> glossary and machine translation can be done with compendia and kbabel.

cool - that rocks

> kbabel marks which tools are used.

excellent

> > To do all this, you end up mangling the po format in an ungodly way to
> > make it work and it's not easy to do (particularly marking certain
> > sections as being non-translatable) - I tried it once and it was ugly.
> 
> How long ago was your attempt?

A year or two ago I guess, I don't think anything has emerged in the po
file format that solves the problem though.

> It seems like you are referencing some old version of .po-files and
> associated tools.

Well, I can't change the version of gettext that ships with Solaris, so
I'm stuck with what we have. I still believe that po files aren't the
best solution.

> One more thing I was interested in, is QA: what to do when the initial
> translation is done, to ensure correctness etc. Has Sun got any tools in
> that area?

well, apart from our translation editor, which is used by reviewers to
mark translations as "good" or "needs review", we don't have any special
tools. Using translation memory on a large scale, as I mention in (c)
above, helps to ensure consistency - which I guess is difficult to do in
a more bazaar-like manner, as opposed to our current cathedral
approach.

	cheers,
			tim

