Re: PO-based Documentation Translation



Hello,

On Wed, 2003-10-01 at 19:15, Gudmund Areskoug wrote:
> Hi,
> 
> Tim Foster wrote:
> > In terms of saving time and money, sentence segmentation is fantastic
> > and works for us.
> 
> just thinking out loud: does fine granularity segmentation have to 
> be reflected in the file format (po/xliff), to be put to use?
>
>
> Couldn't that part just as well take place in 
> msgmerge/pretranslation and inline DB/compendia queries during the 
> translation process?

Absolutely. If I understand your point correctly, the problem is that
it's more work to get the material rolled into the database after
translation: you need to align the translated paragraphs, and then
manually check those alignments before you can use the results.

I'll explain further -

suppose we have our four-sentence paragraph:

msgid "a,b,c,d"

It sounds like you're suggesting that we split these before doing a
lookup, and then merge them again - no problem: we separate out the
sentences and look up matches for a, b, c and d,

giving us, perhaps, the translated message file:

msgid "a,b,c,d"
msgstr "e,f,g"
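
That split-and-lookup step might look something like this toy sketch
(the TM contents and the comma "sentence" delimiter just mirror the
a,b,c,d example above - they're purely illustrative):

```python
# Split a segment into "sentences", look each one up in a translation
# memory, and join whatever matches we find.
def pretranslate(segment, tm, delim=","):
    sentences = segment.split(delim)
    matches = [tm[s] for s in sentences if s in tm]
    return delim.join(matches)

tm = {"a": "e", "b": "f", "c": "g"}      # no entry for "d"
print(pretranslate("a,b,c,d", tm))       # -> e,f,g
```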

- now, the trick is that in order to repopulate our database with the
translated messages, we need to find out which sentence in the source
matches which sentence in the target. There are algorithms out there
[1] that can help with this, but most of the time their output needs to
be checked to make sure it's right:

a,b --> e [two source sentences matching one translated sentence!]
c --> f
d --> g
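
For the curious, here's a toy Python sketch of the length-based idea
behind [1]: pick the sequence of 1:1, 2:1 and 1:2 "beads" that
minimises the difference in character lengths. A real implementation
uses a proper probabilistic cost, so treat this purely as an
illustration of the shape of the algorithm:

```python
# Dynamic-programming sentence alignment allowing 1:1, 2:1 and 1:2
# beads, with |source length - target length| as a stand-in cost.
def align(src, tgt):
    INF = float("inf")
    n, m = len(src), len(tgt)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]   # cost[i][j]: best for src[:i], tgt[:j]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    length = lambda xs: sum(len(x) for x in xs)
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for di, dj in ((1, 1), (2, 1), (1, 2)):
                if i + di <= n and j + dj <= m:
                    c = cost[i][j] + abs(length(src[i:i + di]) - length(tgt[j:j + dj]))
                    if c < cost[i + di][j + dj]:
                        cost[i + di][j + dj] = c
                        back[i + di][j + dj] = (i, j)
    beads, i, j = [], n, m                           # walk back to recover the beads
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    return list(reversed(beads))
```

On the example above, this is the step that would (hopefully) discover
the a,b --> e pairing automatically - and the step whose output still
wants a human check.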


In fact, you can use the same alignment techniques to take two
translated documents and try to construct a translation memory from
them, but again, this usually requires an expensive linguistic review
after the alignment algorithm has done its thing, to ensure it hasn't
made mistakes.

> I can see that reflecting segmentation et al. when one translator 
> gets to continue somebody else's work makes sense, and possibly in 
> some other situations, but are those situations relevant here (I 
> don't know)?

<shrug/> All I can say is that this has worked for us in the past, and
the translators out there that we use are happy enough with the state
of affairs.

> What I think might enhance productivity and QA, would be increased 
> use of hierarchically marking up terminology (in DBs and compendia), 
> and increased use of (clever) assembling of the portions where match 
> differences were found.

Yep, that makes sense. We don't have terminology integrated in our XLIFF
tools yet, but that's certainly something we'd like to have. You could
imagine a situation where a particular phrase should always be
translated in a particular way: it would be nice to have some method of
searching inside segments for phrases that have been translated before,
and at least suggesting a particular translation for each phrase, if
we've got it in a database somewhere.
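
A minimal sketch of what I have in mind (the term base contents are
invented, and a real tool would obviously need stemming, tokenisation
and so on):

```python
# Scan a segment for known terminology and suggest the stored
# translations.  Matching here is naive case-insensitive substring
# search, purely for illustration.
TERM_BASE = {
    "file manager": "gestionnaire de fichiers",
    "print queue": "file d'impression",
}

def suggest_terms(segment, term_base=TERM_BASE):
    seg = segment.lower()
    return {term: tr for term, tr in term_base.items() if term in seg}

print(suggest_terms("Open the File Manager to continue"))
# -> {'file manager': 'gestionnaire de fichiers'}
```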

> There's probably some good reason for having segmentation reflected 
> in the files, so can you please elaborate on it Tim? Tool 
> compatibility/portability?

Well, it's mainly the problem of getting the text back into the database
at the end of the process without having to do expensive alignment.


> BTW: I know Trados has a large commercial market segment, but IMHO 
> it's not a tool in particular where too much could be learnt 
> (granted it's been a while since I tried it and dumped it).

Some of the concepts that Trados uses (as far as I can remember), like
fuzzy matching, terminology matching and sentence-level segmentation,
are quite straightforward. Other things, like 1:many, many:1 or
many:many segment matches, are much harder to get your head around, but
translators usually end up asking for them...
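
For what it's worth, the fuzzy matching part is easy enough to sketch -
here using Python's difflib ratio as a stand-in for whatever similarity
measure a real tool uses, with an invented TM and an invented 0.7
threshold:

```python
import difflib

TM = {
    "Save the file": "Enregistrer le fichier",
    "Delete the file": "Supprimer le fichier",
}

def fuzzy_match(source, tm=TM, threshold=0.7):
    """Return the (source, target) TM pair most similar to `source`,
    or None if nothing scores above the threshold."""
    best, best_score = None, 0.0
    for src, tgt in tm.items():
        score = difflib.SequenceMatcher(None, source, src).ratio()
        if score > best_score:
            best, best_score = (src, tgt), score
    return best if best_score >= threshold else None

print(fuzzy_match("Save the files"))
# -> ('Save the file', 'Enregistrer le fichier')
```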

> Heretic thought: if there are tools for it, why not let the 
> translator/translation team use whichever format they want?

Absolutely. We tend to blast every format we can from its original
format into XLIFF for ease of processing during the l10n process, and
then back to the original format before compilation and release
engineering into the final product. There's no problem, though, with
continuing to deal with po if that's what you prefer.

I would say, though (and this was the original subject of this thread),
that I don't think the po format is well suited to the task of
documentation translation - that's all...

> I guess there may be elements in xliff that can be incorporated in 
> po, but perhaps the same holds true the other way around: I'd be 
> interested to see where in the xliff files you stored the context 
> normally found in a po file. And how is plural handling solved? Do 
> you have a sample (e. g. some part of a UI po + corresponding xliff)?

Let me work on that over the next few days and see what I can come up
with - I haven't got the po->xliff stuff finished yet. Right now, I just
have sgml/xml/html -> xliff working.

Wrt. plural handling, again, xliff is really just a container format,
and relies on the smarts of the source i18n framework (gettext in this
case) to do that.
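
For example, in Python it's the gettext machinery (driven by the
catalog's Plural-Forms header) that picks the plural form at runtime -
with no catalog loaded it falls back to the English n != 1 rule:

```python
import gettext

# With no .mo catalog installed, NullTranslations applies the English
# plural rule: the first string for n == 1, the second otherwise.
t = gettext.NullTranslations()
print(t.ngettext("%d file", "%d files", 1) % 1)   # -> 1 file
print(t.ngettext("%d file", "%d files", 3) % 3)   # -> 3 files
```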


	cheers,
			tim


[1] Church & Gale (1993)
http://www.research.att.com/~kwc/publications.html

- there's a load of other stuff in this space : this was just an
oft-quoted example.




