Re: PO-based Documentation Translation

From: Tim Foster <Tim Foster Sun COM>
To: Danilo Segan <dsegan gmx net>
Cc: "gnome-i18n gnome org" <gnome-i18n gnome org>
Subject: Re: PO-based Documentation Translation
Date: Thu, 02 Oct 2003 12:34:09 +0100

Hi there,

> Yes, I see your point. It would really help in some cases, and there  
> are major benefits. Still, translation of documentation requires (to  
> me) a bit more freedom than just sentence-for-sentence (yes, I may  
> choose to translate one sentence to two, or one to a 'blank').

Yes, this is where things get complex - you're talking about 1:m, n:m or
m:1 matching, where one source sentence corresponds to many target
sentences, many source sentences correspond to many target sentences, or
many source sentences correspond to one target sentence.

This is really hard - and is something that we're working on right now.
I believe other translation tools have solved this problem, and our
system is getting there (we have some support for it in the editor, but
I'm not sure it's complete yet).
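To make the 1:m / m:1 idea concrete, here is a minimal sketch of length-based sentence alignment, loosely in the spirit of the Gale-Church approach. It is not how the editor mentioned above works: the bead types (1:1, 1:2, 2:1, 1:0, 0:1), the penalty values and the cost function are illustrative assumptions, and general n:m beads are not handled.

```python
from typing import Dict, List, Optional, Tuple

# Allowed "beads" (number of source sentences, number of target sentences)
# and an extra penalty for each. The penalty values are made-up assumptions.
BEAD_PENALTY: Dict[Tuple[int, int], float] = {
    (1, 1): 0.0,
    (1, 2): 2.0,
    (2, 1): 2.0,
    (1, 0): 4.0,
    (0, 1): 4.0,
}

Bead = Tuple[List[str], List[str]]


def length_cost(src: List[str], tgt: List[str]) -> float:
    """Relative mismatch in character length between two groups of sentences."""
    ls = sum(len(s) for s in src)
    lt = sum(len(t) for t in tgt)
    return abs(ls - lt) / max(ls + lt, 1)


def align(src: List[str], tgt: List[str]) -> List[Bead]:
    """Dynamic-programming alignment of source and target sentence lists."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            # Try extending the alignment from (i, j) with every bead type.
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + penalty + 10 * length_cost(src[i:ni], tgt[j:nj])
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk back from the end to recover the cheapest sequence of beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        prev = back[i][j]
        assert prev is not None
        pi, pj = prev
        beads.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Given the two English sentences "Click the button." and "The dialog opens." and the single French sentence "Cliquez sur le bouton et la boîte de dialogue s'ouvre.", the cheapest path is one 2:1 bead pairing both English sentences with the French one.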

Again, I'd stress the advantages of sentence-level segmentation from an
ease-of-automation point of view : doing fuzzy searches on paragraphs is
more computationally intensive, and is less likely to achieve high
leverage unless you're really storing sentences behind the scenes and
are jumping through hoops to convince translators that they're
translating paragraphs !
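A small sketch of why sentence-level segmentation helps leverage: the paragraph is split into sentences and each sentence is looked up separately in the translation memory, so one edited sentence does not spoil the matches for the rest. The regex splitter, the use of Python's difflib ratio as the fuzzy score, the 0.75 threshold and the plain dict standing in for a TM are all assumptions for illustration.

```python
import difflib
import re
from typing import Dict, List, Optional, Tuple

# Naive splitter (assumption): break after ., ! or ? when followed by
# whitespace and an uppercase letter. Real segmenters (e.g. SRX rule sets)
# also deal with abbreviations, numbers and so on.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

Match = Tuple[str, str, float]  # (TM source, TM target, similarity score)


def segment(paragraph: str) -> List[str]:
    """Split a paragraph into sentences."""
    return [s.strip() for s in SENTENCE_BREAK.split(paragraph) if s.strip()]


def best_match(sentence: str, memory: Dict[str, str],
               threshold: float = 0.75) -> Optional[Match]:
    """Return the most similar TM entry scoring at least `threshold`, if any."""
    best: Optional[Match] = None
    for source, target in memory.items():
        score = difflib.SequenceMatcher(None, sentence, source).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best


def leverage(paragraph: str, memory: Dict[str, str],
             threshold: float = 0.75) -> List[Tuple[str, Optional[Match]]]:
    """Look up every sentence of a paragraph in the translation memory."""
    return [(s, best_match(s, memory, threshold)) for s in segment(paragraph)]
```

With a memory holding "Click the Save button." and "The file is saved.", the paragraph "Click the Save button. The file is now saved." gets an exact match for its first sentence and a 0.9 fuzzy match for its second, while looking up the whole paragraph as a single unit finds nothing above the threshold.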


> I don't think I want to make use of TM and fuzzy matching in this  
> case :-)

Funny you should mention that - we're working on such a solution (it's
quite complex, though).

One of the main problems we've encountered internally with the uptake of
translation memory, and a major thorn in the side of TM systems, is
translators insisting on rewriting documents. Small stylistic changes
can mean that the translation (t) is not really a good match for
sentence (b) except in the context of the neighbouring sentences (a)
and (c).

This is really something to watch out for - translations should be
good-quality translations, not rewrites of the original sentence,
perhaps with additional explanation for the target-language audience.

This sort of brings us into another realm of translation technology that
no one has mentioned yet - translatability and controlled language,
where the writers of the source document follow a set of rules intended
to make translation easier. You could think of it as i18n guidelines for
tech writers.

I have to say, I'm not an expert in this field, but it might be worth
looking into if you haven't already...
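Controlled-language tooling often comes down to automated checks on the source text. As a toy example, the sketch below flags sentences over a word limit and terms from a discouraged list; the 20-word limit and the word list are assumptions chosen for illustration, not rules from any particular standard.

```python
import re
from typing import List, Tuple

# Both rules are illustrative assumptions, not taken from a real style guide.
MAX_WORDS = 20
DISCOURAGED = {"and/or", "via", "utilize"}


def check(text: str) -> List[Tuple[str, str]]:
    """Return (sentence, problem) pairs for sentences that break a rule."""
    problems: List[Tuple[str, str]] = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = sentence.split()
        if len(words) > MAX_WORDS:
            problems.append((sentence, f"{len(words)} words (limit {MAX_WORDS})"))
        for word in words:
            bare = word.strip(".,;:!?")  # ignore trailing punctuation
            if bare.lower() in DISCOURAGED:
                problems.append((sentence, f"discouraged term: {bare}"))
    return problems
```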

> Certainly, I believe it would work great in most cases, but there is  
> simply a small number of cases where it would not work at all. And  
> that's what bothers me. Yes, benefits are huge, but what to do when you  
> get stuck in one of those situations?

Well, so far, we haven't really found it to be a problem : at the end of
the day, when the file has been converted back to its source format,
you can always post-edit it : nothing's preventing you from doing that.
My feeling is that this doesn't come up in technical documentation as
much as it could in a more prosaic type of text, but I do understand the
problem and we are working on it.

> Still, that means that I'm all for new tools, new technologies, and  
> doing the job better. But until I try them, it's all theoretical and we  
> may discuss it at length, surely.

Yep - I suppose I'm at an advantage here, because these systems aren't
theoretical to me - I've written them ! But I appreciate and welcome some
skepticism : thus far, I've only been able to give advice, and I'd like
to be able to give you source code :-)

> Theoretically, at least things like printf format specifiers should  
> also be marked "untranslatable", but it requires support in an editor,  
> of course.

Yes, that's our intent : where possible, to mark special types of
content, such as inline tags in HTML/XML/SGML, %s-type printf formatters
in PO files, {0} markers in Java MessageFormat strings and other kinds of
special formatting.
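As a rough sketch of the detection side, one regular expression can pick out the three kinds of placeables listed above, and a translation can then be checked for losing or mangling any of them. The pattern is an approximation written for this example, not the one used by the Sun tools.

```python
import re
from collections import Counter
from typing import List

# Approximate patterns (assumptions); real tools use format-specific parsers.
PLACEABLE = re.compile(
    r"</?[A-Za-z][^<>]*>"                  # inline tags: <b>, </b>, <ulink url="...">
    r"|%(?:\d+\$)?[-+ #0]*\d*(?:\.\d+)?"   # printf: position, flags, width, precision
    r"[hlLqjzt]*[diouxXeEfFgGaAcspn%]"     # ...length modifier and conversion
    r"|\{\d+(?:,[^{}]*)?\}"                # MessageFormat: {0}, {1,number,integer}
)


def placeables(text: str) -> List[str]:
    """Return every placeable in `text`, in order of appearance."""
    return PLACEABLE.findall(text)


def placeables_preserved(source: str, target: str) -> bool:
    """True if the target contains exactly the same placeables, in any order."""
    return Counter(placeables(source)) == Counter(placeables(target))
```

The printf part is deliberately naive: in prose such as "50% of users" it flags "% o" as a conversion, which is one reason format checks are normally applied only to strings known to be format strings (as gettext does with its c-format flag).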

>  I just cannot conceive how one would do that except for  
> splitting the translation into 'fields/objects' that can be moved  
> around -- yet, this approach has many drawbacks on speed (how does one  
> move them? using a mouse? that would be real slow; using keyboard?)

That's basically it : an untranslatable section is highlighted, and
moves around as an atomic chunk.
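One simple way to approximate that atomic behaviour in a plain-text workflow is to swap each placeable for an opaque token before translation and swap it back afterwards, rejecting the result if a token was lost or duplicated. The `[[n]]` token syntax and the simplified pattern are assumptions for this sketch; an XLIFF-aware editor would use inline elements such as `<ph>` instead.

```python
import re
from typing import Dict, Tuple

# Simplified placeable pattern (assumption): inline tags, %s / %d / %1$s, {0}.
PLACEABLE = re.compile(r"</?[A-Za-z][^<>]*>|%(?:\d+\$)?[sd]|\{\d+\}")


def protect(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each placeable with a numbered token such as [[1]]."""
    mapping: Dict[str, str] = {}

    def to_token(match: "re.Match[str]") -> str:
        token = f"[[{len(mapping) + 1}]]"
        mapping[token] = match.group(0)
        return token

    return PLACEABLE.sub(to_token, text), mapping


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Put the original placeables back; every token must appear exactly once."""
    for token, original in mapping.items():
        if text.count(token) != 1:
            raise ValueError(f"placeable {token} ({original}) missing or duplicated")
        text = text.replace(token, original)
    return text
```

The translator may move `[[1]]` anywhere in the sentence, but cannot edit half of it, which is the property the highlighted chunks provide in the editor.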

> > Yes : nothing's impossible ! (What an optimist I am :-)
> > 
> > The question to ask is, how good is what you're using, and if you're
> > going to change, what level of change do you want to accept ?
> 
> As a translator, I'm willing to accept even the biggest of changes. Of  
> course, precondition is that I lose nothing (plural-forms, fuzzy  
> matching, etc). I can cope with tools change (though it would be bad  
> for my Emacs-following religion :-), process change and format change.

Cool - like I say, for us, XLIFF is a container format, so none of the
intelligence of the source file format is lost; useful extra layers are
simply added on top of it. That lets the translator treat all formats in
a similar way, and saves the poor localisation engineer from hacking all
sorts of tools together to deal with multiple formats.
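To illustrate the container idea, the sketch below wraps already-extracted PO entries in a minimal XLIFF 1.2 document using Python's standard ElementTree. The input is assumed to be a list of (msgid, msgstr) pairs, and the default file name and language codes are placeholders; a real converter would also carry over comments, contexts, plural forms and the PO header, which this sketch ignores.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def q(tag: str) -> str:
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def po_to_xliff(entries: List[Tuple[str, str]], original: str = "messages.po",
                source_lang: str = "en-US", target_lang: str = "fr-FR") -> str:
    """Wrap (msgid, msgstr) pairs in a minimal XLIFF 1.2 document."""
    ET.register_namespace("", XLIFF_NS)  # serialise XLIFF as the default namespace
    root = ET.Element(q("xliff"), version="1.2")
    file_el = ET.SubElement(root, q("file"), {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "po",  # records that the wrapped content came from a PO file
    })
    body = ET.SubElement(file_el, q("body"))
    for number, (msgid, msgstr) in enumerate(entries, start=1):
        unit = ET.SubElement(body, q("trans-unit"), id=str(number))
        ET.SubElement(unit, q("source")).text = msgid
        if msgstr:  # untranslated entries get no <target>
            ET.SubElement(unit, q("target")).text = msgstr
    return ET.tostring(root, encoding="unicode")
```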


> Still, I think there's a bit more of a problem -- how to make all the  
> developers switch to using new stuff? It is hard enough to make them  
> use ngettext properly, which is a really trivial change ;-)

Aah, yes - developers have a hard enough time already with the demands
of their international audience (what, I can't use 8-bit characters
anymore ?!) : since the underlying i18n mechanism doesn't change,
there's nothing new for them to learn - they continue to output the
formats they're used to and we just wrap those formats into XLIFF for
the duration of the localisation process.
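The reverse step, unwrapping the translated XLIFF back into the PO format developers already use, can be sketched the same way. Again this is a simplified illustration under stated assumptions: it handles only plain msgid/msgstr pairs, escapes just backslashes, double quotes and newlines, and ignores plural forms, msgctxt and comments, all of which a production round trip would have to preserve.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def po_quote(s: str) -> str:
    """Quote a string as a single-line PO string literal."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def xliff_to_po(xliff_text: str) -> str:
    """Turn each XLIFF 1.2 trans-unit back into a msgid/msgstr pair."""
    root = ET.fromstring(xliff_text)
    blocks = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.findtext("x:source", default="", namespaces=NS)
        # A missing <target> becomes an empty msgstr, i.e. untranslated.
        target = unit.findtext("x:target", default="", namespaces=NS)
        blocks.append(f"msgid {po_quote(source)}\nmsgstr {po_quote(target)}\n")
    return "\n".join(blocks)
```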

	cheers,
			tim

Follow-Ups:
- Re: PO-based Documentation Translation
  - From: Gudmund Areskoug

References:
- Re: PO-based Documentation Translation
  - From: Ismael Olea
- Re: PO-based Documentation Translation
  - From: Danilo Segan
- Re: PO-based Documentation Translation
  - From: Tim Foster
- Re: PO-based Documentation Translation
  - From: Danilo Segan
