Re: PO-based Documentation Translation



Hi Folks,

On Wed, 2003-10-01 at 11:41, Danilo Segan wrote:
> Yes, I may be ready to do that if that would guarantee a major gain in  
> productivity for me in the future. Still, I'd like some more convincing  
> that it will be the case ;-)

Okay - here goes !

To be honest, the main problem when translating from DocBook (or any
other documentation format) via .po files isn't the format that the
translator sees (though again, I'd recommend XLIFF); it's the level of
segmentation you do on the input text.

Given any document, in order to maximise the reuse of translations,
you'd ideally segment text at the smallest level possible. For example,
given the paragraph :

<para>A <firstterm>launcher</firstterm> starts a particular application,
executes a command, or opens a file. The calculator icon in <xref
linkend="gosoverview-FIG-28"/> is a launcher for the
<application>Calculator</application> application. A launcher can reside
in a panel or in a menu. Click on the launcher to perform the action
that is associated with the launcher.</para>

(I'll label the sentences a,b,c,d for convenience)

it's easiest to extract the entire paragraph as a block from the
docbook, and write that as a single message in a po file, thus :

msgid "a,b,c,d"
msgstr "a,b,c,d"


However, doing it this way means that if a single sentence changes, the
entire paragraph gets marked as a "new" message (probably a strong fuzzy
match) in an automatic translation system.

The better approach is to segment the text at the sentence level : this
can be very hard[1], but is ultimately worth it. That would give you the
messages :

msgid "a"
.
.
.
msgid "d"
msgstr "d"

- with a finer level of message granularity, you can protect yourself
from the effects of a single sentence changing in a paragraph.
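
A minimal Python sketch of that effect, under loud assumptions: the
split_sentences helper is a hypothetical, naive splitter that works on
plain text only (no markup, no abbreviations), and the "edited" paragraph
is an invented change to two of the sentences above, shortened for
illustration.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# Two versions of the same paragraph; only the second sentence was edited.
old = ("A launcher can reside in a panel or in a menu. "
       "Click on the launcher to perform the action.")
new = ("A launcher can reside in a panel or in a menu. "
       "Click the launcher to perform the action.")

# Paragraph-level messages: the one message changed, so all of it is "new".
paragraph_reusable = old == new

# Sentence-level messages: only sentences not seen before need translating.
old_msgs = set(split_sentences(old))
needs_translation = [s for s in split_sentences(new) if s not in old_msgs]

print(paragraph_reusable)
print(needs_translation)
```

With paragraph-level messages the whole block goes back to the
translator; with sentence-level messages only the edited sentence does.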

In fact, there's another bonus too, since if the sentence "A launcher
can reside in a panel or in a menu" appears in another document, but not
in exactly the same paragraph context, you still have the translation in
your database and can reuse it.
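
The reuse described above can be sketched as a tiny translation memory.
This is an illustration only, not our actual system: the
TranslationMemory class, its method names and the 0.8 similarity cutoff
are all assumptions, the fuzzy matching simply uses Python's difflib, and
the target string is a placeholder rather than a real translation.

```python
import difflib


class TranslationMemory:
    """Hypothetical in-memory store mapping source sentences to translations."""

    def __init__(self):
        self._store = {}

    def add(self, source, target):
        self._store[source] = target

    def lookup(self, source, cutoff=0.8):
        """Return (kind, target), where kind is 'exact', 'fuzzy' or 'none'."""
        if source in self._store:
            return "exact", self._store[source]
        # Fall back to the closest stored sentence above the cutoff, if any.
        close = difflib.get_close_matches(
            source, list(self._store), n=1, cutoff=cutoff)
        if close:
            return "fuzzy", self._store[close[0]]
        return "none", None


tm = TranslationMemory()
# Placeholder target text; a real entry would hold the translator's work.
tm.add("A launcher can reside in a panel or in a menu.",
       "<target text for sentence c>")

# Same sentence met again in a different document: reused as-is.
exact = tm.lookup("A launcher can reside in a panel or in a menu.")
# Slightly reworded sentence: offered to the translator as a fuzzy match.
fuzzy = tm.lookup("A launcher can reside in a panel or a menu.")
# Unrelated sentence: nothing to reuse.
missing = tm.lookup("Open the file manager.")
```

The lookup is keyed on the sentence alone, which is why the surrounding
paragraph no longer matters for reuse.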

How's that for a major gain in productivity !?

(It's worked very well for us)


Of course, as I say, this has nothing to do with XLIFF per se. It's just
that we already have this sort of solution up and running, and it happens
to use XLIFF as its target format, so it would seem easy to pick it up
and run with it (if I ever manage to get it released).

> OTOH, I don't think any of the translators that are helping me (in  
> Serbian team) are willing to invest that much time. That means that  
> I'll be the only one to benefit from it, and it doesn't mean much.

Yes, you're dead right - it will take work to learn a new procedure and
to change tools. You just need to ask yourself how efficient the current
process is, and what the barriers to entry are. If people aren't doing
GDP translations because CVS is hard for the novice user, or because
they don't understand DocBook or the po format, then these are certainly
areas I'd address.


> So, what's the price to switch, and what's the price to entry?

The price to switch is me getting this stuff made available, and the
price to entry is either learning a new tool, or coming up with modes
for existing editors to support the format. (Personally, I really
wouldn't want to hand edit XLIFF files)

> If we're to convert between XLIFF and PO, and edit PO files thus  
> obtained, I don't see a point in using XLIFF at all -- that would lose  
> all the advantages XLIFF might have over PO (unless you hand edit them  
> later, but than what's the PO file for?).

I agree - having XLIFF files and converting them to po doesn't really
make sense. However, the inverse (converting po to XLIFF) does make
sense, since po is just another software message file format (at Sun, we
use 4 or 5 different ones).
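
To give a feel for the po-to-XLIFF direction, here is a rough Python
sketch. It is not our converter: the function name po_pairs_to_xliff is
hypothetical, the input is assumed to be already-parsed (msgid, msgstr)
pairs rather than a real .po file, the file name and language codes are
made up, and the element names follow the XLIFF 1.2 layout, which may
differ from whichever XLIFF version a given tool uses.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def po_pairs_to_xliff(pairs, original, source_lang, target_lang):
    """Wrap (msgid, msgstr) pairs in a minimal XLIFF 1.2-style document."""
    root = ET.Element("xliff", {"version": "1.2", "xmlns": XLIFF_NS})
    file_el = ET.SubElement(root, "file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "po",
    })
    body = ET.SubElement(file_el, "body")
    for number, (msgid, msgstr) in enumerate(pairs, start=1):
        unit = ET.SubElement(body, "trans-unit", {"id": str(number)})
        ET.SubElement(unit, "source").text = msgid
        # Untranslated messages get no <target> element at all.
        if msgstr:
            ET.SubElement(unit, "target").text = msgstr
    return ET.tostring(root, encoding="unicode")


# Hypothetical messages; the second target is a placeholder, not a translation.
xliff_text = po_pairs_to_xliff(
    [("Open", ""), ("Close", "<target text>")],
    original="messages.po", source_lang="en", target_lang="sr")
print(xliff_text)
```

Each po message becomes one trans-unit, so any other message file format
that can be reduced to source/target pairs could feed the same function.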

> Btw, if we're to switch to any new format, I'd really like it to  
> support a bit more of stuff I care about: like translations depending  
> on the gender, context markers etc.

Our editor doesn't really do gender, but can cope with contexts : we can
tell the user which project a fuzzy or exact match came from, we can
show the user any comments from the original source file, that sort of
thing. We can also do things like list the "approval" status of a
translation. There are bits of XLIFF we haven't yet taken advantage of,
such as using special markup for abbreviations, formulas, that sort of
thing (though already, we can mark some text as "untranslatable" - our
editor can then protect those fields from editing).


> Currently, from all the stuff that are programmaticaly possible, I  
> think the context markers are the only thing missing in gettext PO  
> format.

Yes : nothing's impossible ! (What an optimist I am :-) 

The question to ask is, how good is what you're using, and if you're
going to change, what level of change do you want to accept ?


	cheers,
			tim


[1] Consider the html paragraph :

This is a piece of <b>bold text. This is a new sentence that</b> isn't
in bold any more.

Sentence-segmenting this paragraph isn't trivial : done properly, you
should get the segments :

This is a piece of <b>bold text.</b>
<b>This is a new sentence that</b> isn't in bold any more.
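
One way to get those segments, sketched in Python. This is a toy, not
our segmenter: the segment and tag_name functions are hypothetical, the
input is assumed to be well-formed inline markup, and the sentence
boundary rule (end punctuation followed by whitespace) is deliberately
naive.

```python
import re

TAG = re.compile(r"(<[^>]+>)")            # splits markup into tags and text
BOUNDARY = re.compile(r"(?<=[.!?])\s+")   # naive sentence boundary


def tag_name(tag):
    """Return the element name of an opening tag, e.g. '<b>' -> 'b'."""
    return re.match(r"<\s*([^\s>/]+)", tag).group(1)


def segment(markup):
    """Split markup into sentences, closing and reopening inline tags."""
    segments, current, open_tags = [], "", []
    for token in TAG.split(markup):
        if not token:
            continue
        if token.startswith("</"):
            if open_tags:
                open_tags.pop()
            current += token
        elif token.startswith("<"):
            if not token.endswith("/>"):   # self-closing tags stay closed
                open_tags.append(token)
            current += token
        else:
            pieces = BOUNDARY.split(token)
            for piece in pieces[:-1]:
                # Close whatever is still open so the segment is well-formed...
                closers = "".join(
                    "</%s>" % tag_name(t) for t in reversed(open_tags))
                segments.append(current + piece + closers)
                # ...and reopen the same tags at the start of the next one.
                current = "".join(open_tags)
            current += pieces[-1]
    if current.strip():
        segments.append(current)
    return segments


html = ("This is a piece of <b>bold text. This is a new sentence that</b> "
        "isn't in bold any more.")
for s in segment(html):
    print(s)
```

Note that reopening a tag copies its attributes too, so a real
implementation would need a policy for things like duplicated ids.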





