Re: PO-based Documentation Translation

From: Christian Rose <menthos gnome org>
To: Alexander Kirillov <kirillov math sunysb edu>
Cc: GNOME I18N List <gnome-i18n gnome org>,GNOME Documentation List <gnome-doc-list gnome org>
Subject: Re: PO-based Documentation Translation
Date: 26 Sep 2003 02:05:40 +0200

On Fri 2003-09-26 at 00.19, Alexander Kirillov wrote:
> This has been discussed - and no consensus was reached.

Really? Perhaps not in the sense "this is the tool we should use", but I
don't see many people opposing the idea of using PO format in general.
I have now reread the entire
http://mail.gnome.org/archives/gnome-i18n/2003-January/msg00413.html
thread (from the last time this never-ending topic was discussed), and
I can't see much opposition to the PO format in general.

(For anyone who hasn't read the old thread or doesn't remember it,
please do read it. Perhaps then we can avoid reiterating the same issues
over and over.)


> I personally do
> not feel that PO format is suitable for docs. PO files deal with a
> collection of clearly defined units (messages), with no relation between
> them.

That's not entirely true. Regular PO file messages usually relate to
each other in rather interesting ways. If they didn't, we wouldn't need
to bother with consistency in translations at all, and fuzzy matching
would be of very limited use. But that's clearly not the case.


> Docs do not look like that, and I do not know how to present a
> document as a series of such units. For example, taking each paragraph
> as a "message" can sometimes cause trouble: in my translating
> experience, I sometimes felt the need to break a paragraph in two, so
> there is no 1-1 correspondence between paragraphs of original doc and
> paragraphs of translation.

This was all discussed last time around. There's no reason a tool
couldn't treat the lone "\n" line in this example as an extra
</para><para>, i.e. as a paragraph break:

"foo\n"
"\n"
"bla"

It's perhaps not perfect, but most of these suggested problems seem
entirely solvable.
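To make the "\n" idea concrete, here is a minimal Python sketch of how a
hypothetical merge tool might turn a translated msgstr back into DocBook.
The function name msgstr_to_paras and the convention that a blank line
inside the msgstr marks a paragraph split are assumptions for
illustration, not features of any existing tool:

```python
def msgstr_to_paras(msgstr: str) -> str:
    """Render one translated message as one or more DocBook <para> elements.

    Hypothetical convention: a blank line inside the msgstr (the lone "\\n"
    line in the PO file) marks where the translator split the paragraph.
    """
    # Split on blank lines and drop empty pieces (e.g. from trailing newlines).
    pieces = [piece.strip() for piece in msgstr.split("\n\n")]
    return "".join(f"<para>{piece}</para>" for piece in pieces if piece)


# The example above: "foo\n" "\n" "bla" concatenates to "foo\n\nbla".
print(msgstr_to_paras("foo\n" "\n" "bla"))
# prints <para>foo</para><para>bla</para>
```

The translated text is inserted verbatim, on the assumption that
translators keep any inline markup from the msgid intact, so no escaping
is done here.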


> There are other issues, too. So I found that
> updating translations using good old diff mode of emacs was easier. 

Your experience is obviously quite different from mine. I've been
maintaining large docs that changed so fast I couldn't keep up. Not the
contents, though; the authors kept changing their minds about
indentation, fixing typos and reordering the chapters, so my attempts to
keep up by diffing the original doc for changes and applying the same
changes to my translation were doomed to fail.
They also kept adding paragraphs that were largely similar (help texts
like "Use this for that" with just a few words replaced), but of course
I could only dream of fuzzy matching while editing raw XML, where I had
to manually remember, search for and find the old translations and
re-use them wherever possible.

I was about to give up, but then they allowed PO-based translation of
the same document. Suddenly, updating my translation to match the
current state of the master doc, which had previously taken me a week
(lots of diffing while ignoring irrelevant indentation changes and
reordering, plus scrolling and manual copy-pasting), took only a day.
I also no longer needed to keep track of (mostly irrelevant) changes in
the whole master doc; I could just watch for fuzzy or untranslated
messages appearing in the PO file.
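For illustration, the kind of translation re-use described above can be
sketched in a few lines of Python. This is a hypothetical helper, not
what msgmerge actually does internally: it uses difflib's similarity
ratio with an arbitrary 0.8 cutoff, reuses exact matches directly, and
marks close matches as fuzzy so the translator reviews them:

```python
import difflib


def fill_from_memory(new_msgids, memory, cutoff=0.8):
    """Pre-fill translations for new msgids from an old msgid -> msgstr memory.

    Returns {msgid: (msgstr, is_fuzzy)}. Hypothetical helper for illustration.
    """
    result = {}
    for msgid in new_msgids:
        if msgid in memory:
            # Unchanged message: reuse the old translation as-is.
            result[msgid] = (memory[msgid], False)
            continue
        close = difflib.get_close_matches(msgid, list(memory), n=1, cutoff=cutoff)
        if close:
            # Similar message: reuse the translation but flag it for review.
            result[msgid] = (memory[close[0]], True)
        else:
            # Nothing similar enough: leave it untranslated.
            result[msgid] = ("", False)
    return result


memory = {"Use this for printing.": "Använd detta för utskrift."}
new = ["Use this for printing.", "Use this for printing pages.", "Quit the application."]
for msgid, (msgstr, fuzzy) in fill_from_memory(new, memory).items():
    print(f"{msgid!r}: {msgstr!r}{' (fuzzy)' if fuzzy else ''}")
```

The point is that the translator only has to check the flagged entries,
instead of hunting through the old XML for a paragraph that looked
similar.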

Trust me, there's a *lot* of difference between just running
"msgfmt -cv" to get the number of untranslated and fuzzy messages, and
having to diff an entire XML file and then carefully compare it, by eye,
with another translated XML file.

The difference in work and time required is enormous. By analogy, it's
like counting the lines in a large file by hand versus just running
"wc -l" on it.
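Roughly speaking, the statistics that "msgfmt -cv" prints boil down to
something like the following Python sketch. It is a simplified stand-in,
not msgfmt's actual logic: it assumes entries are separated by blank
lines (as gettext tools write them), skips the header and obsolete (#~)
entries, and does not handle msgctxt or plural forms:

```python
import ast
import re


def parse_entry(block):
    """Return (msgid, msgstr, fuzzy) for one PO entry, or None to skip it."""
    fields = {"msgid": None, "msgstr": None}
    fuzzy = False
    current = None  # field that continuation lines ("...") append to
    for raw in block.splitlines():
        line = raw.strip()
        if line.startswith("#~"):
            return None  # obsolete entry, not counted
        if line.startswith("#"):
            fuzzy = fuzzy or (line.startswith("#,") and "fuzzy" in line)
            continue
        if line.startswith('"'):
            if current is not None:
                fields[current] += ast.literal_eval(line)
            continue
        keyword, _, rest = line.partition(" ")
        if keyword in fields:
            current = keyword
            fields[keyword] = ast.literal_eval(rest)  # C-style escapes decoded
        else:
            current = None  # msgctxt, msgid_plural, msgstr[n]: not handled here
    if not fields["msgid"]:
        return None  # no msgid at all, or the header entry (msgid "")
    return fields["msgid"], fields["msgstr"] or "", fuzzy


def po_statistics(text):
    """Count translated, fuzzy and untranslated entries in PO file text."""
    counts = {"translated": 0, "fuzzy": 0, "untranslated": 0}
    for block in re.split(r"\n\s*\n", text):
        entry = parse_entry(block)
        if entry is None:
            continue
        _, msgstr, fuzzy = entry
        if not msgstr:
            counts["untranslated"] += 1
        elif fuzzy:
            counts["fuzzy"] += 1
        else:
            counts["translated"] += 1
    return counts
```

Calling po_statistics on the contents of a .po file gives the three
numbers a translator (or a status page) needs; no diffing of the master
document is involved at all.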


> This, of course, is a matter of choice: if most translators feel that PO
> format is what they need, we should start thinking about moving to it.
> But I see many problems with this, and not enough benefits. 

I only see benefits, given that documentation purists who love editing
XML by hand can still do so if they want to.

Using the PO format for documentation translation is perfectly logical,
for a number of undisputed reasons:

* The vast majority of current translators are very familiar with, and
used to, PO-based tools
* There exists a large set of tools to work with PO files
* Translators depend on quite a few essential features of PO-based
tools, such as fuzzy matching and automatic re-use of translations
* Translators are *not* interested in indentation changes or chapter
re-ordering or any other XML document changes irrelevant to the actual
message content
* We have lots of translations for the applications, but very few for
documentation
* Modules that appear on the GNOME translation status pages (and are
thus exposed to translators, who can easily fetch and translate them)
get dozens of translations within just a few weeks, while they can stay
untranslated for years when they're not on the status pages (this
happens all the time)
* The translation status pages are PO-based, and rely heavily on the
fact that the translation work already done and the work remaining in
the files can be calculated automatically by a tool

and last, but not least:

* We want to have more translated documentation

I don't see how anyone can reach any other conclusion when summing up
the points above.


Christian
