Re: Mistakes in doc translations



> [: Shaun McCance :]
> The answer is plainly yes, if you use version control correctly. PO files
> might have some characteristics that make some things harder, but they're
> not so special that they're outside the realm of git.

But PO files sit at the outermost edge of the realm of Git (and of version
control in general). I'm looking for ways to bring them closer in.

> PO files are more line-oriented than XML files. Will you get diff noise
> from rewraps? Sure.

Documentation XML files may be slightly more special than program code, but
only for the single reason you mention, text wrapping. And I've heard that
there are powerful diff tools that can work around it (Emacs, I think).
Also, I personally never word-wrap text in XML files, so in my usage XML
files are exactly the same as source code.

Wrapping in PO files causes much more noise because most translators use
dedicated PO editors, which usually rewrap all messages when saving a PO
file; the line-level diff can be almost total when only one message has
actually changed. Then there are unfuzzied messages, where half a message
becomes a diff even if only one word was changed. There are source
reference comments, which change in all subsequent messages when source
lines in front of them are moved. There is the ordering of messages, which
can change either due to source perturbations or due to messages being
obsoleted and shifted to the end.
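
As a rough illustration of what a message-level comparison could look like,
here is a minimal Python sketch. It is only a sketch under assumptions: the
names parse_po and message_diff are made up for this example, the parser
handles only msgctxt/msgid/msgstr (plural forms are not handled, obsolete
entries are skipped), and the header entry is ignored. It keys each message
by (msgctxt, msgid), so wrapping, source references and ordering drop out of
the comparison.

```python
import re

# One double-quoted PO string; escape sequences are kept as written.
_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')
_KEYWORDS = ("msgctxt", "msgid", "msgstr")


def parse_po(text):
    """Map (msgctxt, msgid) -> msgstr, ignoring comments and wrapping."""
    entries, fields, current = {}, {}, None

    def flush():
        # Store the entry collected so far, if it has a msgid.
        if "msgid" in fields:
            key = (fields.get("msgctxt"), fields["msgid"])
            entries[key] = fields.get("msgstr", "")
        fields.clear()

    for line in text.splitlines():
        line = line.strip()
        if not line:
            flush()
            current = None
        elif line.startswith("#"):
            continue  # comments, source references, flags, obsolete entries
        elif line.startswith('"'):
            # Continuation line: append to the field being collected.
            match = _QUOTED.match(line)
            if match and current is not None:
                fields[current] += match.group(1)
        else:
            keyword, _, rest = line.partition(" ")
            match = _QUOTED.match(rest.strip())
            if keyword in _KEYWORDS and match:
                # A new entry may start without a separating blank line.
                if keyword == "msgctxt" or (keyword == "msgid" and "msgid" in fields):
                    flush()
                fields[keyword] = match.group(1)
                current = keyword
            else:
                current = None  # unsupported keyword (e.g. plural forms)
    flush()
    return entries


def message_diff(old_text, new_text):
    """Return (changed, added, removed) messages, skipping the header."""
    old, new = parse_po(old_text), parse_po(new_text)
    changed = {k: (old[k], new[k])
               for k in old.keys() & new.keys()
               if k[1] != "" and old[k] != new[k]}
    added = {k: new[k] for k in new.keys() - old.keys() if k[1] != ""}
    removed = {k: old[k] for k in old.keys() - new.keys() if k[1] != ""}
    return changed, added, removed
```

With something like this, a PO file that was rewrapped and reordered but has
one changed translation would report exactly that one message as changed.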

Here is a typical scenario. A translator works for some time on a PO file
obtained from somewhere (from the repository incl. intltool-update, from
DL), and completes the translation. Some time afterwards, that PO file is
received by the committer (through email, through DL). The received PO file
is now arbitrarily different from the PO file in the repository, with the
baseline unknown. What is the committer supposed to do? If he doesn't want
to review the translation, he will just copy the received PO over the
current repository PO, run intltool-update and msgfmt -c, and commit. Here
the maintainer's fix will be lost outright. If the committer does want to
review, he may run intltool-update over the repository PO and over the
received PO and diff the results (or something to that effect, e.g. rely on
DL). Here, given a lot of garbage in the line-level diff, it will require
good concentration not to miss the maintainer's fix -- how many committers
do this regularly?

With code (or documentation XML) the diff is much more meaningful and the
baseline is normally known, so the version control system (or a standalone
tool) can perform an effective 3-way merge and automatically bring up the
real conflicts. Something in this spirit would be needed for a truly
non-locking PO workflow. But it would not be sufficient on its own:

> About a dozen people regularly commit to the same Mallard page files in
> gnome-user-docs. Not a single one of the files belongs to only one person.
> I regularly commit to files written by someone else. It does work, as long
> as you use version control correctly.

What is the difference between what is done for Mallard page files and what
programmers do with the code? Looking through the Git log, I don't see any.
For PO files, it goes like this.

By far the most frequent modification to PO files is a translation update
after merging. This update will usually happen shortly before a release. For
n PO files and m active translators, most of the n * m file-translator
combinations are viable. If two translators update the same file at the same
time, there will be a lot of conflicts. These conflicts will be such that
one translator's work will simply have to be discarded. The net result is
that translators practically never rely on version control for work
synchronization, but almost always establish some sort of locking workflow
on the organizational level. This can be informal, e.g. through "who will
now update what" on a mailing list, or more formal, e.g. through web
assignment interfaces (like DL's reservations).

With code it is much rarer that two people will work on the same code at
the same time. They may work on the same file, but on different parts of
it. For n source files and m active programmers, only a small subset of
n * m file-programmer combinations is viable. The result is that clean
merges are possible most of the time, very little work is lost due to
overlapping, and hence version control can be relied upon for work
synchronization. Organizational locking is extremely rare.
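
To make the earlier point about 3-way merging a bit more concrete, here is a
minimal Python sketch of a merge done per message rather than per line. It
is an assumption-laden illustration, not an existing tool: the function name
merge3_messages is hypothetical, each PO file is assumed to be already
reduced to a dictionary from msgid to msgstr, and the baseline is assumed to
be known, which is exactly what the scenario above lacks.

```python
def merge3_messages(base, ours, theirs):
    """3-way merge of {msgid: msgstr} dictionaries.

    Returns (merged, conflicts). A message missing from a dictionary
    is treated as absent (None) on that side.
    """
    merged, conflicts = {}, []
    for msgid in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(msgid), ours.get(msgid), theirs.get(msgid)
        if o == t:
            result = o          # both sides agree
        elif o == b:
            result = t          # only their side changed
        elif t == b:
            result = o          # only our side changed
        else:
            conflicts.append(msgid)
            result = o          # real conflict: keep ours provisionally
        if result is not None:
            merged[msgid] = result
    return merged, conflicts
```

In the scenario above, "ours" would be the repository PO with the
maintainer's fix and "theirs" the PO received from the translator; the fix
survives unless the translator touched the same message, in which case it
shows up as a conflict instead of disappearing silently. This covers only
the baseline problem, not the organizational one described above.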

-- 
Chusslove Illich (Часлав Илић)



