Re: Mistakes in doc translations



> [: Shaun McCance :]
> I'm pretty sure that distinction belongs to PNG files. :)

Unless you consider them entirely outside of the realm of Git :)

> And if there aren't amazing PO diff tools out there, somebody needs to
> write one.

I'm getting to your points below, but I must again point out the
content-related difficulty: most of the n * m combinations of existing PO
domains and active translators are possible; if two translators update the
same PO file, many *real* conflicts (those not stemming from crummy diffing)
are inevitable; and the translation update window (freeze to release) is
significantly smaller than for other content. So, even with perfect diffing
and merging tools available, I would assume translation teams would still
stick to the locking workflows they use now.

> [...] The reason is that, if you have a 800-character line, and you change
> a letter, it's really hard to figure out what's going on in the diff.

(But most line-level diffing tools handle this well. Even with Git alone, I
find git diff --color-words | less -R quite OK.)

> So here's a summary of the things that cause problems, as I understand it.
> Correct me if I'm wrong.
>
> 1) Automatic rewrapping creates lots of noise and can confuse merges.
> [...]
> 3) Messages get reordered, which creates complete noise in diffs.
> [...]
> (1) and (3) totally suck, and I think reflect problems in the tools
> translators use. It's like the tools are actively trying to prevent you
> from having meaningful version control.
> [...]
> Do we really not have any tools to help translators merge two sets of
> changes? That seems like it should be a solved problem.

Actually, it's the other way around: tools must actively work to support
version control. If they are created without this as a design goal, their
output will normally clash with version control.

For PO files, the only toolbox I know of that does this is the one I
maintain, Pology. At its very core, in the PO catalog abstraction, it tracks
message changes and keeps line-level changes in the output to a minimum, for
the sake of less noise in version control (the tradeoff is performance). For
example, on a search-and-replace operation, only the modified msgstr fields
are rewrapped, and nothing else in the affected or any other messages.
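
To make the idea concrete, here is a minimal Python sketch of "rewrap only
what was modified". It is not Pology's code and does not use its API; the
names Message, wrap_field and search_replace are hypothetical, and plural
forms and obsolete messages are left out. Each message keeps the lines it
was read from and emits them verbatim unless its msgstr was actually changed.

```python
import re
from dataclasses import dataclass
from typing import List


def wrap_field(name: str, text: str, width: int = 78) -> List[str]:
    """Serialize one PO field, wrapping at whitespace when it is too long."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    if len(name) + len(escaped) + 3 <= width:
        return [f'{name} "{escaped}"']
    lines, cur = [], ""
    # Each piece is a word plus its trailing whitespace, so joining the
    # pieces back together reproduces the escaped string exactly.
    for piece in re.findall(r"\S+\s*|\s+", escaped):
        if cur and len(cur) + len(piece) > width - 2:
            lines.append(cur)
            cur = ""
        cur += piece
    if cur:
        lines.append(cur)
    return [f'{name} ""'] + [f'"{line}"' for line in lines]


@dataclass
class Message:
    head: List[str]        # comments and msgid lines, verbatim from the file
    msgstr_raw: List[str]  # msgstr lines, verbatim from the file
    msgstr: str            # decoded translation text
    modified: bool = False

    def set_msgstr(self, text: str) -> None:
        # Only a real change of content marks the message as modified.
        if text != self.msgstr:
            self.msgstr = text
            self.modified = True

    def render(self, width: int = 78) -> List[str]:
        # Untouched messages are written back exactly as they were read.
        if not self.modified:
            return self.head + self.msgstr_raw
        return self.head + wrap_field("msgstr", self.msgstr, width)


def search_replace(messages: List[Message], old: str, new: str) -> None:
    for msg in messages:
        msg.set_msgstr(msg.msgstr.replace(old, new))
```

With this shape, a search-and-replace that matches one message out of a
thousand changes only that message's msgstr lines in the file, however oddly
the other messages happened to be wrapped.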

Pology also contains PO diff and patch tools. The diff tool ignores all
non-real differences, including wrapping, source reference changes, and
reordering. It can diff two PO files, but it can also work directly in Git:
between local modifications and the last commit, or between any two commits.
It can recursively diff directories of PO files, again Git-aware if needed.
The output it produces is a valid PO file in its own right, so e.g. existing
PO syntax highlighting works and needs only a small update for diff
segments. Here are screenshots of a diff embedded in previous fields, so
that the translator can more easily see what to change in the translation,
and of a recursive diff of two PO directories, seen in Gedit:

http://nedohodnik.net/misc/gedit-po-syntax-ediff-01.png

http://nedohodnik.net/misc/Gedit_Dir_Ediff_PO.png

This diff can be applied as a patch, again working around non-real
differences, and handling true conflicts in a way that allows the easiest
manual arbitration (i.e. no plain "hunk rejected").
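
The notion of "non-real differences" can be illustrated with a small Python
sketch. This is not how Pology's diff tool is implemented and it does not
produce its PO-formatted output; parse_po and po_diff are hypothetical names,
and plural forms, obsolete messages and the header are deliberately ignored.
It only shows that once messages are keyed by (msgctxt, msgid) and wrapped
strings are joined, wrapping, source references and ordering drop out of the
comparison.

```python
import re

_FIELD = re.compile(r'^(msgctxt|msgid|msgstr)\s+"(.*)"$')
_CONT = re.compile(r'^"(.*)"$')


def parse_po(text):
    """Return {(msgctxt, msgid): msgstr}; comments and wrapping are dropped."""
    entries = {}
    current = {}
    field = None

    def flush():
        if "msgid" in current:
            key = (current.get("msgctxt"), current["msgid"])
            entries[key] = current.get("msgstr", "")
        current.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skips "#: file:line" source references too
        m = _FIELD.match(line)
        if m:
            field, value = m.groups()
            # A msgctxt, or a second msgid, starts the next message.
            if field == "msgctxt" or (field == "msgid" and "msgid" in current):
                flush()
            current[field] = value
            continue
        m = _CONT.match(line)
        if m and field:
            current[field] += m.group(1)  # join wrapped string pieces
        else:
            field = None  # unsupported line (e.g. plural forms): ignore
    flush()
    return entries


def po_diff(old_text, new_text):
    """List (key, old_msgstr, new_msgstr) for messages that really differ."""
    old, new = parse_po(old_text), parse_po(new_text)
    changes = []
    for key in sorted(old.keys() | new.keys(), key=lambda k: (k[0] or "", k[1])):
        if key[1] == "":
            continue  # header entry: timestamps there are noise
        if old.get(key) != new.get(key):
            changes.append((key, old.get(key), new.get(key)))
    return changes


OLD = """
#: src/main.c:10
msgid "Open file"
msgstr "Ouvrir le fichier"

#: src/main.c:20
msgid "Save the current document to disk"
msgstr "Enregistrer le document actuel sur le disque"
"""

NEW = """
#: src/io.c:99
msgid "Save the current document to disk"
msgstr ""
"Enregistrer le document "
"actuel sur le disque"

#: src/main.c:12
msgid "Open file"
msgstr "Ouvrir un fichier"
"""

for key, before, after in po_diff(OLD, NEW):
    print(key, before, "->", after)
```

In the sample data the second file is reordered, rewrapped and has different
source references, yet only the "Open file" message is reported, because it
is the only one whose translation text changed.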

So how much use have these tools seen so far? Almost none. In fact, even I
myself rely on patching very rarely, on special occasions, and not as part
of any regular workflow. Granted, a GUI tool that does a lot of hand-holding
would likely make things smoother; but I suspect the primary reason is that
translators are just not interested, due to established locking workflows.
(E.g. when I sent patches for Gedit syntax highlighting (actually
GtkSourceView) to the Gedit PO-mode maintainer and one PO-mode user for
comments, I got no response.)

> 2) Multiple people merging the POT file creates conflicts.
> [...]
> I'd be surprised if git couldn't manage to deal with (2), given that it
> ought to be the exact same changes introduced. For my part, I can
> guarantee that I'll never commit those kinds of changes.

If specialized diffing/patching is available, I'd say this reduces to the
next problem:

> 4) Due to workflow, we don't have a baseline commit to reference.
> [...]
> (4) is a serious problem. git is really smart, and has a number of merge
> strategies that I can only describe as "magic". But they don't work if you
> bypass version control.

I have no practical idea how to solve this. Besides locking workflows being
the norm, translators are frequently pointed to web-based solutions instead
of version control (so that they don't get scared off by the process). Also,
it is not unusual for regular translators not to have commit access, which
would be extremely odd for regular programmers (or documenters, right?).
But for version control to really work, all regular contributors should be
committers, and should know the VCS well enough to be able to diff, patch
and merge (including PO-specialized variants of those operations).

> Anyway, if there are problems that break the build (less common with
> recent itstool improvements), then my choices are to fix them or to
> disable the translation. I'm not in the habit of building daily, so if I
> have to disable a translation, it probably means it's disabled in the
> release. And that would be a real shame.

The other way to look at this problem (if the answer to my initial question
is "no") is to ask how to efficiently find the problems and notify
translators about them, so that the need for fixing or disabling on the
maintainer's side is minimized.

For large projects such as Gnome, I thought it would be feasible to have and
maintain a project-specific verification and notification tool. It would
know the types of PO files in the project (e.g. that this one is for C code
and that one is for Mallard docs) and check every technical issue one can
think of, beyond msgfmt -c, down to particular messages in particular PO
files which have special constraints on translation.

So, within Pology, I made a couple of project-specific checkers, one of
which is for KDE translations. For example, it knows which PO files are
extracted from Docbook, and checks not only XML well-formedness in the
translation, but also whether element names are known (it cannot do full
validation, for obvious reasons), so that the translator is somewhat covered
even when improving a bit on the markup of the original. It can be run
standalone from a Pology installation, and it is also run on servers to
produce a weekly report like this one:

http://l10n.kde.org/check-kde-tp-results/trunk/

Here's the punchline: consider the number of errors reported here, together
with the fact that the number of false positives is exactly zero (a hard
requirement on the checker).
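
For readers who want a feel for the Docbook check described above
(well-formedness plus known element names, without full validation), here is
a rough Python sketch. It is not the KDE checker from Pology; check_markup is
a hypothetical name, KNOWN_TAGS is a tiny illustrative subset of Docbook
elements, and replacing custom named entities with a placeholder before
parsing is an assumption made only so that the standard XML parser accepts
them.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative subset only; a real checker would carry the full element list.
KNOWN_TAGS = {
    "para", "emphasis", "guilabel", "guimenu", "guimenuitem",
    "filename", "command", "link", "ulink", "keycap",
}

# Named entities other than the five predefined XML ones (assumed to be
# defined elsewhere in the document, so they are not reported here).
_NAMED_ENTITY = re.compile(r"&(?!(?:amp|lt|gt|quot|apos);)[A-Za-z_][\w.-]*;")


def check_markup(msgstr, known_tags=KNOWN_TAGS):
    """Return a list of problems found in one translated Docbook message."""
    text = _NAMED_ENTITY.sub("x", msgstr)
    try:
        # Wrap in a dummy root so that mixed text and elements parse.
        root = ET.fromstring(f"<root>{text}</root>")
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    return [
        f"unknown element <{el.tag}>"
        for el in root.iter()
        if el is not root and el.tag not in known_tags
    ]


print(check_markup("Click <guilabel>OK</guilabel>."))
print(check_markup("Click <guilabel>OK</guilable>."))
print(check_markup("Click <guilabell>OK</guilabell>."))
```

The first message passes, the second has mismatched tags and is reported as
not well-formed, and the third is well-formed but uses an element name that
is not in the known set. Checking names against a list, rather than comparing
with the markup of the original, is what lets a translation add or change
markup without being flagged.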

-- 
Chusslove Illich (Часлав Илић)
