Re: translation memory

From: Eric Bischoff <ebisch cybercable tm fr>
To: Dan Mueth <d-mueth uchicago edu>
Cc: ldp-discuss lists linuxdoc org, kde-i18n-doc kde org, gnome-i18n gnome org, Kjartan Maraas <kmaraas online no>, "David C. Mason" <dcm redhat com>, Mike Sangrey <mike sojurn lns pa us>
Subject: Re: translation memory
Date: Thu, 05 Oct 2000 16:36:08 +0200
Dan Mueth wrote:
> 
> Someone recently pointed me to "translation memory" programs.  Although
> I've never used them, a few minutes browsing the web convinced me that
> these may be very helpful for some of our free documentation/translation
> groups to use.  Can anybody tell me how useful these are for large open
> documentation groups like GNOME, KDE, or the LDP?

At Caldera, we use translation companies that use translation memories.
I'm afraid I have to moderate your enthusiasm towards TMs (translation
memories) here.

> For people who don't know what translation memory is, I'll give a brief
> description of how I understand it to work:
> 
> It is a software system which keeps a database of what you've translated
> so that if you have to translate the same word, phrase, or sentence in the
> future, it can suggest your old translation.  This improves the speed and
> self-consistency of the translation.

At KDE we have a tool that does this part of the work on po files. It's
called kbabel.
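
To make the "suggest your old translation" idea concrete, here is a
minimal sketch in Python. It is not how kbabel or any commercial TM
works internally; the class name, the 0.75 cutoff and the sample strings
are all my own assumptions, and the fuzzy matching simply relies on the
standard difflib module.

```python
import difflib


class TranslationMemory:
    """Toy in-memory translation memory: source segment -> translation."""

    def __init__(self):
        self.entries = {}

    def add(self, source, target):
        self.entries[source] = target

    def suggest(self, source, cutoff=0.75):
        """Return (translation, kind) where kind is exact, fuzzy or new."""
        if source in self.entries:
            return self.entries[source], "exact"
        # Hypothetical fuzzy lookup: the closest stored segment above the cutoff.
        close = difflib.get_close_matches(
            source, list(self.entries), n=1, cutoff=cutoff
        )
        if close:
            return self.entries[close[0]], "fuzzy"
        return None, "new"


tm = TranslationMemory()
tm.add("Open the file", "Ouvrir le fichier")

print(tm.suggest("Open the file"))
print(tm.suggest("Open the files"))
print(tm.suggest("Quit"))
```

A fuzzy hit is only a suggestion: the translator still has to review
it, which is the same idea as a "fuzzy" entry in a po file.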

>  If you have a new version of a
> document which you've already translated, it can use the database to
> translate all the old material the same way as before, just leaving the
> new material for the translator to do by hand.  The systems are generally
> designed for multiple translators to share a common database, so it
> improves consistency between the various translators.  It works best for
> large systems of documents, for multiple versions of documents, and when
> translation is done by multiple people.  (All of which are properties of
> our large free documentation/translation projects like the LDP, KDE, and
> GNOME.)
> 
> For more information, see:
> http://www.raycomm.com/techwhirl/translationmemory.html

There's another feature you don't mention: when translating an HTML
file, the HTML markup is kept in a separate file that holds pointers to
the segments in the TM. This makes it possible to automate part of the
"DTP": rebuilding the translated HTML file out of the translated
segments. Of course, this mechanism sometimes gets confused when the
word order in a sentence changes, as in a German translation of an
English sentence. It works the same way with .rtf, .doc, and many other
(usually Windows-derived) file formats.
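
The markup/segment split described above could be sketched like this in
Python. This is only an illustration under my own assumptions: the
function names are hypothetical, the "skeleton" is a plain list rather
than a separate file, and the regular expression handles only simple
tags, not real-world HTML.

```python
import re

# Matches one simple tag; real HTML needs a proper parser.
TAG = re.compile(r"(<[^>]+>)")


def split_markup(html):
    """Split HTML into a skeleton (tags + pointers) and text segments."""
    skeleton, segments = [], []
    for piece in TAG.split(html):
        if not piece:
            continue
        if TAG.fullmatch(piece) or not piece.strip():
            skeleton.append(piece)          # markup or whitespace stays put
        else:
            skeleton.append(len(segments))  # integer = pointer to a segment
            segments.append(piece)
    return skeleton, segments


def rebuild(skeleton, translated):
    """Re-insert translated segments at the positions the skeleton points to."""
    return "".join(
        translated[p] if isinstance(p, int) else p for p in skeleton
    )


html = "<h1>Welcome</h1><p>Click <b>here</b> to start.</p>"
skeleton, segments = split_markup(html)
print(segments)

translated = ["Willkommen", "Klicken Sie ", "hier", ", um zu beginnen."]
print(rebuild(skeleton, translated))
```

Note that the second sentence is cut into three segments around the
<b> tag. As long as the translation keeps the same word order this
works, but as soon as the bold word has to move, the pointers in the
skeleton no longer fit.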

Now, as promised, here are the pros and cons I have seen for TMs.

A) Application messages: the .po file mechanism replaces TMs.

In fact, a translation memory has three characteristics:
- reusability (translate a given "segment" only once)
- updates (do not lose existing translations when the original
segments partly change)
- context (do not reuse the same translation when the context of a
segment changes; this requires adding context information)

The .po file mechanism addresses reusability and updates through the
gettext and msgmerge utilities. It lacks context information, but at KDE
Stephan Kulow added a hack that allows context information to be added
to msgid strings.
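
Here is a rough Python sketch of the three characteristics working
together. It is NOT the algorithm msgmerge uses and it does not read po
files; the function name, the (context, msgid) keys, the status labels
and the 0.6 cutoff are assumptions chosen for the illustration.

```python
import difflib


def merge_catalog(old, new_keys, cutoff=0.6):
    """Carry old translations over to a new list of messages.

    old:      {(context, msgid): msgstr}
    new_keys: iterable of (context, msgid)
    Returns   {(context, msgid): (msgstr, status)}
    """
    merged = {}
    for context, msgid in new_keys:
        key = (context, msgid)
        if key in old:
            # Reusability: same message in the same context, translated once.
            merged[key] = (old[key], "translated")
            continue
        # Updates: look for a similar old message, but only within the
        # same context, and flag the result for review.
        candidates = [m for (c, m) in old if c == context]
        close = difflib.get_close_matches(msgid, candidates, n=1, cutoff=cutoff)
        if close:
            merged[key] = (old[(context, close[0])], "fuzzy")
        else:
            merged[key] = ("", "untranslated")
    return merged


old = {
    ("menu", "Open"): "Ouvrir",
    ("state", "Open"): "Ouvert",
    ("menu", "Save file"): "Enregistrer le fichier",
}
new_keys = [
    ("menu", "Open"),
    ("state", "Open"),
    ("menu", "Save files"),
    ("menu", "Print"),
]
merged = merge_catalog(old, new_keys)
for key, value in merged.items():
    print(key, value)
```

The two "Open" entries show why context matters: the same English
string needs two different translations depending on where it appears.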

B) Documentation: if you use DocBook, the advantages of TMs are much
smaller.

The reusability in documentation is much lower than the reusability in
application messages.

The update mechanism for DocBook files is poor. You have to work with:
- the diff between the old English and the new English files (assuming
of course we translate FROM English)
- the new English file
- the old translated file
in order to build the new translated file by hand. I call this process
"adding the missing corner of a square" ;-).
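
Part of this "missing corner" work can be sketched in Python with the
standard difflib module. The big assumption, which real documents often
break, is that the old translation has exactly one paragraph per old
English paragraph, in the same order. The function name and the "TODO:"
marker are made up for the example.

```python
import difflib


def draft_update(old_en, new_en, old_tr):
    """Build a draft of the new translated file, paragraph by paragraph.

    All arguments are lists of paragraphs; old_tr[i] translates old_en[i].
    """
    if len(old_en) != len(old_tr):
        raise ValueError("old English and old translation are not aligned")
    draft = []
    matcher = difflib.SequenceMatcher(a=old_en, b=new_en, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged English paragraphs: reuse the old translation.
            draft.extend(old_tr[i1:i2])
        else:
            # Changed or added English paragraphs are left for the
            # translator; deleted ones contribute nothing (j1 == j2).
            draft.extend("TODO: " + p for p in new_en[j1:j2])
    return draft


old_en = ["Introduction.", "Press F1 for help.", "Goodbye."]
new_en = ["Introduction.", "Press F2 for help.", "New section.", "Goodbye."]
old_tr = ["Introduction.", "Appuyez sur F1 pour obtenir de l'aide.", "Au revoir."]

draft = draft_update(old_en, new_en, old_tr)
for paragraph in draft:
    print(paragraph)
```

The translator then only has to deal with the paragraphs marked TODO,
using the old translated file as a reference for the changed ones.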

But this weakness is offset by the fact that you avoid two very slow
and stupid phases that TMs impose at the first translation and at every
update:
- the "alignment" phase before you start translating. It consists of
chopping the original text to be translated into small "segments",
preferably the same ones as you last translated ;-)
- the "DTP" (DeskTop Publishing) phase, which means manually
reconstructing the translated document out of the translated segments.
This is not fully automated because you always get problems like putting
the page boundaries in the right place and fixing references from one
page to another.

Both the "alignment" and the "DTP" phases disappear if you're using
DocBook, because the source files indirectly contain all the formatting
information.

It is also possible to combine translation memories and DocBook to get
the best of both. If you do that, you have to make a strategic choice:
is the markup stored in the translation memory itself or in some
external file?

C) Scope

Another characteristic of a translation memory is that you have *one*
database for all your application and documentation files. By contrast,
there's usually one po file and one DocBook file per application.

The advantage is that you get more reusability (this would ensure you
use the same string for a button's label in both the application and
the documentation). The disadvantage is that you lose some context
information (what application am I working on?) unless you explicitly
add context information.

>  Are any of these groups
> currently using a translation memory(TM) program?  Does anybody know of
> any TM programs which run on Linux, or even better - of any free TM
> programs or projects?  If there are no free TM programs, would anybody be
> interested in working on one?

I don't know of any such free program, but I'm not sure writing one
would be a good idea. As I said above, a TM is more or less functionally
equivalent to po files + DocBook documentation.

There are some standardization efforts under way to build
non-proprietary TM exchange formats based on XML. The main proprietary
format is called "Trados".

To summarize, the biggest problems I see with TMs are the "alignment"
and "DTP" phases before and after the translation.

I hope this helps.

-- 
 Éric Bischoff   -   mailto:ebisch cybercable tm fr
 __________________________________________________
                                           \^o~_.
     .~.                           ______  /( __ )
     /V\         Toys story         \__  \/  (  V
   //   \\                            \__| (__=v
  /(     )\                        |\___/     )
    ^^-^^                           \_____(  )
     Tux                        Konqui     \__=v
 __________________________________________________