Re: [Gtranslator-devel] Re: gtranslator and OmegaT

From: Fatih Demir <kabalak gtranslator org>
To: gtranslator devel <gtranslator-devel lists sourceforge net>
Cc: mail marcprior de
Subject: Re: [Gtranslator-devel] Re: gtranslator and OmegaT
Date: Tue Jun 10 10:43:02 2003

Evening Marc, and hello to everyone else. My comments follow inline :-)


On Tue, 2003-06-10 at 13:57, Ross Golder wrote:
> See comments in-line.
> 
> On Tue, 2003-06-10 at 10:50, Marc Prior wrote:
> > Can you perhaps answer any of the following:
> > 
> > - Do rigid procedures exist for translation of GNOME documentation, or can 
> > translators simply take files in the relevant format (presumably DocBook) and 
> > use any desired application (and, for that matter, platform) to translate 
> > them, provided they are returned in the same format?


Translators are totally free in their choice of application and
platform. I can speak for the Turkish team: we have some pure Windows
users among us who translate gettext files with plain text editors
under Windows, so the freedom of application and platform choice is
real, I'd say ;-) The only important thing is that the po files are
correctly encoded and well formed (no syntax errors inside).
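
As a rough illustration of what "correctly encoded" means in practice,
here is a minimal Python sketch. It is not part of gtranslator, and the
function names are hypothetical. It reads the charset declared in the po
header's Content-Type line and checks that the whole file decodes under
it. It does not check gettext syntax; `msgfmt -c` from the gettext tools
covers that part.

```python
import re


def declared_charset(po_bytes):
    """Return the charset named in the po header's Content-Type line, or None."""
    # The header entry normally carries a line such as:
    #   "Content-Type: text/plain; charset=UTF-8\n"
    match = re.search(rb"charset=([A-Za-z0-9_.:-]+)", po_bytes)
    return match.group(1).decode("ascii") if match else None


def check_po_encoding(po_bytes):
    """Return (ok, message) for the declared-charset decoding check."""
    charset = declared_charset(po_bytes)
    if charset is None:
        return False, "no charset declared in header"
    try:
        po_bytes.decode(charset)
    except LookupError:
        # Typical for an unedited template that still says charset=CHARSET.
        return False, f"unknown charset {charset!r}"
    except UnicodeDecodeError as err:
        return False, f"not valid {charset}: byte offset {err.start}"
    return True, f"decodes cleanly as {charset}"
```

Reading the file in binary mode (`open(path, "rb").read()`) and passing
the bytes in keeps the check independent of the platform's default
encoding, which matters for the Windows text-editor case above.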


> > - Are the translation projects managed centrally (in which case who is 
> > responsible for them?), or are the different language teams left largely to 
> > their own devices?
> 
> These questions would be better put to the gnome-i18n gnome org mailing
> list, which is where all the language translation projects are
> 'co-ordinated' (centrally managed ;). I think your assumptions are
> fairly close to the truth.


Every language team is really free in how it organizes and does its
work. The only central coordination takes place on gnome-i18n gnome org
or on local lists like gnome-turk gnome org for Turkish, etc. There must
be a language leader for every language; that's it, nothing more is
necessary.


> > - From your message, I infer that gtranslator is used exclusively for .po 
> > files. Are there any plans to adopt it for the translation of documentation, 
> > as well as application interfaces? (The KDE teams, I believe, also use kbabel 
> > for documentation.)
> 
> I don't personally have any plans in this area. It's specifically a tool
> for editing '.po' files, and due to its design, it can also deal with
> other formats designed to store translation information (e.g. UMTF,
> TMX). If you want to maintain a completely different class of file, such
> as Docbook or XML, you should use a completely different type of
> application.


Well, I used to dream of supporting different formats and kinds of
translation by providing plugins for gtranslator and the like, but that
creates far too many complications. gtranslator's real field of use
should be translating gettext po files, as it is quite specialized in
this file type, and I think we should extend its po file editing
capabilities to the maximum rather than fishing in other waters...
/me just wasted time, lines of code & effort on my plugin idea, DocBook
conversion, etc. It's really not worth the trouble.


> > - From the gtranslator web pages, I found "compendia", which I presume are 
> > translation memory repositories, for Turkish and Lithuanian. Are these the 
> > only languages for which anything of this kind exists?
> 
> I haven't come across these yet, so I can't really comment.


But I have :-) It's a historical relic: an earlier autotranslation
technique for gtranslator that used larger po files as compendia.
Today it is a botch and of no importance; gtranslator now uses learned
messages and files for its own UMTF translation memory/buffer.

To be precise on this point: the translation memories (learn buffers)
available via www.gtranslator.org are for Turkish and Spanish.


> I would anticipate that eventually, each GNOME language project will maintain
> a 'master glossary' or something, which would form the basis of a
> translator's learn buffer. Much of the po files would be
> auto-translated, then the translator could go over the messages
> hand-editing any that were badly guessed, using the glossary in some
> way as a reference (e.g. when a word has multiple potential translations).


Which is already the case: you can obtain the larger GNOME glossaries
from the CVS module "gnome-i18n/glossary/", where many languages and
their glossaries are found. You can have gtranslator learn these
glossaries and some other important, larger po files, and this way
build a good learn buffer (gtranslator's TM) which reaches 25%-40%
autotranslation :-) Complicated enough?! Huh... but for this task there
is a script in the gtranslator package which helps to build a good
learn buffer; it's named build-gtranslator-learn-buffer.sh.


> > - I notice that gtranslator uses an internal XML format, UMTF, but that TMX 
> > support is planned. Is this still on the cards, and if so, what form will it 
> > take? Will gtranslator be able in future to read external TMX files? Would a 
> > UMTF <-> TMX converter be useful?
> 
> I have briefly touched the UMTF format, and as yet not looked into TMX.


Well, UMTF was born of the idea of using a common TM format in KBabel
and gtranslator, but somehow KBabel kept its own db format. So we, or
better to say I, kept UMTF as our format for the learn buffer, as it's
easy to handle and quite logical. TMX is a bit more extensive and much
more tailored to professional users, of course, and a UMTF <-> TMX
converter would be useful. The idea of reading/saving TMX files is not
dead yet, and via the "semerkent" sources in gtranslator it's somehow
possible even today, but very, very complicated to get working :-(

Native TMX and XLIFF support via the semerkent sources in gtranslator's
code base would be possible but will need time. I think TMX is an
important target format to support in any case. It would be perfect if
someone could get the semerkent files better integrated into
gtranslator and possibly make the learn buffer code independent of the
format by using generic functions (that would be possible, and I worked
on it some time ago, but now time is short :-().


> From what I understand so far, they do essentially the same job, but use
> a slightly different schema. Transformation between the two should
> therefore be fairly trivial, using a couple of XSLT stylesheets, and libxslt.


That would also be a good idea :-) I have never dealt with libxslt, so
I cannot comment on this in any real substance, but it seems quite
possible as well.
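
To make the target side of such a converter concrete, here is a minimal
sketch in Python. It uses the standard library rather than the
XSLT/libxslt route suggested above, and it does not parse UMTF at all:
it assumes the (source, target) message pairs have already been
extracted from the learn buffer. The function name pairs_to_tmx and the
header values are illustrative assumptions; only the element and
attribute names follow TMX 1.4.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, src_lang, tgt_lang):
    """Build a minimal TMX 1.4 document from (source, target) string pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # Header attributes required by TMX; the values here are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",
        "creationtoolversion": "0",
        "segtype": "sentence",
        "o-tmf": "UMTF",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        # One translation unit per pair, one variant per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(pairs_to_tmx([("File", "Dosya"), ("Edit", "Düzen")], "en", "tr"))
```

The reverse direction would read the tu/tuv/seg elements back into
pairs in the same way; how those pairs map onto UMTF's own elements is
left open here.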


> > - Does gtranslator employ any form of fuzzy matching?
> 
> Yes, it maintains a 'fuzzy' status for each message in the po file.


It maintains the fuzzy status as set generically by the gettext
programs, but does not really do fuzzy matching while autotranslating.
It does have the "gtranslator_utils_calculate_similarity" function in
the gtranslator/src/utils.c source file, which uses some voodoo to do a
kind of configurable fuzzy matching.

You can "in a manner" tune the similarity matching with a percentage
setting in the preferences, and it even works in some cases (it's
actively used in the code, to be honest). But calculating similarities
between words is quite a hard job, and I don't think there is any
algorithm which does this job well for the majority of the languages
existing on our beloved planet...
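
For readers unfamiliar with threshold-based matching, here is a minimal
Python sketch of the general idea. It is not the algorithm behind
gtranslator_utils_calculate_similarity; it swaps in the standard
library's difflib.SequenceMatcher, and the function names, the
dictionary-shaped learn buffer and the 75% default are all assumptions
made for illustration.

```python
from difflib import SequenceMatcher


def similarity_percent(a, b):
    """Case-insensitive similarity of two strings as a percentage (0-100)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(msgid, learned, threshold=75.0):
    """Return (translation, score) for the closest learned msgid, or None.

    learned maps known msgids to their translations; threshold plays the
    role of the percentage setting in the preferences.
    """
    best = None
    for known, translation in learned.items():
        score = similarity_percent(msgid, known)
        if score >= threshold and (best is None or score > best[1]):
            best = (translation, score)
    return best


if __name__ == "__main__":
    learned = {"Open file": "Dosya aç", "Quit": "Çık"}
    print(best_match("Open files", learned))
    print(best_match("Preferences", learned))
```

A character-based ratio like this illustrates the difficulty mentioned
above: it knows nothing about word boundaries or inflection, so how
well a given threshold works will vary between languages.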


I hope my comments made some points clearer and shed some light on
others. Comments are very welcome :-)
References:
- [Gtranslator-devel] Re: gtranslator and OmegaT
  - From: Ross Golder