Re: PO-based Documentation Translation



On Wed, 01.10.2003 at 17:16, Tim Foster wrote:
> Hi Keld,
> 
> On Wed, 2003-10-01 at 15:37, Keld Jørn Simonsen wrote:
> > On Wed, Oct 01, 2003 at 02:44:16PM +0100, Tim Foster wrote:
> [snip]
> >  What's your point ?
> > 
> > My point is that GTP is the "real world" - while Sun systems
> > are niche products at least in the workstation market AFAICT
> 
> Right, whatever...
> 
> >  I was just provoked
> > by somebody indicating that GTP was not the "real world". GTP performs - 
> > in myriads of languages every day - on tens of millions of computers all
> > over the world.
> 
> Not at all, as I've mentioned before, I'm amazed at the number of
> languages GNOME is translated into - you guys are doing an excellent
> job, I'm trying to help out with some new technology that in my view is
> streets ahead of what you're using now. We're not in the translation
> business, so we have no reason to keep this stuff to ourselves.
> 
And I for one am thrilled to see that you're trying to find ways to
contribute existing technology to the project. I'm definitely going to
try out anything you have to see if it can make me more productive.

> > I have also asked for refinement for QA for GTP and asked if some
> > industry people like Sun could explain to us the techniques they
> > employ.
> 
> I think it would probably boil down to three things for us :
> 
> a) hard cash, to buy -
> b) - trained linguists who randomly review translations on projects
> 
> - if a vendor consistently does bad translation, then we don't use them
> on future projects. In an environment where you pay a company to
> translate software, it's in their best interest to do a good job. I
> guess things are different in open source, so my next point would
> probably be more useful to you 
> 
> we're starting to get now with automatic translation :
> 
> c) a centralised translation memory system that can help provide
> consistency of translation across products.
> 
And of course automatic translation of already translated text, which is
a huge benefit in docs I guess.

> 
> 
> > > Sun, Oracle, Novell, Microsoft, IBM and several translation companies
> > > saw the need for XLIFF and they wouldn't have invested in it if any
> > > pre-existing format was going to cut it.
> > 
> > And what are your experiences? Which products have used these
> > technologies, and to what extent? How many messages translated,
> > how many projects? What is the track record?
> 
> For legacy formats in the past, my experiences were poor. I used to work
> in localisation engineering before working purely on tools, and we found
> a lot of the time that translators would translate things they weren't
> meant to translate, screw up the po file format, get printf escapes
> wrong (causing dragons, hair loss and core dumps), screw up the codeset
> of files being set out for translation, occasionally get versions
> confused and a myriad of other nasty things.
> 
> With XLIFF it is at least easier to validate some of these things, and
> it provides structure where there was none before. It doesn't solve all
> the problems, but it can help.
> 
> So far, my experiences on XLIFF have been nothing but positive. Since
> it's a relatively new standard, there aren't many products out there
> that do it, and my guess is that none of them are open source.
> 
What is relatively new to you? The po format isn't really that old
itself, and I don't think we can say it was thoroughly put to the test
until GNOME and KDE took off, which would be 3-4 years ago, maybe a bit
more.

> We're using XLIFF for all our html, plaintext and sgml translation -
> we're adding xml, software message format, jsp and possibly open-office
> support at the moment. Lots of messages, lots of projects.
> 
> > Sounds like what we have for GTP and also for kde?
> > msgfmt validates .po files
> 
> and someone has to then fix the mistakes
> 
> > Text to be left untranslated is simply not marked for translation.
> 
> how about certain strings mid-message that shouldn't be translated ?
> There isn't really a way of protecting these, apart from a
> human-readable comment above saying "don't translate bla".
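
One way to picture the protection being asked for here is to mask the non-translatable spans before the text reaches the translator and to put them back afterwards. The Python sketch below is only an illustration under that assumption: `protect`, `restore` and the `{0}`-style tokens are hypothetical names, not part of gettext, the po format or XLIFF.

```python
def protect(message, protected_terms):
    """Replace each protected term with a numbered token.

    Returns the masked message and a token -> term mapping.
    """
    mapping = {}
    masked = message
    for i, term in enumerate(protected_terms):
        token = "{%d}" % i
        if term in masked:
            masked = masked.replace(term, token)
            mapping[token] = term
    return masked, mapping

def restore(translated, mapping):
    """Put the protected terms back; fail loudly if a token was dropped."""
    for token, term in mapping.items():
        if token not in translated:
            raise ValueError("missing placeholder %s" % token)
        translated = translated.replace(token, term)
    return translated
```

For example, masking "gconftool-2" and "GNOME" in "Run gconftool-2 to reset GNOME settings" leaves the translator with "Run {0} to reset {1} settings", and a translation that loses one of the tokens is rejected instead of silently shipped.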
> 
> > Fuzzy match is standard in msgmerge.
> 
> can it match against a database of a few million source segments, or
> just against one other file ?
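
The question of matching against a whole memory and not just one file can be sketched in a few lines of Python. This is an illustration only: it is not how msgmerge or Sun's system works, `best_tm_match` and the dictionary-based memory are hypothetical, and a linear scan with `difflib` would not scale to millions of segments without some form of indexing.

```python
import difflib

def best_tm_match(segment, memory, threshold=0.75):
    """Return (score, source, target) for the closest stored source segment.

    memory maps source segments to their translations. Returns None when
    no stored segment reaches the similarity threshold.
    """
    best = None
    for source, target in memory.items():
        score = difflib.SequenceMatcher(None, segment, source).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best
```

With a memory containing "Open the file", a new segment "Open the files" would come back as a high-scoring fuzzy match, so the existing translation can be offered to the translator as a starting point.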
> 
> > glossary and machine translation can be done with compendia and kbabel.
> 
> cool - that rocks
> 
> > kbabel marks which tools are used.
> 
> excellent
> 
> > > To do all this, you end up mangling the po format in an ungodly way to
> > > make it work and it's not easy to do (particularly marking certain
> > > sections as being non translatable) - I tried it once and it was ugly.
> > 
> > How long ago was your attempt?
> 
> A year or two ago I guess, I don't think anything has emerged in the po
> file format that solves the problem though.
> 
> > It seems like you are referencing some old version of .po-files and
> > associated tools.
> 
> Well, I can't change the version of gettext that ships with Solaris, so
> I'm stuck with what we have. I still believe that po files aren't the
> best solution.
> 
> > One more thing I was interested in, is QA: what to do when the initial
> > translation is done, to ensure correctness etc. Has Sun got any tools in
> > that area?
> 
> well, apart from our translation editor which is used by reviewers to
> mark translations as "good" or "needs review", we don't have any special
> tools. Using translation memory on a large scale, as I mention in (c)
> above helps to ensure consistency - which I guess is difficult to do in
> a more bazaar-like manner, as opposed to our current cathedral
> approach.  
> 

I think it's important for us to be open to any new
tools/formats/standards in this area that have the potential to help us
be more productive or interoperable. I really hope people will take the
time to learn about this even though they're perfectly happy with the
current setup. One advantage I can see concerns proprietary software
going open source. It took a *long* time before translating OO.o was
feasible, just because it happened not to use gettext. If we used
something that's more of a standard in that space, we would lower the
threshold in that respect. Just a thought.

Looking forward to seeing good stuff here :-)

Cheers
Kjartan



