Re: Handling Translations

From: Joakim Ziegler <joakim ximian com>
To: Christian Rose <menthos menthos com>
Cc: gnome-web-list gnome org
Subject: Re: Handling Translations
Date: 31 Aug 2001 11:57:43 -0500

On Thu, 2001-08-30 at 18:16, Christian Rose wrote:
> Joakim Ziegler wrote:
>> It seems to me that both these methods are somewhat complex, and, more
>> importantly, slow.

> Do you have any data on that? I doubt the gettext solution introduces
> any user-noticeable difference in speed. If in doubt, test it, compare
> it, produce some data on performance, and then we can have a
> discussion. Speculating isn't helping.

The speed hit from gettext is probably trivial. The hit from reading an
XML file is probably a lot bigger. Remember, a whole new file has to be
read into memory, parsed by the reasonably large chunk of code an XML
parser represents, and then you have to actually do stuff to merge the
content with the template. This is most definitely a non-trivial speed
hit on a machine that at times (around releases) handles 20 hits per
second at peak.
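
One way to replace speculation with numbers is a small timing harness.
The sketch below is only an illustration, written in Python rather than
the site's PHP. The dictionary stands in for an already loaded message
catalog (it is not real gettext), and the page content, template and
function names are all made up for the example. It times a catalog-style
lookup against parsing an XML document on every request.

```python
import timeit
import xml.etree.ElementTree as ET

# Hypothetical translated content for one page, stored two ways.
CATALOG = {"Welcome to GNOME": "Velkommen til GNOME"}
XML_DOC = "<page><string id='welcome'>Velkommen til GNOME</string></page>"
TEMPLATE = "<h1>{welcome}</h1>"


def render_with_catalog():
    # Catalog-style lookup: a dictionary hit in an already loaded table.
    return TEMPLATE.format(welcome=CATALOG["Welcome to GNOME"])


def render_with_xml():
    # XML-style lookup: parse the document on every request, then merge.
    root = ET.fromstring(XML_DOC)
    strings = {node.get("id"): node.text for node in root.iter("string")}
    return TEMPLATE.format(welcome=strings["welcome"])


def measure(func, requests=2000):
    # Total seconds spent serving the given number of simulated requests.
    return timeit.timeit(func, number=requests)


if __name__ == "__main__":
    print("catalog:", measure(render_with_catalog))
    print("xml:    ", measure(render_with_xml))
```

The absolute figures will depend on the machine and on real page sizes,
so this only shows the shape of a test, not a result.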

For gettext, my concern is more with complexity. See below.

>> Now, if we think about the fact that if the template
>> system works as it should, and actual PHP functions are externalized,
>> then there won't be any common elements between the different language
>> pages (apart from headers and footers to call templates.)

>> So why not just use different pages (PHP files) for different languages?

> Because it sucks. It's very much non-maintainable. How do you get
> notification of changes? If random hacker X spots an error on the site
> and commits a fix to cvs, how is translator Y[1-40] to know that spot Z
> on page W changed, without diffing the entire site periodically,
> manually inspecting all diffs, and on top of that trying to insert
> corresponding changes in their translated pages? Translators want to
> translate, not spend most of their time trying to track changes.

> gettext/xml-i18n-tools/PO format solves these fundamental problems. We
> have discussed it repeatedly on this list, but you just continue to
> ignore the problem.

Please do not make this out to be me trying to make life difficult for
the translators. Why would you think I want to do that?

Notification of changes could be done in many ways. Trivially, the page
is changed if the datestamp of the master page (most likely the English
one) is newer than the translated pages. There are also more advanced
ways of tracking it, if we want more complexity.
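
The trivial datestamp check could look roughly like this. It is a
minimal sketch in Python rather than PHP, and it assumes the
foobar.<lang>.html naming scheme with the English page as master. The
function names are hypothetical.

```python
import os
from pathlib import Path


def find_translations(master):
    # Assumed naming scheme: foobar.en.html is the master page and
    # foobar.<lang>.html in the same directory are its translations.
    master = Path(master)
    stem = master.name.split(".")[0]
    return sorted(p for p in master.parent.glob(f"{stem}.*.html") if p != master)


def stale_translations(master, translations):
    # A translation is stale if it was last modified before the master.
    master_mtime = os.path.getmtime(master)
    return [t for t in translations if os.path.getmtime(t) < master_mtime]
```

This only says that a page changed, not where. It also relies on file
modification times surviving a checkout, which may not hold for every
setup.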

>> I believe it's pretty common to use foobar.en.html and so on.

> And that doesn't mean that we have to do an inferior and very much
> broken solution like that when doing a brand new site. Listen to
> translators for once, and help them help you, instead of ignoring them
> and their plea for the proper translation interface on purpose.

Please stop erecting strawmen like this. Why do you think I'm "ignoring
them and their plea for the proper translation interface on purpose"?

There are some problems I see with gettext, which I think were brought
up before as well. They might not be insurmountable problems, and if the
translators want to deal with them, that will be ok. But I do not want
to be in a situation where the translators end up with a solution that's
difficult to work with, and we can't change it because it's too late,
and the reason we ended up with this solution was that we didn't discuss
the issues enough.

The main problem can be summarized as follows: *The nature of text in
software is very different from the nature of text on webpages*.

Text in software consists of short, relatively independent strings. This
is exactly what gettext is created to manage. When a string changes,
you'll know, and you can translate that string again.

On the other hand, text on webpages is prose. It's long passages of
text, and it's *highly interdependent*.

So when you use gettext for web page text, you need to make a decision.
You can either make each translatable string short (like one paragraph
at the most, maybe less), and it'll be relatively manageable in the
sense of what strings will be reported as changed. However, this will
mean that the translator will lack a lot of context when doing
translations, and given that translation to another language is never a
1:1 process, this can lead to clumsy prose and other problems (for a
simple example, consider the problem of overusing a term or phrase in a
span of text).

The other option is to use longer gettext translatable strings, maybe
the whole body text of the page is one string. This means that there
will definitely be enough context for the translator to create a
high-quality translation, but now there's a different problem: The
string that will be reported as changed is very long, so it can be hard
to see exactly what changed (for instance if a typo was fixed or some
other minor change was made), and also, there will be lots of markup in
the translatable string. Stuff like paragraph breaks, table structures,
etc., etc. And the translator will have to edit between this stuff
without the benefit of having the HTML (or PHP) context of the whole
document, so it'll be very easy to break stuff.
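
The trade-off between the two options can be sketched in a few lines.
This is an illustration in Python, not real gettext. The sample page,
the one-paragraph-per-line layout and the function names are all
assumptions made for the example. It shows which strings a catalog
keyed on the source text would report as changed after a one-word typo
fix, and how a word-level diff could point at the edit inside a long
string.

```python
import difflib

# Hypothetical page body, one paragraph per line, with a typo to be fixed.
OLD_PAGE = (
    "<p>GNOME is a desktop enviroment.</p>\n"
    "<p>It is free software.</p>\n"
    "<p>Download it today.</p>"
)
NEW_PAGE = OLD_PAGE.replace("enviroment", "environment")


def split_strings(page, per_paragraph):
    # per_paragraph=True: one translatable string per paragraph (line).
    # per_paragraph=False: the whole page body is a single string.
    return page.split("\n") if per_paragraph else [page]


def changed_strings(old, new, per_paragraph):
    # A catalog keyed on the source string sees any edited string as new.
    old_set = set(split_strings(old, per_paragraph))
    return [s for s in split_strings(new, per_paragraph) if s not in old_set]


def word_level_changes(old, new):
    # Word diff a translator could use to find the edit in a long string.
    diff = difflib.ndiff(old.split(), new.split())
    return [d for d in diff if d.startswith(("- ", "+ "))]
```

With short strings only the first paragraph is flagged, but it is
translated without its neighbours. With one long string the whole page
is flagged, markup included, and something like the word diff is needed
to see that only one word moved.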

So there are definitely issues with using gettext. I don't understand
why people have to be demonized for pointing them out.

-- 
    Joakim Ziegler - Ximian Engineer - joakim ximian com - Radagast IRC
 FIX sysop - Free Software Coder - Writer - FIDEL & Conglomerate developer
http://www.avmaria.com/ - http://www.ximian.com/ - http://www.sinthetic.org/

