Re: Handling Translations



First, sorry for the rants. I'm just getting annoyed by this
ever-recurring topic, but now it seems we are slowly getting somewhere.


Joakim Ziegler wrote:
> > <RANT>
> > I'm not saying that you are purposely trying to make life difficult for
> > translators. It's just that every time we discuss translations, your
> > attitude that you know translation better than translators annoys me a
> > lot. I keep spending my time on explaining why po format is needed for
> > translators over and over, and I'm getting sick of it.
> > </RANT>
> 
> I don't say I know how to translate better than translators (although I
> am multilingual and have done some translation work myself, but not for
> GNOME). However, I do consider myself to be somewhat knowledgeable in the
> area of information management systems, which includes translation
> systems. Being a good translator does not make you an expert on the
> technology behind a translation system. As you have pointed out
> yourself, translators are usually not hackers.

s/hackers/coders/
But otherwise I agree with you, if we should do generalizations.


> So even though the
> translators are the end users of a translation system, they might not be
> the best people to design it. This holds true for all areas of software
> design. But I digress. On to the points.

True, but on the other hand, they are the best ones to know what
interface they need to do their work.


> >> Notification of changes could be done in many ways. Trivially, the page
> >> is changed if the datestamp of the master page (most likely the English
> >> one) is newer than the translated pages. There are also more advanced
> >> ways of tracking it, if we want more complexity.
> 
> > And how do you mark what has changed? How do you merge translations, so
> > that the same original word occurring twice, three times, four times,
> > and so on, only needs to be translated once? How do you partially re-use
> > existing translations (and mark them for inspection) when a new text,
> > that is similar to an old already translated one, is added to the pages?
> 
> > Any solution extracting the page strings to po format, which allows use
> > of gettext tools on top of that, solves all those problems. These tools
> > are essential to translators.
> 
> > Gettext support is already built into PHP, so it's really your
> > replacement technology that adds complexity.
> 
> I've also not seen any good examples of sites actually *using* the
> gettext support in PHP (except for that Italian site that was cited
> somewhere in this discussion). That makes the pragmatist in me skeptical
> about how proven this is for use in site translation, both from a
> performance and complexity point of view.

I believe there could be many reasons for this:

	1) Many sites are not designed for translation in the first
	place, but translation is merely an afterthought, so the
	"translation system" is often just a hack to try to make it
	fit in with the existing design. In some cases a mediocre
	hack (see Sourceforge)
	2) Lots of web development languages and techniques don't
	have support for gettext (it's very much tied to Unix)
	3) Lots of web developers are not familiar with gettext
	(again, it's very much tied to Unix)
	4) Many web developers, even experienced ones, are not at
	all familiar with any existing procedures for translation
	(again, see Sourceforge)
	5) gettext support in PHP is rather new (PHP4)
	6) Fear of noticeable performance penalties

So the whole truth is probably a mix of all these. And I'd really like
to see some data on 6).


> At the very least, I'd like to see pregenerated pages instead of using
> gettext dynamically.

This is certainly a possibility. As I have said before, as long as what
the translator gets is po files (the interface), the backend (using
pregenerated static pages or dynamically generated pages) isn't of much
importance.
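
To make the split concrete, here is a minimal sketch of the
pregeneration idea: the translator's interface stays a po-style
catalog, and the backend renders one static page per language at build
time. The catalogs, template, and file names below are hypothetical
stand-ins for whatever msgfmt-compiled .mo files the real site would
use.

```python
import string

# Per-language catalogs, as gettext would provide them after msgfmt.
# (Hypothetical contents, for illustration only.)
CATALOGS = {
    "en": {},  # master language: msgid passes through unchanged
    "sv": {"Welcome to GNOME": "Välkommen till GNOME"},
}

TEMPLATE = string.Template("<h1>${title}</h1>")

def _(catalog, msgid):
    # gettext semantics: fall back to the msgid when untranslated.
    return catalog.get(msgid, msgid)

def pregenerate(pages):
    # Render every page once per language, at build time.
    out = {}
    for lang, catalog in CATALOGS.items():
        for name, msgid in pages:
            out[f"{name}.{lang}.html"] = TEMPLATE.substitute(
                title=_(catalog, msgid))
    return out

rendered = pregenerate([("index", "Welcome to GNOME")])
```

The point is that the per-request cost is zero: all gettext lookups
happen once, when the static pages are written out.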


> > <RANT NUMBER="2">
> > It's just that I get the feeling that you know better every time we
> > discuss what is needed for translators.
> > </RANT>
> 
> I don't. I see that "this is how most sites that do translations do it",
> while there are very few sites that use gettext. That makes me think
> that maybe all those other sites have gone through this discussion too,
> and maybe they found problems with using gettext.

I doubt it. As I said earlier, there are many possible reasons.


> It's worth noting that the GNU site, creators of gettext, don't use
> gettext for translating their site, as far as I know (if they do,
> they pre-generate, but I don't think the GNU site is pre-generated,
> from my involvement with it).

I know that they don't, but on the other hand, when was the last time
the GNU site was redesigned? ;-) See point 1) above.


> >> The main problem can be summarized as such: *The nature of text in
> >> software is very different from the nature of text on webpages*.
> 
> >> Text in software consists of short, relatively independent strings. This
> >> is exactly what gettext is created to manage. When a string changes,
> >> you'll know, and you can translate that string again.
> 
> >> On the other hand, text on webpages is prose. It's long passages of
> >> text, and it's *highly interdependent*.
> 
> > No, the weak spot of gettext is very, very short strings (lack of
> > context). It's ideal for short paragraphs (usually more than enough
> > context).
> > I also translate documentation (not for GNOME though), and I have yet to
> > see any documentation that has too long paragraphs for translation.
> > Documentators know that long paragraphs are problematic for readers, so
> > the paragraphs happen to be just the right size for translators too.
> > Also, the interdependency is not a problem, it is not a problem in
> > documentation, so I fail to see why it would be for web pages. See
> > below.
> 
> As far as I know, GNOME documentation is not translated using gettext.
> Although someone said that KDE does that, and that's definitely
> interesting, it provides a data point for someone using gettext for
> translating something more like the text on web pages.

The reasons that a gettext solution (more correctly an xml-i18n-tools
solution) is not currently used for translating GNOME documentation
are:

	a) No one has been interested in implementing it yet
	(not many people are translating GNOME documentation
	because of the difficulty, so it's kind of a catch-22
	situation).
	b) We really don't want to hit a moving target
	- The GDP is currently preparing to move all GNOME
	documentation from SGML DocBook to XML DocBook. Once
	that is done, modifying xml-i18n-tools to support
	documentation translation (or use the tool from the
	KDE project) should be a lot easier.

There you have it. We discussed this extensively at GUADEC, as
translation of GNOME documentation currently is in a very bad state
because of this.
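
For those who haven't seen what such a tool does, here is a rough
sketch of the xml-i18n-tools/xml2po approach: pull each translatable
paragraph out of DocBook XML and emit po entries, so translators work
on paragraph-sized msgids. The XML snippet and the po formatting are
illustrative only, not the real tool's output.

```python
import xml.etree.ElementTree as ET

# A toy DocBook fragment (hypothetical content).
DOC = """<article>
  <para>GNOME is a desktop environment.</para>
  <para>It is free software.</para>
</article>"""

def extract_po(xml_text):
    # One po entry per <para>: the paragraph text becomes the msgid,
    # and the translator fills in msgstr with the usual gettext tools.
    root = ET.fromstring(xml_text)
    entries = []
    for para in root.iter("para"):
        msgid = " ".join(para.itertext()).strip()
        entries.append(f'msgid "{msgid}"\nmsgstr ""')
    return "\n\n".join(entries)

po = extract_po(DOC)
```

Merging an updated master document then reduces to running msgmerge
against the old po file, which is exactly the workflow translators
already know.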


> Thank you for addressing my other points on chunking of text for
> translations, etc. I'm still not convinced that gettext is ideal for the
> job, but you've alleviated some of my worries.
> 
> I think gettext might work, but it's imperative that we find some way to
> do pregeneration of static pages in this case, because if not, there's
> going to be an unacceptable performance hit.

Again, I think some data would help a lot on this (stress testing?).


Christian



