Re: Localization Structure Model



Joakim Ziegler wrote:

In general, I'd like to avoid SQL as much as possible. On sites like
www.gnome.org, which get massive traffic, it's a hog. We should use as many
flat files as possible, and where we use SQL, we should use cache files for
the overview pages and other often used parts of the site.

Agreed. SQL seems like overkill for this, and as pointed out, it makes working with the translations a whole lot harder than when everything is plain flat files in CVS.


Now, localizing appindex entries is a separate issue, but for localizing
general page content, the best thing to do if we're using PHP is probably
defining some functions like l10n_text_start(language) and
l10n_text_end(), and put blocks of text between pairs of these functions.
It's then really easy to pick the right one, since l10n_text_start just
checks if language is the same as the current language, and if not,
suppresses the output.
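
A minimal sketch of the block-suppression idea quoted above, written in Python rather than PHP purely for illustration. The class name L10nOutput, its methods, and the sample page are hypothetical; only the start/end pairing and the "suppress unless the language matches" rule come from the proposal.

```python
class L10nOutput:
    """Collects page output, dropping text inside non-matching language blocks."""

    def __init__(self, current_language):
        self.current_language = current_language
        self._suppressed = False
        self.parts = []

    def text_start(self, language):
        # Stand-in for l10n_text_start(language): suppress output
        # unless this block is in the current language.
        self._suppressed = (language != self.current_language)

    def text_end(self):
        # Stand-in for l10n_text_end(): output after the block is shown again.
        self._suppressed = False

    def write(self, text):
        if not self._suppressed:
            self.parts.append(text)

    def render(self):
        return "".join(self.parts)


def demo_page(current_language):
    """Hypothetical page with one shared heading and two localized blocks."""
    out = L10nOutput(current_language)
    out.write("<h1>GNOME</h1>\n")
    out.text_start("en")
    out.write("Welcome\n")
    out.text_end()
    out.text_start("sv")
    out.write("Välkommen\n")
    out.text_end()
    return out.render()
```

Note that with this rule alone, a page viewed in a language that has no matching block shows only the shared parts; the proposal as quoted does not say what should happen in that case.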

Hmm, it might be of interest to know how other web sites do this.

Thatware, for example, checks the browser's preferred language (which can be overridden in the user's personal settings on the site), sets two variables, $language and $locale, and stores them in a cookie. $locale is then used to format database date references in stories in the user's preferred date format. For example, a typical date/time string is displayed as "2000-11-16 14.19" in the Swedish locale, whereas in the English locale it would be "11/16/2000 2:19 PM". This is _very_ easily done with setlocale() and strftime() in PHP.
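
As a rough illustration of the locale handling described above, here is a sketch in Python. It is not Thatware's code: the function names, the locale identifiers "sv_SE" and "en_US", and the fallback to English are assumptions. The date formats are spelled out by hand instead of relying on setlocale(), so the result does not depend on which locales are installed on the system.

```python
from datetime import datetime


def format_sv(dt):
    # Swedish style as quoted in the message: "2000-11-16 14.19"
    return dt.strftime("%Y-%m-%d %H.%M")


def format_en(dt):
    # English style as quoted in the message: "11/16/2000 2:19 PM"
    hour12 = dt.hour % 12 or 12
    suffix = "AM" if dt.hour < 12 else "PM"
    return f"{dt.month}/{dt.day}/{dt.year} {hour12}:{dt.minute:02d} {suffix}"


# Hypothetical table of supported locales.
FORMATTERS = {"sv_SE": format_sv, "en_US": format_en}


def pick_locale(browser_locale, user_setting=None, default="en_US"):
    """The user's own setting overrides the browser preference."""
    chosen = user_setting or browser_locale
    return chosen if chosen in FORMATTERS else default


def format_story_date(dt, locale_name):
    return FORMATTERS[locale_name](dt)
```

For example, format_story_date(datetime(2000, 11, 16, 14, 19), pick_locale("sv_SE")) gives the Swedish form, and passing user_setting="en_US" to pick_locale switches the same visitor to the English form.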

$language is naturally used for selecting the language in which the site content is displayed. The language resource files are just simple "includeable" files with a basic switch function. Not very elegant and not very easy to translate, agreed, but it's a rather simple solution. Based on the $language setting, the correct language file gets included with each page request, and all strings that should be translated are marked in a translate("This is a string that should be translated") kind of way, which just calls the switch function in the included translation file and outputs its result.
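
The translate() idea can be sketched like this, again in Python and only as an illustration. The TRANSLATIONS table stands in for the included language file with its switch function, the sample strings are made up, and returning the original string when no translation is found is an assumption, not something the message states.

```python
# Hypothetical per-language resources; in the scheme described above these
# would be separate includable files, one per language.
TRANSLATIONS = {
    "sv": {
        "User": "Användare",
        "User settings": "Användarinställningar",
    },
}


def translate(text, language):
    """Look up `text` in the table for `language`.

    The original string itself is the lookup key, so identical strings
    always get the same translation. Unknown strings are returned
    unchanged (an assumed fallback).
    """
    return TRANSLATIONS.get(language, {}).get(text, text)
```

Because the key is the original string, the translator always sees the original next to its translation in the resource file.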

SourceForge uses a slightly different approach. I didn't help design the localization system there, though; I just helped translate some of it into Swedish, so I don't know if I'm the right person to describe it. It seems to use a somewhat object-oriented approach (I don't understand OO, so please be patient). All the basic strings in the site are defined in a class, and the different language files then "extend" that class to point to new values:
"class Swedish extends BaseLanguage {"

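A sketch of that class-based pattern in Python, for illustration only. This is not SourceForge's code: the attribute names and strings are hypothetical, and only the "language class extends a base class and overrides its values" shape comes from the description above.

```python
class BaseLanguage:
    # Every string on the site is an attribute with the original text.
    greeting = "Welcome"
    logout = "Log out"


class Swedish(BaseLanguage):
    # Only the attribute name appears here; the English original lives
    # in BaseLanguage, i.e. in a different file in the scheme described.
    greeting = "Välkommen"
    # `logout` is not overridden, so it falls back to the base value.
```

Here Swedish.greeting is the translated text, while Swedish.logout silently stays in English, and nothing in the Swedish class shows what the original string was.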

Then what's the problem with these solutions? Well, they're simple to implement, and obviously work, but they're not helping the translator at all! The problem is how strings are stored and kept in the language files. Basically, there are a few basic things a translator would desperately want:

* Have an immediate reference to the original string, to help
  with comparing the original and its translation and detecting
  errors or changes. That is, the original string should be
  shown immediately before the translation. That may sound
  simple, but in SourceForge you don't have that; all
  strings are individual variables that are defined in one place
  and redefined in the language files by using the string's
  variable name. This means that you have to search in multiple
  files just to know what the original, untranslated string
  value was.
  Thatware doesn't have this problem since it uses case switches
  with the original string to the left, but the case switch system
  isn't pretty either.

* Detect changed or new strings! This one is easy to forget for
  non-translators, but it is essential to translators. If you
  don't get an automatic hint that an original string was slightly
  modified in the source, you might never realize that its
  translation should be updated. Likewise for completely new strings.
  This process HAS to be automatic. All real-life experience
  tells us that developers always forget to tell the translators
  whenever they change a string, and if the system doesn't detect
  that and give the translators a hint, there is a big problem.

* Automatic reuse of translations. If the exact same string is
  used in more than one place and a translation exists, make sure
  all occurrences use this translation. This is solved by the
  Thatware example (all
  calls to translate() with the same identical string will return
  the same result), but this won't work in the SourceForge model
  since the strings are distinguished and called with different
  variables.

* When a new string occurs, check if there is already a similar
  one, reuse its translation, and mark it for human review. What
  this means is that if "User" is already translated somewhere and
  a new string "User settings" appears, this is detected
  automatically and the old "User" translation is used. However,
  the two strings are not identical, so the result should be marked
  so that the translator can update it. This way, parts of
  translations are reused, which often saves time when translating
  longer strings.
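
The last three wishes in the list above (detecting new or changed strings, reusing exact matches, and reusing similar translations with a marker) can be sketched roughly as follows. This is an illustration in Python, not the algorithm of any existing tool: the function name, the status labels, and the similarity cutoff of 0.45 are assumptions, and the similarity measure is simply the one from Python's standard difflib module.

```python
import difflib


def update_catalog(source_strings, catalog, cutoff=0.45):
    """Merge the strings currently found in the source with old translations.

    `catalog` maps original strings to translations. The result maps each
    current source string to a (translation, status) pair, where status is
    "ok" (exact match, reused as is), "fuzzy" (taken from a similar string,
    needs human review) or "untranslated" (nothing similar found).
    """
    updated = {}
    for original in source_strings:
        if original in catalog:
            updated[original] = (catalog[original], "ok")
            continue
        # New or changed string: look for the most similar old original.
        close = difflib.get_close_matches(original, list(catalog), n=1, cutoff=cutoff)
        if close:
            updated[original] = (catalog[close[0]], "fuzzy")
        else:
            updated[original] = ("", "untranslated")
    return updated


old_catalog = {"User": "Användare", "Change password": "Byt lösenord"}
current_source = ["User", "User settings", "Change your password", "Download"]
merged = update_catalog(current_source, old_catalog)
```

With this sample data, "User" is reused unchanged, "User settings" and "Change your password" inherit the old translations but are flagged for review, and "Download" is reported as untranslated. The cutoff is a tuning knob: set it too low and unrelated strings get matched, too high and short strings such as "User" never match their longer variants.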


What is the solution to all this? Well, basically I think that a gettext solution would solve all this, although it may be harder to implement (the truth is, I don't know, but let's say it is). For those who don't know, gettext is a suite of free tools used for translating free software. It has been used for a long time, and it helps solve many of the problems in translation (though of course it's not perfect). Naturally, it is also the tool already used by the GNOME translators, so if this tool is used, translating the GNOME web pages would be (from a translator's POV) like translating any other GNOME software in cvs... =)
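
To give an idea of what gettext-style marking looks like in code, here is a small sketch using Python's standard gettext module (the message does not say how this would be wired into PHP, so this is only an analogy). The domain name "gnomeweb" and the "locale" directory are hypothetical. With fallback=True, a missing catalog simply means the original strings are returned.

```python
import gettext

# Load the Swedish catalog for a hypothetical "gnomeweb" domain. If no
# compiled catalog is found under ./locale, fall back to the originals.
translation = gettext.translation(
    "gnomeweb", localedir="locale", languages=["sv"], fallback=True
)
_ = translation.gettext


def page_title():
    # Strings are marked by wrapping them in _(); the gettext tools can
    # then extract them from the source into a catalog for translators.
    return _("User settings")
```

In the catalogs that translators edit, each entry pairs the original string (msgid) with its translation (msgstr), so the original is always shown right before the translation.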


Christian



#######################################################################
Christian Rose
http://www.menthos.com                    	    menthos menthos com
#######################################################################




