Translation problems and bad strings



Really sorry about cross-post: gnome-i18n know most of this
but perhaps not the explanations for a couple of the strings
below; gnome-doc-list need to know what chaos "Control" caused;
and gnome-hackers is my attempt to catch the hackers. 

Not at all sure where replies should go. Use your common sense :)

Some time ago I asked gnome-i18n what they thought the worst
strings to translate in Gnome were, and got a pile of answers
which I then didn't summarise on-list. Reading the threads about
"what problems do translators face", I am reminded that I should.

I know a load of hackers are familiar with this, but for
those who aren't, here's a quick description of the translation
process. There are loads of ways of translating: editing po files 
by hand, using a web interface, using gtranslator or KBabel; but
they all revolve around a list of strings which go like this:

  #: /module/path/path/filename
  msgid "Original string here with occasional <b> or \n marks"
  msgstr ""

...and the aim of the game is to fill in msgstr. You do not
necessarily have any more context than that, and you have
to guess what some strings mean. (For example, if you don't
have a CD drive, you can't start rhythmbox to see whether
a particular message appears when you stick a CD in..) 

I begin with this one, which started the whole thing. A remark
from one of the Arabic translators on IRC:

<olimar> joke of the month in a party: "Model column to search
         through when searching through code"

I'm still not sure what this means..

There are lots of things programmers can do to help here:
particularly they can put comments by the strings in the code
saying things like "Translators: this is seen by.." or "This
refers to..". For some apps there is such specialised vocabulary 
that this can really help. Unfortunately, nearly everyone
decides to translate gtk+ early on (it makes up half of the 
strings in developer-libs), and it's full of such strings as above
and there is not really a lot you can do about it without a
gigantic split into "messages for users" and "messages for
developers". Having said that, anyone who finishes gtk is
well set up to finish most of Gnome :) 
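To make the "comments by the strings" idea concrete, here is a minimal sketch. It is in Python rather than the C most of Gnome uses, the function name control_label is made up for illustration, and it assumes the strings are extracted with xgettext's --add-comments=Translators option, so that the comment turns up in the po file as a "#." line above the msgid.

```python
import gettext

# With no catalogue installed, gettext.gettext returns the msgid unchanged.
_ = gettext.gettext


def control_label():
    """Return a label whose meaning is spelled out for translators."""
    # Translators: "Control" is a noun here, meaning a widget such as a
    # button or slider. It is not a verb and not the Ctrl key.
    return _("Control")


if __name__ == "__main__":
    print(control_label())
```

The comment has to sit directly above the marked string for the extraction tool to pick it up; a comment elsewhere in the function is not carried into the po file.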

So here are the sort of strings translators were dealing
with in the summer. The aisleriot examples have gone, I think
(yay Callum!): but the rest were all around in the summer. 

Historical (ie, gone, to the best of my knowledge):
---------
* gnome-games/aisleriot:
  msgid "borp"
			Read it backwards: "prob"...
			(Abel Cheung -- who later figured it out, decided
			it was cute, and put "melborp" in somewhere else :)) 

* gnome-games/aisleriot,
* nautilus/somewhere I forget:
  msgid " of "
			This is things like "king of hearts" or "file 1 of
			8". This flatly won't translate in some languages.
			Malayalam (.ml) needs to see "n of m" for numbers
			and change it to say "of m, n". Different words
			are used for "of" in Welsh (cy) for "1 of 10" and
			"the king of hearts". There's also a further
			complication for cy which I don't think I can
			explain in a single line, so you are spared.
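A rough sketch of the difference between translating a bare " of " and translating the whole phrase follows. It is in Python, the function names are invented for the example, and the strings are illustrative rather than the ones aisleriot or nautilus actually use.

```python
import gettext

# With no catalogue installed, gettext.gettext returns the msgid unchanged.
_ = gettext.gettext


def position_fragile(n, m):
    # Problem case: " of " is translated alone and the word order is
    # fixed in the code, so "of m, n" cannot be expressed.
    return str(n) + _(" of ") + str(m)


def position_label(n, m):
    # Translators: %(n)d is the current item and %(m)d the total, as in
    # "file 1 of 8". The placeholders may be reordered.
    return _("%(n)d of %(m)d") % {"n": n, "m": m}


def card_name(rank, suit):
    # Translators: a playing card, as in "king of hearts".
    return _("%(rank)s of %(suit)s") % {"rank": rank, "suit": suit}


if __name__ == "__main__":
    print(position_label(1, 8))
    print(card_name("king", "hearts"))
```

Separate msgids let a language use different words for the two kinds of "of", and named placeholders let the translator change the order. Gluing a rank and a suit together may still not work everywhere; one full string per card would be the safer choice.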



What do these mean in _English_?
-------------------------------
#: aisleriot/golf.scm.h:3
msgid "bdc\n"
msgstr ""
			Debugging message referring to "button-double-clicked"
			subroutine. 
			(Abel Cheung)

#: several places, apparently
msgid "Control"
msgstr ""
			"That's just gorgeous - Is it a verb? A noun? What 
			kind of noun? Where can I find it in the app?"
			(Stanislav Visnovsky)
			"Control" crops up in strings all over the place. 
			Months later I discovered that this is the Gnome
			Docs Project official word for "widget", because
			"widget" is thought not a good word to give to 
			end-users. This is probably true. But translators
			do at least know "widget"; and it doesn't have another
			eight possible meanings. And at least some
			translators didn't know "control" was in the docs
			team's word list of Good Words. 

#: gtk+
msgid "IM Preedit style"
msgstr ""
        from Ole (dk), who noted that you can figure out that IM is
        input method from other entries (if you're working from the
	po file and not from a web interface), but preedit?

	Some months later, Dave Malcolm explained on IRC, and I think I
        shall share for anyone who didn't know:
        <DaveMalcolm> olimar:  GTK has small "plugins" that handle text 
        input in different ways; they take keyboard input and convert
        the keypresses into text being typed. If you right-click in an 
        entry box or gedit you can select the method.
        <DaveMalcolm>  "pre-edit" is where a preview of your edit appears
        in the control; so for Japanese you might type the romanised form, 
        and have that appear in grey as the preedit string, which might 
        later get converted into hiragana/katakana/kanji characters 
        depending on further input.

	So now we know! Thanks, Dave. 

Error messages:
--------------

gcalctool is a great example for this. There are over _forty_ 
strings which are error messages referring to the inner workings 
of the code. For example: 

msgid ""
"*** B = %d ILLEGAL IN CALL TO MPCHK.\n"
"PERHAPS NOT SET BEFORE CALL TO AN MP ROUTINE ***\n"
msgid ""
"*** ERROR OCCURRED IN MPROOT, NEWTON ITERATION NOT CONVERGING PROPERLY ***\n"
msgid "*** ABS(X) NOT LESS THAN 1 IN CALL TO MPEXP1 ***\n"


Words that cause problems for different languages:
-------------------------------------------------
* Multiple languages distinguish between "key" as in "thing that turns
in lock" and "thing on a keyboard". It's easy to guess when you're 
translating gconf itself or acme itself which it should be. In other
files though, it's not so easy.

* "Package" and "packet" seem to be the same word in more than one
language (Welsh, French)

* "Render":

  Render is very hard to translate, at least for Danish. Sometimes it
  means "draw", sometimes "generate" or "create", sometimes "copy to
  screen". Usually it is some sort of combination.
					-- Ole Laursen
  Abel said it was the same for Chinese.

* "Antialiasing":

  Perhaps it's just in Welsh, but we in the cy team had endless trouble
  with this. Translating the parts of the word made no sense. Trying to
  make up a word which explained what the technique meant made no sense.

* "Meta"

  Anything involving the Greek prefix "Meta" makes olimar unhappy:
  trying to find an Arabic equivalent is apparently hard. Metafile,
  metadata.. okay, so it's a file about files, and data about data.
  Great: so metacity is... erm. No. Ow. There's also the meta key,
  but I don't actually remember seeing that in strings. 
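For words like "key", one possible approach is to attach a context to the string so that the two meanings become two separate entries in the po file. This is only a sketch: it assumes a gettext toolchain that supports message contexts (msgctxt), which not every setup has, it uses Python's gettext.pgettext (Python 3.8 or later), and the context names "lock" and "keyboard" are made up.

```python
import gettext


def lock_key_label():
    # Context "lock": the thing that turns in a lock (hypothetical context name).
    return gettext.pgettext("lock", "Key")


def keyboard_key_label():
    # Context "keyboard": the thing on a keyboard (hypothetical context name).
    return gettext.pgettext("keyboard", "Key")


if __name__ == "__main__":
    # Without a catalogue both calls fall back to the English msgid.
    print(lock_key_label(), keyboard_key_label())
```

In English both calls give "Key", but a translator sees two entries and can pick a different word for each.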

Long strings of "Noun noun noun noun":
-------------------------------------
Particularly horrible when one or more of the nouns can also be
used as a verb, and common in tooltips and menus. Examples: 

# gnome-terminal: NB: totally undocumented feature which jrb
explained to me recently. 
msgid "S/Key Challenge Response"

# libbonobo: 
msgid "generic factory 'new' moniker"
			-- Ole again. Abel suggested "all CORBA keywords"
			as well :) 

# libbonobo:
msgid "ORB IOR handling moniker"
        from Andraz

# nautilus:
msgid "Image Properties content view component"
        from Reinout van Schouwen (.nl)

#: Evolution-groupwise: 
msgid "Evolution Calendar Groupwise backend"

I can't remember where it's from, but the record is five nouns (some
of which might be verbs or instructions) in a row. Other "noun? verb?
what?" words can even be "End" and "Finish". "-ing" words have similar
problems. I think the technical term for the variety that isn't a
verb is "gerund", but I can't think of a good example in the po files
offhand (but they are there!)

Miscellany:
----------
Strings that arrived without comment or which don't fit elsewhere.

  * "Resident memory set"
  * "Minimum Shared Memory Size"
  * "Minimum Resident Memory Size"
  * "Request obsoletes service's data"
  * "Error checking error; no exception"
  * "MInternal Error: Weird value (%ld) in do_test\n"
  * "Model column to search through when searching through code"
  * "FALSE displays the \"invisible char\" instead of the actual text (password mode)"
  * "Just because a crosswalk looks like a hopscotch board doesn't mean it is one"

Incidentally, I showed Alan the "FALSE displays.." one and he said
"Makes perfect sense to me." Because he knows what it's talking 
about. Non-hacker translators don't. 

Some translators make a point of filing every string with a 
problem in bugzilla. This takes _ages_ but it helps. But the 
problem then is that there is a very limited period when you 
can change them.

Most translation teams use the translation status tables which
are at http://developer.gnome.org/projects/gtp/status/ to keep
on top of things. The current 2.5 stuff for each language is at 
http://developer.gnome.org/projects/gtp/status/gnome-2.6/XX/developer-libs/index.html
http://developer.gnome.org/projects/gtp/status/gnome-2.6/XX/desktop/index.html

(put any language code in XX: sr, cy, de..)

When strings are changed at all, it upsets all the statistics. So
you have to find a time when you _can_ change them, because those
statistics do matter and do help you keep on top of things. It is
really really disheartening to see your 100% app has suddenly gone
to 81% because someone has altered all the tabs inside the strings;
and even worse when it's a much more substantial change which
requires you to do a lot more than just remove the fuzzy marker.
And towards the end of a release cycle is not the time to do it.
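To show where a drop like that comes from, here is a small sketch of how such a percentage could be computed from a po file. It is not the code behind the status tables: the function names and the sample entries are made up, the Welsh strings are placeholders, and plural entries (msgid_plural, msgstr[0]) are not handled.

```python
SAMPLE = '''#: a.c:1
msgid "Open"
msgstr "Agor"

#: a.c:2
#, fuzzy
msgid "Close file"
msgstr "Cau ffeil"

#: a.c:3
msgid "Render"
msgstr ""
'''


def _unquote(text):
    """Strip the surrounding double quotes from a po string line."""
    text = text.strip()
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1]
    return text


def po_stats(po_text):
    """Count translated, fuzzy and untranslated entries in po-file text."""
    counts = {"translated": 0, "fuzzy": 0, "untranslated": 0}
    # Entries are separated by blank lines.
    for block in po_text.strip().split("\n\n"):
        fuzzy = False
        parts = {"msgid": "", "msgstr": ""}
        current = None
        for line in block.splitlines():
            if line.startswith("#,"):
                fuzzy = fuzzy or "fuzzy" in line
            elif line.startswith("msgid "):
                current = "msgid"
                parts[current] += _unquote(line[len("msgid "):])
            elif line.startswith("msgstr "):
                current = "msgstr"
                parts[current] += _unquote(line[len("msgstr "):])
            elif line.startswith('"') and current:
                # Continuation line of a multi-line string.
                parts[current] += _unquote(line)
        if not parts["msgid"]:
            continue  # header entry or comment-only block
        if fuzzy:
            counts["fuzzy"] += 1
        elif parts["msgstr"]:
            counts["translated"] += 1
        else:
            counts["untranslated"] += 1
    return counts


def percent_translated(counts):
    """Whole-number percentage of entries that are translated and not fuzzy."""
    total = sum(counts.values())
    return 100 * counts["translated"] // total if total else 0


if __name__ == "__main__":
    stats = po_stats(SAMPLE)
    print(stats, percent_translated(stats))
```

A fuzzy entry still has a msgstr, but it is not counted as translated, which is why a whitespace-only change to the originals can knock a finished module well below 100%.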

But some of these strings really do have to be changed, or explanations
appended in the comments next to the function that contains the things.
For example, Epiphany goes to the appropriate language page on Google 
because of this comment:

#. Translators you should change these links to respect your locale.
#. * For instance in .nl these should be
#. * "http://www.google.nl"; and "http://www.google.nl/search?q=%s";

Others found with a quick grep:

evolution/po/cy.po:#. This is a filename. Translators take note.
gnome-applets/po/cy.po:#. Translators - The + and - refer to increasing and decreasing the volume.

I don't have a complete checkout of all CVS, but I have quite a
few modules out. But that's about all there is. A few more of
"Translators: this 'plane' is not the sort that flies but instead
a term used by Unicode" would be really nice. (Actually, that's
a bad example, because apparently the place that appears is not
a place you can put such a comment: but it's a good example of
the sort of word that might need clarifying.) 

There used to be a string review period in the release cycle.
It concentrated on the English as far as I know. Making the
English clearer certainly helps translators, but even then
there can be problems. Most translators do the gnome-glossary
early on as a sort of standardising terminology exercise, but 
even so we (cy) didn't realise that "control" was the approved 
term for "widget" when we met it later on in po files. 

So there you are. I don't really know what the solution is, but
I do know that in between 2.4 (which we had at 100% in Welsh)
and now, we have acquired 1000 fuzzy strings and 750 untranslated
in apps which we had done completely; and another 6000 strings
to do from the list of "proposed" so far. That's on top of 
16,000 strings which remained constant. That's a lot of strings, 
and I dread the changing of them in order to make them more
intelligible to other teams. 

But at some stage, some of these have to be fixed in the originals,
which means they become "untranslated" or "fuzzy" in the files
of every team which has done them already. It will make it easier
for new teams. But I'm not looking forward to the process!

Telsa



