Please avoid unnecessary markup in messages



This is a request to the developers of all applications that have
translations. The problem is not exclusive to applications that use
Glade, although the relevant Glade bug is at (1); it occurs in many
applications that don't use Glade as well. Unfortunately, many GNOME
applications are affected to some degree, including core ones like
gnome-desktop(2).


Here's the summary:
Please avoid using markup in messages that are marked for translation
whenever possible. Forcing translators to "translate" markup serves no
purpose. It is an extra source of bugs(3), it is tedious additional work
for translators, and it makes the translation process more problematic
in many other ways, including undermining the effort to keep
translations consistent.


Here is the more detailed rationale:
This is one case of the "don't mark things for translation that
shouldn't be translated" motto. The cases where markup occurs in
messages can be divided into two types. The first type is messages
similar to this example:

   msgid "This text is <b>bold</b>."
   
In this case, the markup contains important positional information. Only
the text "bold" should be bold, and a translation needs to take that
into account, so the markup actually carries important information to
the translator here. This use of markup is necessary and isn't affected
by this request.

The other case where markup is used, and which this request refers to,
is where the entire message or entire paragraphs or headings are
surrounded with markup, like in these examples:

   msgid "<b>Home Page Preferences</b>"
   
   msgid "<span size=\"medium\"><b>No file</b></span>"
   
   msgid ""
   "<span weight=\"bold\" size=\"larger\">What do you want to do with this "
   "file?\n"
   "</span>\n"
   "It's not possible to view this file type directly in the browser:"

In this type of message, the markup contains no relevant information for
the translator, since all the translatable content is embedded in the
markup (in the last example above, the message could just as well have
been split into two separate messages, so this still applies). Instead,
these messages are just a nuisance and create lots of extra and totally
unnecessary work for the translators. Whenever the markup for a message
changes even slightly, all translations have to be updated, even though
no translatable content changed. Whenever a new message is added that
has surrounding markup, the message has to be "translated" again, even
if the exact same message without this exact markup was translated
before. All this adds up, and it's not uncommon to have all of the
following occurring in the same po file:

   msgid "<b>Home Page Preferences</b>"
  
   msgid "<i>Home Page Preferences</i>"
   
   msgid "<span size=\"larger\"><b>Home Page Preferences</b></span>"
   
In short, every possible combination of the same actual message but with
different and irrelevant markup surrounding it has to be "translated"
separately, instead of just one "Home Page Preferences" message. It's
not just a nuisance and a lot of unnecessary work that slows down the
translation process. Sometimes the surrounding markup adds much more
text than the actual message, and that confuses gettext's
fuzzy-matching: either it considers the message entirely new and doesn't
fuzzy-mark it with a previous similar translation, or it fuzzy-matches
on the markup instead of on the actual message. Fuzzy-matching is an
important time-saver and important for consistent use of terminology in
translations, so when it doesn't work properly, the same terminology
won't be used consistently.
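
To illustrate how markup can dominate a similarity comparison, here is a
rough sketch in Python. Note the assumptions: it uses difflib from the
Python standard library as a stand-in, which is not the algorithm gettext
itself uses, and the names (similarity, wrapped, unrelated, plain) are
made up for this example. It only shows the general effect.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity score (difflib, NOT gettext's own matcher)."""
    return SequenceMatcher(None, a, b).ratio()


# A new message with surrounding markup.
wrapped = '<span size="larger"><b>Home Page Preferences</b></span>'
# An existing, unrelated message that happens to share the same markup.
unrelated = '<span size="larger"><b>No file</b></span>'
# An existing message with the same text but no markup.
plain = "Home Page Preferences"

same_text_score = similarity(wrapped, plain)
same_markup_score = similarity(wrapped, unrelated)

print("same text, different markup:", round(same_text_score, 2))
print("different text, same markup:", round(same_markup_score, 2))
```

With this stand-in, the unrelated message that shares the markup scores
higher than the message that has the same text, which is the kind of
mismatch described above.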

The solution to these problems is to separate markup from gettext
calls, so that the markup isn't passed through _(). This doesn't apply
to the first type of messages mentioned above, but it certainly applies
to the second type. The examples above should be rewritten so that they
appear in the po file like this:

   msgid "Home Page Preferences"

   msgid "No file"
   
   msgid "What do you want to do with this file?"
   
   msgid "It's not possible to view this file type directly in the browser:"
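
On the code side, one way to get there is to translate the plain text
first and add the markup afterwards. The following is only a minimal
sketch in Python, not taken from any of the applications mentioned; the
helper names bold_heading and large_bold are hypothetical. It assumes
the standard gettext module (which returns the msgid unchanged when no
catalog is installed) and escapes the translated text before wrapping
it, so that a "&" or "<" in a translation can't break the markup.

```python
import gettext
from xml.sax.saxutils import escape

# Falls back to returning the msgid itself when no catalog is installed.
_ = gettext.gettext


def bold_heading(text):
    """Hypothetical helper: wrap already-translated text in <b> markup."""
    return "<b>" + escape(text) + "</b>"


def large_bold(text):
    """Hypothetical helper: wrap already-translated text in a larger bold span."""
    return '<span weight="bold" size="larger">' + escape(text) + "</span>"


# Only the plain text passes through _(), so only it ends up in the po file.
heading = bold_heading(_("Home Page Preferences"))

# The two-part message from above, split into two separate messages.
question = large_bold(_("What do you want to do with this file?"))
explanation = escape(
    _("It's not possible to view this file type directly in the browser:")
)
dialog_text = question + "\n\n" + explanation

print(heading)
print(dialog_text)
```

If the markup changes later, only the helper changes, and no message in
the po file is touched.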

I hope developers will understand the need for eliminating markup this
way, so that we can have a more efficient translation process in the
future.

Christian (and Carlos)


1) Glade bug: http://bugzilla.gnome.org/show_bug.cgi?id=97061
2) http://bugzilla.gnome.org/show_bug.cgi?id=97073
3) One example is http://bugzilla.gnome.org/show_bug.cgi?id=84779



