Re: Doc Translations

From: danilo gnome org (Danilo Šegan)
To: Karl Eichwalder <ke suse de>
Cc: gnome-i18n gnome org
Subject: Re: Doc Translations
Date: Tue, 14 Sep 2004 11:11:12 +0200

Hi Karl,

Today at 8:58, Karl Eichwalder wrote:

> danilo gnome org (Danilo Šegan) writes:
>
>> Eg. "With &app; ..." should be translated differently from "&app;
>> is..." in Serbian.  With &app; being "Bug Buddy", corresponding
>> translations would be "Sa *Bratom buba* ..." and "*Brat buba* je..."
>
> I'd vote to add some more language specific entities (as long as it
> makes sense):
>
> <!ENTITY app "Bug Buddy">
> <!ENTITY app-sr1 "Bratom buba">
> <!ENTITY app-sr2 "Brat buba">

Since the entire translation is supposed to live in the PO file, and
*most* PO file editing tools don't allow easy addition of new messages,
can you propose a way to do this? (I can imagine the entire DTD subpart
being changeable; see below.)

Note also that Serbian might require a full set of 7 separate
entities in extreme situations (when all grammatical cases get used; in
practice it will be anywhere between 3 and 5), so it's a bit of a pain
to keep track of them all (and it's hard to name them so they can be
used easily and correctly).
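To make the naming problem concrete, here is a minimal Python sketch
assuming a hypothetical convention: each entity is named after its
grammatical case (`app-nom`, `app-ins`, ...). The suffixes and the
`case_entities` helper are illustrations only, not something the tool
provides, and only the two forms quoted above are filled in.

```python
# Hypothetical entity suffixes, one per Serbian grammatical case.
SERBIAN_CASES = ("nom", "gen", "dat", "acc", "voc", "ins", "loc")


def case_entities(base: str, forms: dict) -> str:
    """Build one <!ENTITY> declaration per case form the translator supplies."""
    unknown = set(forms) - set(SERBIAN_CASES)
    if unknown:
        raise ValueError(f"unknown case names: {sorted(unknown)}")
    lines = []
    for case in SERBIAN_CASES:  # fixed order keeps the output predictable
        if case not in forms:
            continue  # in practice only some of the 7 forms are needed
        value = forms[case]
        if any(ch in value for ch in '"&%<'):
            # Keep the sketch simple: plain text only, no escaping.
            raise ValueError(f"unsupported character in {case!r} form")
        lines.append(f'<!ENTITY {base}-{case} "{value}">')
    return "\n".join(lines)


print(case_entities("app", {"nom": "Brat buba", "ins": "Bratom buba"}))
```

This prints `<!ENTITY app-nom "Brat buba">` and `<!ENTITY app-ins "Bratom buba">`
on two lines; a translator would then write "Sa &app-ins; ..." and
"&app-nom; je...".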

> Using entities those name would be consistent all over the document,
> hopefully.

Sure, but there are some technical problems as well; see above.

>> It would be even easier for me (in terms of code) not to have to cope
>> with this (I even had to do some terrible hacks to achieve this
>> behaviour),
>
> I thought the parser can replace the entities automatically (xmllint's
> --noent option).

Yes, but that conflicts with the goal I mentioned earlier.  We've got
one XML document split into several .xml files.  We want that entire
document to be translated using *one* PO file, and later merged
back.  If I let libxml2 replace entities automatically, the DOM tree I
get would be as if it were all in one file, and I wouldn't be able to
remerge the translations one file at a time (since I modify the
in-memory DOM tree and simply let libxml2 serialize it later on, thus
ensuring correctness).

As a side effect, I'd also lose the source-file references used in #:
comments.  But this is not a big issue, I guess.
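A minimal sketch of the alternative, using Python's standard expat
bindings rather than libxml2 (so it is only an analogy for what the tool
does): SYSTEM entities are resolved in a callback instead of being
expanded blindly, so each piece of text keeps the name of the file it
came from, which is what #: references need. The `collect_messages`
helper and the in-memory `files` mapping are assumptions made so the
example runs without touching the disk.

```python
import xml.parsers.expat


def collect_messages(master_name, master_xml, files):
    """Return (source_file, text) pairs from a document split via SYSTEM entities.

    `files` maps each SYSTEM identifier to that file's contents.
    """
    messages = []
    current = [master_name]  # stack of files currently being parsed

    def on_text(data):
        text = data.strip()
        if text:
            messages.append((current[-1], text))

    def on_external(context, base, system_id, public_id):
        current.append(system_id)
        # The child parser copies the parent's handlers, so its text is
        # reported through on_text, tagged with the file being parsed.
        # Only one level of nesting is handled in this sketch.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(files[system_id], True)
        current.pop()
        return 1  # a true value tells expat to continue

    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = on_text
    parser.ExternalEntityRefHandler = on_external
    parser.Parse(master_xml, True)
    return messages


MASTER = """<!DOCTYPE article [
<!ENTITY ch1 SYSTEM "ch1.xml">
]>
<article><title>Manual</title>&ch1;</article>"""
FILES = {"ch1.xml": "<section><para>Hello</para></section>"}

print(collect_messages("index.docbook", MASTER, FILES))
```

Here "Manual" is attributed to index.docbook and "Hello" to ch1.xml,
whereas a fully expanded tree would no longer say which file "Hello"
lives in.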

>> but it wouldn't solve all the problems like the one above.
>> The alternative is to keep entities here, and let translators
>> omit them if they feel like it, but that means I'd need to provide
>> more messages which will be entity translations, along with a
>> translators' comment indicating what entity this message is about.
>
> Yes.

It's very hard to determine what sort of text an entity contains.  It
can be anything from simple text to well-formed XML to non-well-formed
XML.  It's a pain.  So we'd probably need to assume it's at least
well-formed XML.
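If we do assume that, a rough check could wrap an entity's replacement
text in a dummy element and try to parse it.  This is a Python sketch
with a hypothetical `classify_entity_value` helper; note that values
which themselves refer to other entities (like `&app;`) will be
reported as malformed, because the wrapper has no DTD declaring them.

```python
import xml.etree.ElementTree as ET


def classify_entity_value(value: str) -> str:
    """Classify an entity's replacement text as 'text', 'xml' or 'malformed'."""
    try:
        # Wrap the value so it can be parsed as element content.
        wrapper = ET.fromstring(f"<wrapper>{value}</wrapper>")
    except ET.ParseError:
        return "malformed"
    # Any child element means the value carries markup, not just text.
    return "xml" if len(wrapper) else "text"


for v in ("Bug Buddy", "<application>Bug Buddy</application>", "<application>Bug Buddy"):
    print(repr(v), classify_entity_value(v))
```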

>> When we get to that, we should probably allow translators to define
>> their own entities, and so forth.  But PO format is simply not that
>> well suited to encapsulate entire XML format.
>
> Yes, own entities are necessary.  There is always some meta stuff
> involved ;)  As long as the PO format is wanted, you can make use of the
> usual tricks: either add some comments or deal with it inside the
> messages (texts).

The simplest way to achieve this is to let translators add whatever
DTD extensions they wish themselves.  E.g. have something like:

#. Translators: define any other entities you wish to use here
msgid "<!ENTITY app \"Bug Buddy\">\n"
msgstr ""

But that will defeat the purpose somewhat, since we'd still require
translators to be proficient with XML (beyond knowing that they need
to close their tags; perhaps another PO file flag, 'xml-format',
understood by msgfmt would be nice).  Yet it should scale well to
anything translators want to do with XML.

It also complicates the code, because SYSTEM (and perhaps some other)
entities would have to be treated separately: we don't want
translators to redefine all of them.
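One way to handle both concerns, sketched in Python with expat's
EntityDeclHandler (again a stand-in for libxml2): parse the translator's
msgstr as the internal DTD subset of a throwaway document, keep internal
general entities, and reject SYSTEM/PUBLIC, parameter and unparsed ones.
`translator_entities` is a hypothetical helper.

```python
import xml.parsers.expat


def translator_entities(msgstr: str) -> dict:
    """Parse translator-supplied <!ENTITY> declarations from a PO msgstr."""
    entities = {}

    def on_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # expat passes value=None for external (SYSTEM/PUBLIC) entities.
        if is_parameter or value is None:
            raise ValueError(f"entity {name!r}: only internal general entities are allowed")
        entities[name] = value

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_decl
    # Wrap the declarations in a throwaway document so expat can parse them;
    # an exception raised in the handler propagates out of Parse().
    parser.Parse(f"<!DOCTYPE wrapper [\n{msgstr}\n]><wrapper/>", True)
    return entities


print(translator_entities('<!ENTITY app-ins "Bratom buba">'))
```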

> Adding an option to stop expanding entities would be nice.

I'll do that a bit later.

>> FWIW, there's also a simple way to solve system-entities problem I
>> described earlier: simply parse entire XML while *expanding* external 
>> entities (this is a one-line change to the code, to turn this
>> behaviour of libxml2 on).  It means you won't be able to merge
>> translations back into multiple files, but only one.
>
> Splitting the big PO file into chunks matching smaller .pot files (which
> are corresponding to .xml files) is possible by using tools coming with
> gettext.

This was the other way around: we have multiple XML files joined
together using SYSTEM entities.  We want translations from a single PO
file to go into multiple XML files which don't have DTDs and such.  As
said above, if I let the parser expand all the entities, I lose track
of files (and I cannot do it for incomplete XML files, since I have no
place to define entities in them at all).
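For reference, merging a single catalog back one file at a time might
look roughly like this Python sketch.  It is an assumption-laden
illustration, not how the tool works: it uses ElementTree instead of
libxml2, treats the catalog as a plain msgid-to-msgstr dict, replaces
only whole element text (no mixed content or tails), and requires each
file to be a single well-formed element with no undefined entity
references.

```python
import xml.etree.ElementTree as ET


def merge_file(xml_text: str, catalog: dict) -> str:
    """Apply translations from a single catalog to one XML file's contents."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        # Only element text that matches a msgid exactly is replaced.
        if el.text and el.text.strip() in catalog:
            el.text = catalog[el.text.strip()]
    return ET.tostring(root, encoding="unicode")


def merge_all(files: dict, catalog: dict) -> dict:
    """Merge the same catalog into every file separately, keeping file boundaries."""
    return {name: merge_file(text, catalog) for name, text in files.items()}


files = {"ch1.xml": "<section><title>Bug Buddy</title><para>Other</para></section>"}
print(merge_all(files, {"Bug Buddy": "Brat buba"}))
```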

So, what would be best for *translators*?  Letting them define the
DTD themselves?  The current behaviour seems the most sane to me, in
the sense that it works for everybody (every language) without
complicating things too much.

Cheers,
Danilo
