Re: xml2po



[I'm not sure if this discussion should be moved off gnome-doc-list
(something like gnome-i18n-tools or xml-i18n-tools lists seem more
appropriate, but almost nobody is following them);  if anyone minds
this discussion, YELL!]

Today at 15:42, Ismael Olea wrote:

> On Mon, 07-06-2004 at 00:11, Danilo Segan wrote:
>
>> > Working with kbabel has a nice feature: it is DocBook-aware, so it
>> > detects the markup segmentation and can automatically translate
>> > segments if they are collected in the glossaries you use.
>> 
>> That wouldn't work too well for many languages, and surely not for
>> Serbian :)  So, I believe it's better to make it harder for some, yet
>> possible for everyone :)
>
> Maybe I haven't explained myself well, or I don't understand you. What
> is the problem with segmentation? Indeed, one of the main problems in
> translation efforts is segmentation. If you have it solved for free,
> why not use it? :-?

Well, some languages use declensions which modify words.  E.g. "File" in
  <para><emphasis>File</emphasis> is where your data is.</para>
would be translated differently than in 
  <para>Menu <menu>File</menu> is what you're looking for.</para>
(at least in Serbian; imaginary tags above).  So, I just cannot stick
a single translation of "File" into the file.  For manually revised
works, that is suitable, but not for an automated tool.


> I know it is a must to simplify the volunteers' work. OTOH, tools
> should be simple, as Unix tradition recommends. If the tool is for
> moving data xml<->po, it should do only this. Any simplifying shells
> should be built on top of it, maybe by creating a new wrapper, maybe
> with a very easy to use Makefile, etc. In my experience it is better to
> do it that way, because we are in the middle of a methodological change
> in maintaining doc translations, and we need some time fighting with
> the new tools to find out what should be done, how, and what should not.

This probably won't happen in time for the first stable release.  The
idea is that most of the document parsing is the *same* for both
extracting strings and merging them back, so if we don't want to end
up where intltool did (one algorithm in intltool-extract, another in
intltool-merge), we should keep both algorithms in the same place:
i.e. it should be one program/class/whatever.
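
To illustrate the point, here is a minimal sketch (not xml2po's
actual code) of one class whose extract and merge steps share a single
traversal.  The class name, the MESSAGE_TAGS set and the choice of
xml.etree are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag set: which elements count as one message is an assumption.
MESSAGE_TAGS = {"para", "title"}


class XmlMessages:
    """Toy extractor/merger sharing one traversal (not xml2po's real code)."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def _message_nodes(self):
        # The single place that decides what a "message" is.
        return [node for node in self.root.iter() if node.tag in MESSAGE_TAGS]

    def extract(self):
        # Extraction: list the text of every message node, in document order.
        return [node.text or "" for node in self._message_nodes()]

    def merge(self, translations):
        # Merging: walk the very same nodes and substitute known translations.
        for node in self._message_nodes():
            if node.text in translations:
                node.text = translations[node.text]
        return ET.tostring(self.root, encoding="unicode")
```

Because both methods go through _message_nodes(), a change in how
messages are detected cannot make extraction and merging disagree.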

If that's really what you need, it's trivial to write simple wrapper
scripts to support any of the desired functionality (see below).

>> Of course, any suggestion on this is very much welcome.  In practice,
>> I recommend using only three variants of the command:
>>   xml2po -o template.pot file1.xml file2.xml # extract PO template
>>   xml2po -u sr file1.xml file2.xml           # update translation
>>   xml2po -p sr.po -o sr/file1.xml file1.xml  # merge translation
>
> But it is a bit messy: -o works sometimes for .pot and other times for xml.

"-o" is for output.  If you don't specify it, you get output on
stdout.  So, I believe it makes sense to keep this option as it is,
because it just directs the output into a file instead of stdout: it
doesn't care what kind of output we have.  This is entirely "Unixey",
at least IMHO.

Does anyone think otherwise, and have a nice proposal to go with that? :)

> The parameters of the second use don't seem to be very similar to
> those of the third. This is what I tried to say before. It doesn't
> seem to be a very coherent interface.

I agree with that one.  I acted very foolishly in choosing the
parameter format for the "-u" option: I tried to follow the
intltool-update format, which blindly appends ".po" to the filename.
I'll change it to require the full filename as well; I think it's
much saner.

Still, I must repeat that (in an ideal world :) xml2po should be
invisible to translators: only those designing build systems should
be aware of it and its options.  Of course, it shouldn't be harder
for them to use it.

> Maybe this is the reason the poxml author chose to write
> several simpler tools :-?

As explained above, it would lead to many inconsistent tools, unless
designed appropriately (basically, it's not a problem, but I would
end up with much the same thing I have now: one BIG module/class which
contains all the functions, and several wrapper apps).  So, if you
desire such a thing, I suggest you use xml2po in this way:
  xml2po file1.xml file2.xml ... > template.pot
  po2xml sr.po file1.xml > sr/file1.xml
  update-po-from-xml sr.po file1.xml file2.xml ...

where po2xml would be something like:

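  #!/bin/sh
  # usage: po2xml POFILE XMLFILE  (translated XML goes to stdout)
  POFILE=$1
  XMLFILE=$2
  # quote the variables so filenames with spaces survive
  xml2po -p "$POFILE" "$XMLFILE"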

and update-po-from-xml:

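  #!/bin/sh
  # usage: update-po-from-xml POFILE XMLFILE...
  # "-u" currently takes the language code, so strip the ".po" suffix
  LANGUAGE=`echo "$1" | sed 's/\.po$//'`; shift
  # the remaining arguments are the XML files; "$@" keeps
  # filenames with spaces intact
  xml2po -u "$LANGUAGE" "$@"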

Of course, you can go a bit further with it (add checks for argument
count, etc).

All of this is the reason I don't see a strong need to separate these
out right away; if you come up with better versions of such scripts
(Python even preferred :), I'll include them in the distribution.
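
As a starting point, a Python version of the two wrappers might look
like the sketch below.  It only relies on the "xml2po -p" and
"xml2po -u" invocations described in this mail; the function names are
made up for the example, and the ".po" stripping assumes "-u" still
takes a bare language code.

```python
import subprocess


def po2xml_command(pofile, xmlfile):
    # Mirrors: xml2po -p POFILE XMLFILE  (translated XML goes to stdout)
    return ["xml2po", "-p", pofile, xmlfile]


def update_po_command(pofile, xmlfiles):
    # "-u" takes the language code, so drop a trailing ".po" if present.
    language = pofile[:-3] if pofile.endswith(".po") else pofile
    return ["xml2po", "-u", language] + list(xmlfiles)


def run(command):
    # Arguments are passed as a list, so filenames with spaces are safe.
    return subprocess.call(command)
```

For example, run(po2xml_command("sr.po", "file1.xml")) would behave
like the po2xml shell script above, provided xml2po is on the PATH.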

>>   2. There was no reordering of the "messages" (whatever it is for
>>      the desired mode) in the translation as compared to original
>
> I'm not sure I understand you. With split2po the two files are expected
> to have the same XML structure. Is this what you mean?

Well, 
  <para>This is first paragraph.</para>
  <para>This is second paragraph.</para>
and translation
  <para>This is translation of second paragraph.</para>
  <para>This is translation of first paragraph.</para>
have the same XML structure, but such an imaginary tool would fail
here.  If we take the translation without paying attention, and use
the result as a compendium/translation memory, you can imagine what
will happen.

Also, things like added translator credits in <copyright> sections
would offset all strings in a translation by one or more, which would
turn out to be a mess: so, manual tweaking of some sort would be
desirable (I imagine having an integer "offset" parameter on the
command line for adjusting the offset).  But, this is more of item 1
in the list I cut off above :)
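
A rough sketch of what such positional pairing with an "offset" could
look like (the function name and its signature are hypothetical, not
an existing xml2po option):

```python
def pair_by_order(originals, translations, offset=0):
    """Pair originals[i] with translations[i + offset]; skip unmatched entries."""
    pairs = []
    for i, original in enumerate(originals):
        j = i + offset
        if 0 <= j < len(translations):
            pairs.append((original, translations[j]))
    return pairs
```

With an extra translator-credit string at the start of the translated
list, offset=1 would line the remaining strings up again; reordered
paragraphs would still be paired wrongly, so the result needs review.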

>> Then, we can simply rerun the xml2po machinery on each file, and
>> merge the two lists of messages one-for-one based on the order of
>> appearance.
>
> Yes. If I had a little bit of programming skill I would have tried to
> do it myself :-D
>
> (I've asked a friend to write it. I'll let you know if he releases
> something.)

I'd be very happy to accept patches.  If I find some time, I'll even
do it myself (it should be very easy to do inside xml2po).  If your
friend or you want to do it in an easier (though not very nice) way, I
suggest you simply modify the MessageOutput.outputAll function to
print out a simpler format (such as "index\nmessage\n\n"), which you
can parse more easily.

Your new outputAll function might look something like:

    def outputAll(self, out):
        index = 0
        for k in self.messages:
            index = index + 1
            out.write("%d\n%s\n\n" % (index, k))
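
Reading that format back could then be as simple as the following
sketch (parse_indexed is a made-up name, and it assumes that no
message contains an empty line, which real paragraphs might):

```python
def parse_indexed(text):
    """Parse blocks of the form "index\\nmessage\\n\\n" into {index: message}."""
    messages = {}
    for block in text.split("\n\n"):
        if not block.strip():
            continue  # skip the empty tail after the final separator
        index, _, message = block.partition("\n")
        messages[int(index)] = message
    return messages
```

Running it on the output for the original and for the translation
gives two dictionaries that can be joined on the index.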

There's one problem with this approach: identical messages have
already been merged into one, so if two identical paragraphs have
been translated differently, your translations will be offset from
the repeated paragraph onward.  The solution is simply to remove the
line (and adjust the indentation):

            if not t in self.messages:

in MessageOutput.outputMessage (ok, I'm not very good at naming
methods :).
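
To see why the merging matters, here is a toy stand-in for the
message list (not the real MessageOutput class; collect and its
merge_duplicates flag are invented for the illustration):

```python
def collect(paragraphs, merge_duplicates):
    """Collect messages in document order, optionally merging repeats."""
    messages = []
    for text in paragraphs:
        if merge_duplicates and text in messages:
            continue  # this is the check that the removed line performs
        messages.append(text)
    return messages


# The first and third paragraphs are identical in the original...
original = ["Save the file.", "Open it.", "Save the file."]
# ...but were translated differently (placeholder strings, not real translations).
translated = ["T1", "T2", "T3"]

merged_original = collect(original, merge_duplicates=True)
merged_translated = collect(translated, merge_duplicates=True)
```

With merging on, the original yields two messages and the translation
three, so pairing by position goes wrong; with merging off, both
lists keep one entry per paragraph.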

Cheers,
Danilo


