Re: Manual page translations



Clytie Siddall wrote:
I'm really glad this has come up again. :)

On 23/05/2006, at 2:23 PM, Brent Smith wrote:

I've been looking at getting the manual page translations to work in
yelp.  There was a sorting/priority issue that I figured out with
respect to which man pages should have precedence (obviously translated
man pages for your $LANGUAGE should have precedence over others), but
now I am facing another problem.

It seems that translated man pages are not encoded in UTF-8.  This means we have
to perform character set conversion from whatever code page is
appropriate for your language into UTF-8 (and then I need to rewrite the
man parser with UTF-8 in mind).  Can anyone point me to some
documentation about how this is usually done?  Specifically, how do I
determine the appropriate character set conversion to do based on the
value of $LANGUAGE?
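
One way to approach the $LANGUAGE question is a lookup table from language code to the legacy charset its man pages conventionally use. The sketch below is only an illustration in Python, not Yelp code: the table entries, the default, and the function names (`language_candidates`, `legacy_charset_for`, `to_utf8`) are all assumptions made for the example, and a real table would need to be checked against what is actually installed.

```python
# Illustrative table only. These language -> charset pairs are assumptions
# about common legacy conventions, not something the man page files declare.
LEGACY_CHARSETS = {
    "cs": "iso-8859-2",
    "de": "iso-8859-1",
    "es": "iso-8859-1",
    "fr": "iso-8859-1",
    "hu": "iso-8859-2",
    "ja": "euc-jp",
    "pl": "iso-8859-2",
    "ru": "koi8-r",
}


def language_candidates(language_value):
    """Split a colon-separated $LANGUAGE value, keeping priority order."""
    return [item for item in language_value.split(":") if item]


def legacy_charset_for(lang, default="iso-8859-1"):
    """Pick a charset for a locale name such as 'pl', 'pl_PL' or 'pl_PL.UTF-8'."""
    name = lang.split("@")[0]            # drop a modifier such as '@euro'
    name, dot, codeset = name.partition(".")
    if dot and codeset:                  # an explicit codeset wins over the table
        return codeset.lower()
    base = name.split("_")[0].lower()    # 'pl_PL' -> 'pl'
    return LEGACY_CHARSETS.get(base, default)


def to_utf8(raw, lang):
    """Convert raw man page bytes to UTF-8 using the charset guessed for lang."""
    return raw.decode(legacy_charset_for(lang)).encode("utf-8")
```

For example, `legacy_charset_for("pl_PL")` looks up `pl` in the table, while `legacy_charset_for("pl_PL.UTF-8")` returns the explicit codeset instead. The weakness of any such table is that it only describes a convention; it cannot tell whether a particular file follows it.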


Bruno Haible has released a groff-utf8 patch, and is currently working on a full groff release which will support UTF-8.

http://www.haible.de/bruno/packages-groff-utf8.html

Here's one problem.  Yelp doesn't use groff - at all.  It parses the man
pages directly and converts them to XML, which is then converted into
HTML.  Ideally, the man parser should support UTF-8, and I'm working to
get it to this point right now.

I translated my pilot manpage:

http://translate.sourceforge.net/wiki/guide/project/manpages

in PO format, using the po4a converter. In its final manpage format, it reads perfectly using groff-utf8.

So I can't see why any manpage translated into a UTF-8 language wouldn't display in a UTF-8 enabled reader like Yelp.

This will definitely work in yelp, once I finish adding utf-8 support to
the parser code.

If we have manpages which have been translated into legacy encodings, that's something that needs updating, not something we want to waste effort supporting. UTF-8 is the standard, all languages can use it, and all current free-software translation efforts do use it.

This is where the problem lies.  Yelp has to decide whether to use a
translated man page for your language over the English version.  Right
now, it will always prefer the translated version for $LANGUAGE over the
English version.  This decision takes place when yelp is starting up and
creating the table of contents.
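
The precedence rule described here (translated page for $LANGUAGE first, English otherwise) can be sketched as a directory search order. This is a rough Python illustration under stated assumptions, not Yelp's actual implementation: the helper names `search_dirs` and `find_page` are hypothetical, and the layout assumed is `<man root>/<lang>/man<section>/` for translations with `<man root>/man<section>/` as the untranslated fallback.

```python
import os


def search_dirs(man_root, language_value, section):
    """Return man directories to search, most preferred first."""
    dirs = []
    for lang in language_value.split(":"):
        if not lang or lang in ("C", "POSIX"):
            continue
        # Strip modifier and codeset: 'es_ES.UTF-8@euro' -> 'es_ES'
        locale = lang.split("@")[0].split(".")[0]
        # Try the full locale first ('es_ES'), then the bare language ('es')
        for candidate in (locale, locale.split("_")[0]):
            path = os.path.join(man_root, candidate, "man" + section)
            if path not in dirs:
                dirs.append(path)
    # The untranslated pages are always the last resort
    dirs.append(os.path.join(man_root, "man" + section))
    return dirs


def find_page(man_root, language_value, section, name):
    """Return the path of the preferred copy of a page, or None if absent."""
    prefix = name + "." + section
    for directory in search_dirs(man_root, language_value, section):
        if not os.path.isdir(directory):
            continue
        for entry in sorted(os.listdir(directory)):
            # Accept 'apropos.1' as well as compressed 'apropos.1.gz'
            if entry == prefix or entry.startswith(prefix + "."):
                return os.path.join(directory, entry)
    return None
```

With `language_value` set to `"es_ES:en"`, a page found under `es/man1/` is returned ahead of the copy in `man1/`. Note that the choice is made purely from the directory name, which is why a wrongly encoded file in the translated directory still gets picked.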

Now say you go to view the translated apropos.1 man page.  The yelp
parser code assumes this is UTF-8, but it is actually encoded in
ISO-8859-2.  The parsing of this man page will fail horribly when it
expects UTF-8 but gets ISO-8859-2.

The problem is the inconsistency in knowing how a translated man page is
encoded.  All the files in /usr/share/man/es/ should have the same
encoding, otherwise there is no way for Yelp to determine the encoding
and parse the file.

It would be interesting to see how groff-utf8 handles this.


A lot of the translated manpages come from old packages. We need them updated. I think it makes more sense to focus on getting them updated, standardizing the process, and making it possible for UTF-8 manpages to be read. Once mainstream readers like Yelp are available for UTF-8 manpages, more translators will be motivated to update their manpages. (Integrating po4a into interfaces like Pootle will encourage a lot more translation, since that makes manpages available in PO format.)

I've been hoping to get manpage-translation standardized through the Translation Project, but I need to test my pilot manpage more. What's held the testing up has been the lack of UTF-8 manpage readers. ;)

Can I please get a copy of a man page in UTF-8 for further testing in Yelp?

Figuring out if there are currently any UTF-8 encoded man pages on my system would be a tedious process.

Is there anything out there that will "guess" the encoding of a file?
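
A cheap heuristic for "guessing" is to test whether the bytes are valid UTF-8 and only fall back to a legacy charset when they are not. Text in an 8-bit legacy encoding that contains non-ASCII characters rarely happens to validate as UTF-8, and pure ASCII is valid UTF-8 already. The Python sketch below shows the idea; the function names and the `fallback` parameter are assumptions for illustration, and the check cannot tell one legacy charset from another.

```python
def guess_encoding(raw, fallback="iso-8859-1"):
    """Return 'utf-8' if raw is valid UTF-8, otherwise the fallback charset."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return fallback
    return "utf-8"


def decode_man_page(raw, fallback="iso-8859-1"):
    """Decode man page bytes, returning the text and the encoding used."""
    encoding = guess_encoding(raw, fallback)
    return raw.decode(encoding), encoding
```

Running `guess_encoding` over the files in a directory such as /usr/share/man/es/ would also be one way to find out which installed pages are already UTF-8, with the fallback chosen per language.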

from Clytie (vi-VN, Vietnamese free-software translation team / nhóm Việt hóa phần mềm tự do)
http://groups-beta.google.com/group/vi-VN

--
Brent Smith  <gnome nextreality net>
IRC: smitten / #docs / irc.gnome.org




