Re: Yelp stuff



Tue 2004-08-03 at 22.23, Shaun McCance wrote:
> On Tue, 2004-08-03 at 12:49, Christian Rose wrote:
> > Tue 2004-08-03 at 19.03, Shaun McCance wrote:
> > > (CC me on replies please.  I'm not on gnome-i18n.)
> > > 
> > > So Yelp had four strings marked for translation in l10n.xml.in that were
> > > just one-character entities: &#8220; &#8221; &#8216; &#8217;.  Most of
> > > the translations continued using entities for these.  But it seems that
> > > when intltool merges the translations in from the po files, it escapes
> > > everything.  So instead of having &#8220; we have &amp;#8220;, which is
> > > not at all what's wanted.
> > > 
> > > I've decided to trust that people will have non-braindead editors and
> > > just put the UTF-8 characters in l10n.xml.in instead.
> > 
> > This will work only with intltool >= 0.27 and GNU gettext >= 0.12.
> > Older gettexts didn't like non-ASCII data in msgids at all.
> > 
> > Whether other gettext implementations allow non-ASCII msgids at all I
> > don't know.
> 
> How much does gettext have to do with this?  These particular strings are
> merged into an XML file and extracted in XSLT.  Does gettext get messed
> up just by the presence of the UTF-8?

Older gettexts do get messed up, yes (they'll complain loudly and won't
allow these messages to be translated, but it's not a fatal error so the
translation will still work for the other messages, apart from the noisy
and irritating warnings at build time from msgfmt).


Technically, gettext requires a po file to be encoded entirely in one
character set. This means that the msgids (the strings from the
application) and the msgstrs (their translations) need to be encoded the
same way, since they live in the same file.
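
To illustrate the single-charset rule: a po file declares its encoding
once, in the Content-Type line of its header entry, and that one
declaration covers msgids and msgstrs alike. Below is a minimal Python
sketch that reads the declaration. The function name and the sample text
are made up for illustration, and it is a plain regex match, not gettext's
own parser.

```python
import re

def po_charset(po_text):
    """Return the charset declared in a po header, or None if absent."""
    # The header entry carries a line such as:
    #   "Content-Type: text/plain; charset=UTF-8\n"
    match = re.search(r'Content-Type:[^"\n]*charset=([A-Za-z0-9_.:-]+)', po_text)
    return match.group(1) if match else None

# Hypothetical header, as it would appear at the top of a po file.
sample = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"
"Content-Transfer-Encoding: 8bit\\n"
'''

print(po_charset(sample))
```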

However, a translation would often need some language-specific encoding,
and no single old-school encoding fits every translation, so different
encodings would be used for different translations.
Since ASCII is the only subset common to most encodings, gettext simply
required msgids to be strictly ASCII.

Nowadays, however, we have Unicode and use UTF-8 for all GNOME
translations, so the original reason for this behavior no longer applies,
and gettext was recently changed to allow UTF-8 in msgids as well.
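
If you want to see which msgids would trip up an older gettext, something
like the following Python sketch could list them. It is only a rough
illustration: the function name is hypothetical, the parsing is
deliberately naive (it does not interpret escape sequences or obsolete
entries), and it assumes Python 3.7 or later for str.isascii().

```python
def non_ascii_msgids(po_text):
    """Return the msgid strings that contain characters outside ASCII."""
    found = []
    current = None  # pieces of the msgid being collected, or None

    def flush():
        # Store the msgid collected so far, if any.
        if current is not None:
            found.append("".join(current))

    for line in po_text.splitlines():
        line = line.strip()
        if line.startswith('msgid "'):
            flush()
            current = [line[len('msgid "'):-1]]
        elif current is not None and line.startswith('"'):
            current.append(line[1:-1])  # continuation line of the msgid
        else:
            flush()
            current = None
    flush()
    return [msgid for msgid in found if not msgid.isascii()]

# Hypothetical po fragment: one ASCII msgid, one UTF-8 quote character.
sample = 'msgid "Help"\nmsgstr "Hjälp"\n\nmsgid "“"\nmsgstr "”"\n'
print(non_ascii_msgids(sample))
```

An empty result would mean every msgid is plain ASCII, which is what the
older gettext versions expect.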


> I really don't know what I should
> be doing here.  As it is, those locales are just broken in Yelp.

I think you're doing the right thing. I just wanted to let you know that
it may not work with older gettexts and that there might now be a
dependency on a newer gettext in order for these UTF-8 strings to be
translated.


Christian


