Re: Proposal for declinations in gettext
- From: Danilo Segan <dsegan gmx net>
- To: GNOME I18N List <gnome-i18n gnome org>
- Cc: translation iro umontreal ca, linux-utf8 nl linux org,translation-i18n lists sourceforge net
- Subject: Re: Proposal for declinations in gettext
- Date: Sat, 14 Jun 2003 08:29:43 +0200
Miloslav Trmac wrote:
>Hello,
>On Fri, Jun 13, 2003 at 10:14:25PM +0200, Danilo Segan wrote:
>
>
>>msgid "king"
>>msgstr<0> "kralj"
>>msgstr<3> "kralja"
>>msgstr<5> "kraljem"
>>
>>msgid "move %s"
>>msgstr "premesti %<3>s"
>>
>><i>, where i=0 .. (PO-Number-of-noun-forms)-1, is the index of the form
>>required, and it depends on the sentence construction. It is determined
>>by the verb, or perhaps words like "with", "whom", ... Some of
>>msgstr<i>'s can be omitted if it's known not to be used in composition
>>(most are highly unlikely to be ever used in translations, like the
>>"vocative" form of "hey %s").
>>
>>
>I suppose the obvious question is: "How do I know which declinations
>a word is used in?" (0, 3, 5 in the example above). In order to solve
>this, you'd have to somehow mark "move %{move_card}s" and "{move_card}king"
>so that they could be matched, msg$SOMETHING would then check for
>missing/unneeded declinations. I don't think this is much easier
>than using similar tags to explicitly mark contexts words are used in
>(see below).
>
>
Well, the numbers you choose for "declinations" can be arbitrary: the
software would not force anything on you. Of course, it's also possible
to use keywords instead (like "move_card"), but I think that's harder on
the implementation (not that the rest is easy :-)).
>
>
>>The good side of this approach (the syntactic elements are arbitrary,
>>don't comment on those) is that programs that use gettext for l10n would
>>need no change:
>>
>>
>Wrong. A typical gettext usage of the above is in principle something like
> printf (gettext ("move %s"), gettext ("king"));
>that is, there is currently no way to correlate the "move %s" with
>the "king".
>
>
With some hacks, it could be made to work transparently for the
programmer. The idea would be to use the preprocessor to redefine
"printf" and similar functions, and to have gettext("king") return an
array of forms when one is available (again, one could use obnoxious
hacks for this, like putting a structure pointer behind the \0 byte, or
perhaps putting a magic number in the string to indicate that it is
actually a pointer to that same structure).
A somewhat better solution would be to replace all calls to the *printf
functions with *printf( gettext_printf(format, parameters) ), but this
would still require hacks if we're to stay compatible with programs
that use:
char *s=gettext("king");
printf(gettext("move %s"), s);
Of course, I admit that thorough changes to both the applications and
the interface would be best. I'll forward a message that <jmaiorana
at idirect.net> sent to the linux-utf8 mailing list, with one kind of
proposal that would hold the context in a single variable.
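
To make the gettext_printf() idea a bit more concrete, here is a minimal
sketch of the substitution step, assuming the declined forms of a noun
are already available as an array. Nothing in it exists in gettext:
struct noun_forms, noun_form() and format_declined() are hypothetical
names, MAX_FORMS is an assumption, and only a single-digit "%<i>s" (or a
plain "%s") is recognized. A missing form falls back to form 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FORMS 7 /* assumption: up to seven forms per noun */

/* Hypothetical catalog entry: a noun and its declined forms.
 * forms[i] == NULL means msgstr<i> was omitted by the translator. */
struct noun_forms {
    const char *msgid;
    const char *forms[MAX_FORMS];
};

/* Return form idx, falling back to form 0 when it is missing. */
static const char *noun_form(const struct noun_forms *n, int idx)
{
    assert(n != NULL && n->forms[0] != NULL);
    if (idx >= 0 && idx < MAX_FORMS && n->forms[idx] != NULL)
        return n->forms[idx];
    return n->forms[0];
}

/* Expand every "%<i>s" (i is one digit) and plain "%s" in fmt with the
 * requested form of n. Output is truncated to fit outsz and is always
 * NUL-terminated. Returns the number of characters written. */
static size_t format_declined(char *out, size_t outsz, const char *fmt,
                              const struct noun_forms *n)
{
    size_t len = 0;

    assert(out != NULL && outsz > 0 && fmt != NULL);
    while (*fmt != '\0') {
        const char *piece;
        size_t piece_len;

        if (fmt[0] == '%' && fmt[1] == '<' &&
            fmt[2] >= '0' && fmt[2] <= '9' &&
            fmt[3] == '>' && fmt[4] == 's') {
            piece = noun_form(n, fmt[2] - '0'); /* indexed form */
            piece_len = strlen(piece);
            fmt += 5;
        } else if (fmt[0] == '%' && fmt[1] == 's') {
            piece = noun_form(n, 0);            /* default form */
            piece_len = strlen(piece);
            fmt += 2;
        } else {
            piece = fmt;                        /* literal character */
            piece_len = 1;
            fmt += 1;
        }
        for (size_t i = 0; i < piece_len && len + 1 < outsz; i++)
            out[len++] = piece[i];
    }
    out[len] = '\0';
    return len;
}
```

With the "king" entry from the earlier example (forms 0, 3 and 5 filled
in), expanding "premesti %<3>s" picks "kralja". This only covers the
case where the caller can hand over the whole entry; it does not address
the compatibility problem with code that passes a plain char * around.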
>>Before diving into gettext code, it'd be nice to hear if this kind of
>>approach would work for any language other than Serbian (I repeat, I
>>find it likely to work for Slavic languages, and German, those being the
>>languages I'm at least a bit familiar with).
>>
>>
>It looks general enough to work for any language (if you define enough
>declinations), but I'm not sure this is the way to solve this.
>Doing the declination in my head is just too much work :-)
>
>
You don't have to do all the declinations. Translations usually
require two or three of the seven available in the Serbian language (for
instance) -- I guess it would be similar for other languages. In cases
like that, I could even define "number-of-declinations" to be 3, and
number the forms according to how common they are. The important thing
is that translators get an opportunity to fix things.
Still, I'm not sure it would work for "any" language: we're still
talking in terms of Slavic languages, right? (Czech, Serbian,...) Almost
no one else has commented on this with regard to other, dissimilar
languages.
>The approach seems to easily lend itself to creating a single
>word-form database; then you'd want a database of which declinations
>are used in which verb forms, and in a few months gettext might be
>trying to do universal machine translation. But then again,
>maybe gettext maintainers want it to do that.
>
>
Well, this kind of approach would certainly be helped by a word
database, but I don't see it as a requirement.
And just to be clear, I am not involved with the gettext maintainers, so
don't blame them for any of my brain-dump :-)
>What I'd like to see and what I think would go some way towards
>helping these problems is integrated support for context markers.
>E.g. in nautilus, we have strings like
> "[files that are] named [README]"
>which is much better than just "named". Currently, every program
>does this differently (nautilus, KDE, gnucash at least).
>
>
Unfortunately, this doesn't work very well. In fact, Nautilus is not
an example to be proud of (in terms of l10n).
There were numerous issues with the plural forms themselves in the 2.2.x
releases (I guess they're fixed in 2.3.x), and the solution used by
Nautilus would solve one problem (that of having the correct form for
"named"), but would still do nothing for "[files that are]".
Here's a particular example from the Nautilus translation (I'll use
English strings to describe the problem):
#: libnautilus-private/nautilus-search-uri.c:325
msgid "[Items ]modified today"
msgstr "modified today"
The problem here is that in Serbian (I did the Nautilus translation, so
I know what I'm talking about :-)), the correct (or at least a way
better) form would be "Today modified items", instead of "Items modified
today". Or, it could also be "Items that are modified today", which
doesn't follow the pattern, and should be composed like some other
strings ("[Items that are ]named[ README]"). If a translator translated
it as "that are modified today", it might work for this particular
example, but the string might also be used in other places (s)he
doesn't know about, where that form is inappropriate.
So printf format strings would be much more appropriate here, because
the order could be reversed and manipulated in "free style". The approach
with [context] markers instead of format strings might work for many
languages, but it wouldn't work for all -- actually, it would be wrong
in some. So, I believe this kind of context information belongs in
comments-to-translators, which xgettext also extracts without problems.
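
As a rough illustration of the "free style" ordering that format strings
allow, the sketch below uses positional conversions ("%1$s", "%2$s").
These are a POSIX extension to printf rather than ISO C, so this assumes
a POSIX C library. compose() is a hypothetical helper, and the English
patterns stand in for what would really be translated msgstr values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build a phrase from a (translated) pattern and two arguments.
 * The pattern, not the program, decides the order in which the
 * arguments appear. Returns what snprintf returns. */
static int compose(char *out, size_t outsz, const char *pattern,
                   const char *noun, const char *when)
{
    assert(out != NULL && pattern != NULL);
    return snprintf(out, outsz, pattern, noun, when);
}

/* Usage:
 *   char buf[64];
 *   compose(buf, sizeof buf, "%1$s modified %2$s", "Items", "today");
 *   compose(buf, sizeof buf, "%2$s modified %1$s", "items", "Today");
 * The first pattern keeps the source order; the second swaps the
 * arguments without any change to the calling code.
 */
```

A pattern must use either positional conversions throughout or none at
all; mixing the two styles in one format string is not allowed.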
What my approach tries to solve is this: once context information is
available (whether because a translator ran the program in question and
discovered that some strings are composed "incorrectly", or because the
programmer provided that kind of information on composition), the
translator has the means to make it work for his own language. So, you
provide declinations only when you know they're needed.
Cheers,
Danilo