Re: Proposal for declinations in gettext

From: Danilo Segan <dsegan gmx net>
To: GNOME I18N List <gnome-i18n gnome org>
Cc: translation iro umontreal ca, linux-utf8 nl linux org,translation-i18n lists sourceforge net
Subject: Re: Proposal for declinations in gettext
Date: Sat, 14 Jun 2003 08:29:43 +0200

Miloslav Trmac wrote:

>Hello,
>On Fri, Jun 13, 2003 at 10:14:25PM +0200, Danilo Segan wrote:
>  
>
>>msgid "king"
>>msgstr<0> "kralj"
>>msgstr<3> "kralja"
>>msgstr<5> "kraljem"
>>
>>msgid "move %s"
>>msgstr "premesti %<3>s"
>>
>><i>, where i=0 .. (PO-Number-of-noun-forms)-1, is the index of the form 
>>required, and it depends on the sentence construction. It is determined 
>>by the verb, or perhaps words like "with", "whom", ... Some of 
>>msgstr<i>'s can be omitted if it's known not to be used in composition 
>>(most are highly unlikely to be ever used in translations, like the 
>>"vocative" form of "hey %s").
>>    
>>
>I suppose the obvious question is: "How do I know which declinations
>a word is used in?" (0, 3, 5 in the example above). In order to solve
>this, you'd have to somehow mark "move %{move_card}s" and "{move_card}king"
>so that they could be matched, msg$SOMETHING would then check for
>missing/unneeded declinations. I don't think this is much easier
>than using similar tags to explicitly mark contexts words are used in
>(see below).
> 
>
Well, the numbers you choose for "declinations" can be arbitrary: the 
software would not force anything on you. Of course, it's also possible 
to use some keywords ("move_cards"), but I think that's harder on the 
implementation (not that all else is easy :-)).

>  
>
>>The good side of this approach (the syntactic elements are arbitrary, 
>>don't comment on those) is that programs that use gettext for l10n would 
>>need no change:
>>    
>>
>Wrong. A typical gettext usage of the above is in principle something like
>	printf (gettext ("move %s"), gettext ("king"));
>that is, there is currently no way to correlate the "move %s" with
>the "king".
>  
>
With some hacks, it could be made to work transparently for the 
programmer. The idea would be to use the preprocessor to redefine 
"printf" and similar functions, and to have gettext("king") return an 
array of forms when one is available (again, one could use obnoxious 
hacks for this, like putting some structure pointer behind the \0 byte, 
or perhaps using some magic number in a string to indicate that it is 
actually a pointer to that same structure).
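
To make the "behind the \0 byte" hack concrete, here is a minimal 
sketch. The layout and the names KING and packed_form() are made up for 
illustration and are not part of gettext: the base form comes first, so 
unaware callers still see an ordinary C string, and the other forms are 
stored after its terminating NUL.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout, not a gettext format:
   base form, NUL, then records made of one tag byte (form index + 1)
   followed by a NUL-terminated form. A tag byte of 0 ends the list. */
static const char KING[] =
    "kralj\0"            /* form 0, what plain printf("%s") would see */
    "\004" "kralja\0"    /* tag 4 -> form 3 */
    "\006" "kraljem\0";  /* tag 6 -> form 5; the literal's own NUL ends the list */

/* Return form `index` of a string packed as above, or the base form if
   that form was omitted. Must only be called on packed buffers: reading
   past the NUL of an ordinary string is undefined behaviour, which is
   exactly why this counts as an obnoxious hack. */
static const char *packed_form(const char *s, unsigned index)
{
    const char *p;

    assert(s != NULL);
    if (index == 0)
        return s;

    p = s + strlen(s) + 1;              /* first byte behind the NUL */
    while (*p != '\0') {
        unsigned tag = (unsigned char)*p++;
        if (tag - 1 == index)
            return p;                   /* p now points at the form text */
        p += strlen(p) + 1;             /* skip this form and its NUL */
    }
    return s;                           /* omitted form: fall back to base */
}
```

A redefined printf would call something like packed_form(KING, 3) when 
the format asks for form 3, while old code that passes KING straight to 
"%s" keeps printing the base form.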

A slightly better solution would be to replace all calls to the *printf 
functions with *printf( gettext_printf(format, parameters) ), but this 
would still require hacks if we are to maintain some compatibility with 
programs that use:
char *s=gettext("king");
printf(gettext("move %s"), s);
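
As a rough sketch of what such a gettext_printf-style helper might do 
with the "%<i>s" syntax from my proposal: the struct noun_forms type and 
the functions pick_form() and decl_format() below are hypothetical, 
handle a single noun argument only, and understand nothing but "%s" and 
"%<i>s".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: all declined forms of one translated noun.
   forms[i] is NULL when the translator omitted msgstr<i>. */
struct noun_forms {
    const char *const *forms;
    size_t count;
};

/* Pick form i, falling back to form 0 when it is missing. */
static const char *pick_form(const struct noun_forms *n, size_t i)
{
    if (i < n->count && n->forms[i] != NULL)
        return n->forms[i];
    return n->forms[0];
}

/* Expand "%s" and "%<i>s" in fmt with the forms of one noun.
   Returns the length written (excluding the NUL), or -1 if the
   result does not fit. Everything else is copied literally. */
static int decl_format(char *out, size_t size, const char *fmt,
                       const struct noun_forms *noun)
{
    size_t used = 0;

    assert(out != NULL && fmt != NULL && noun != NULL && noun->count > 0);
    if (size == 0)
        return -1;

    while (*fmt != '\0') {
        const char *piece = NULL;
        size_t len;

        if (fmt[0] == '%' && fmt[1] == 's') {
            piece = pick_form(noun, 0);          /* plain %s: base form */
            fmt += 2;
        } else if (fmt[0] == '%' && fmt[1] == '<') {
            const char *p = fmt + 2;
            size_t idx = 0;
            int digits = 0;

            while (isdigit((unsigned char)*p)) {
                idx = idx * 10 + (size_t)(*p - '0');
                p++;
                digits++;
            }
            if (digits > 0 && p[0] == '>' && p[1] == 's') {
                piece = pick_form(noun, idx);    /* %<i>s: form i */
                fmt = p + 2;
            }
        }

        if (piece != NULL) {
            len = strlen(piece);
        } else {
            piece = fmt;                         /* ordinary character */
            len = 1;
            fmt++;
        }

        if (used + len >= size)
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return (int)used;
}
```

With forms 0, 3 and 5 filled in for "king", the format "premesti %<3>s" 
would come out as "premesti kralja", and a format asking for an omitted 
form would quietly get "kralj".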

Of course, I admit that thorough changes to applications and to the 
interface would be best. I'll forward a message that <jmaiorana at 
idirect.net> sent to the linux-utf8 mailing list, with one such proposal 
that would hold the context in a single variable.

>>Before diving into gettext code, it'd be nice to hear if this kind of 
>>approach would work for any language other than Serbian (I repeat, I 
>>find it likely to work for Slavic languages, and German, those being the 
>>languages I'm at least a bit familiar with).
>>    
>>
>It looks general enough to work for any language (if you define enough
>declinations), but I'm not sure this is the way to solve this.
>Doing the declination in my head is just too much work :-)
>  
>
You don't have to do all the declinations. Translations usually require 
two or three out of the seven available in the Serbian language (for 
instance) -- I guess it would be similar for other languages. In cases 
like that, I could even define "number-of-declinations" to be 3, and 
number them according to how common they are. The important thing is 
that translators get an opportunity to fix things.

Still, I'm not sure it would work for "any" language: we're still 
talking in terms of Slavic languages, right? (Czech, Serbian,...) Almost 
no one else has commented on this with regard to other, dissimilar 
languages.

>The approach seems to easily lend itself to creating a single
>word-form database; then you'd want a database of which declinations
>are used in which verb forms, and in a few months gettext might be
>trying to do universal machine translation. But then again,
>maybe gettext maintainers want it to do that.
>  
>
Well, this kind of approach would certainly be helped by a word 
database, but I don't consider it a requirement.

And just to be clear, I am not involved with gettext maintainers, so 
don't blame them for any of my brain-dump :-)

>What I'd like to see and what I think would go some way towards
>helping these problems is integrated support for context markers.
>E.g. in nautilus, we have strings like
>	"[files that are] named [README]"
>which is much better than just "named". Currently, every program
>does this differently (nautilus, KDE, gnucash at least).
>  
>
Unfortunately, this doesn't work all that well. In fact, Nautilus is 
not an example one should be proud of (in terms of l10n).

There were numerous issues with plural-forms themselves in the 2.2.x 
releases (I guess they're fixed in 2.3.x), and the solution used by 
Nautilus would solve one problem (that of having the correct form for 
"named"), but would still do nothing for "[files that are]".

Here's a particular example from the Nautilus translation (I'll use 
English strings to describe the problem):
#: libnautilus-private/nautilus-search-uri.c:325
msgid "[Items ]modified today"
msgstr "modified today"

The problem here is that in Serbian (I did the Nautilus translation, so 
I know what I'm talking about :-)), the correct (or at least a much 
better) form would be "Today modified items", instead of "Items modified 
today". Or, it could also be "Items that are modified today", which 
doesn't follow the pattern, and should be composed like some other 
strings ("[Items that are ]named[ README]"). If a translator translated 
it as "that are modified today", it might work for this particular 
example, but the string might also be used in inappropriate ways (s)he 
doesn't know about.

So, printf format strings would be much more appropriate here, because 
the order could be reversed and manipulated in "free style". The 
approach with [context] markers instead of format strings might work for 
many languages, but it wouldn't work for all -- actually, it would be 
wrong in some. So, I believe this kind of context information belongs in 
comments-to-translators, which xgettext also extracts without problems.
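
To illustrate the reordering, here is a small sketch that relies on 
POSIX-style positional arguments ("%1$s", "%2$s"), which glibc's printf 
family supports but ISO C does not require. compose_label() is a made-up 
helper, and the format strings stand in for what gettext() would return 
for an English and for a reordering translation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a label from a (translated) format and
   two arguments. With positional arguments the translator decides the
   order; the program always passes noun first and time second.
   Returns the length written, or -1 on error or truncation. */
static int compose_label(char *out, size_t size, const char *fmt,
                         const char *noun, const char *when)
{
    int n;

    assert(out != NULL && fmt != NULL);
    n = snprintf(out, size, fmt, noun, when);
    return (n >= 0 && (size_t)n < size) ? n : -1;
}
```

Called with "%1$s modified %2$s" this gives "items modified today"; 
a translation that needs the other order can use "%2$s modified %1$s" 
and gets "today modified items" from the very same call.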

What my approach aims to solve is this: once context information is 
available (whether a translator ran the program in question and 
discovered that some strings are composed "incorrectly", or the 
programmer provided that kind of information on composition), the 
translator has the possibility to make it work for his own language. So, 
you provide declinations only when you know they're needed.

Cheers,
Danilo