Re: PO-based Documentation Translation

Hi there,

On Wed, 2003-10-01 at 15:33, Karl Eichwalder wrote:
> Tim Foster <Tim.Foster@Sun.COM> writes:
> > msgid "a,b,c,d"
> > msgstr "a,b,c,d"
> >
> >
> > However, doing it this way means that if a single sentence changes, the
> > entire paragraph gets marked as a "new" message (probably a strong fuzzy
> > match) in an automatic translation system.
> My proposal to solve this problem was: keep track of the previous msgid
> and add it as a comment (using a marker like '#|') while doing the
> msgmerge step:

Yeah, that would work: you'd get the differences displayed, but you
wouldn't be taking advantage of the matches you'd get across other
projects if you could segment at the sentence level.
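
As a rough illustration of "getting the differences displayed", here is a
minimal Python sketch that compares a previous msgid with the current one
word by word, using the standard difflib module. The function name and the
sample strings are hypothetical; this is not how msgmerge works, only a
picture of what a translator could be shown.

```python
import difflib

def changed_words(previous_msgid, current_msgid):
    """List the word-level edits that turn the previous msgid into the current one."""
    prev_words = previous_msgid.split()
    curr_words = current_msgid.split()
    matcher = difflib.SequenceMatcher(a=prev_words, b=curr_words)
    changes = []
    for op, a0, a1, b0, b1 in matcher.get_opcodes():
        if op != "equal":
            # Keep only the spans that differ between the two versions.
            changes.append((op, " ".join(prev_words[a0:a1]), " ".join(curr_words[b0:b1])))
    return changes

# Hypothetical paragraph-level message in which one word changed.
for change in changed_words("a b c d", "a b x d"):
    print(change)
```

The whole paragraph still counts as one changed message here, which is the
limitation described above.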

> Splitting at the sentence level can cause other problems.  What's a
> sentence is different from language to language.

You're right: we have different segmenters for different languages. It
just turns out that 99% of our translation uses English as the source
language.

> > In fact, there's another bonus too, since if the sentence "A launcher
> > can reside in a panel or a menu" appears in another document, but not in
> > exactly the same paragraph context, you still have the translation in
> > your database and can reuse that.
> There is no guarantee it will fit this way.

Of course, context is important, but in the absence of anything else, we
can suggest the match. A reviewer doesn't have to accept an exact match
we suggest, but usually it's acceptable.

In terms of saving time and money, sentence segmentation is fantastic
and works for us.
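
To make the reuse idea concrete, here is a minimal Python sketch of
sentence-level lookup against a translation memory. Everything in it is an
assumption for illustration: the splitter is a naive English-only regular
expression rather than a real per-language segmenter, the memory is a plain
dict, and the stored translation is a placeholder.

```python
import re

# Naive English-only rule (an assumption): a sentence ends at ., ! or ?
# followed by whitespace. Abbreviations would break this.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def segment(paragraph):
    """Split a paragraph into sentences, collapsing line breaks and extra spaces."""
    text = " ".join(paragraph.split())
    return [s for s in _SENTENCE_END.split(text) if s]

def leverage(paragraph, memory):
    """Pair each sentence with its stored translation, or None if it is new."""
    return [(s, memory.get(s)) for s in segment(paragraph)]

# Hypothetical memory filled from an earlier document.
memory = {"A launcher can reside in a panel or a menu.": "<stored translation>"}

new_paragraph = ("A launcher can reside in a panel or a menu. "
                 "You can drag a launcher to the desktop.")
for sentence, translation in leverage(new_paragraph, memory):
    print("exact match" if translation else "new", "|", sentence)
```

Only the second sentence would need translating; the first is offered to the
reviewer as a suggestion even though the surrounding paragraph is different.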

> In case of <b> (= bold) this will work; it may or may not work for
> other elements

For inline tags, we've found this to be just peachy and haven't broken an
HTML or SGML document yet based on this approach.

>  thus you are better off converting or escaping the
> inline "tags" (and convert it back once the translation is done).
> Depending on your data something as follows may work:
>     This is a piece of !!b>bold text.
>     This is a new sentence that!!/b> isn't in bold any more.

Yes, that would be another way of doing it, but it requires more work
post-translation, which was something we were trying to avoid.
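
For comparison, the escape-and-restore round trip could look roughly like
this in Python. The `!!` marker is taken from the quoted example; the
function names are hypothetical, and the sketch assumes simple tags without
attributes. The second function is the extra post-translation step.

```python
import re

# Simple tags only (an assumption): <b>, </b>, <em>, ... with no attributes.
_TAG = re.compile(r"<(/?[A-Za-z][A-Za-z0-9]*)>")
_ESCAPED = re.compile(r"!!(/?[A-Za-z][A-Za-z0-9]*)>")

def escape_tags(text):
    """Replace the leading '<' of each inline tag with '!!' before translation."""
    return _TAG.sub(r"!!\1>", text)

def unescape_tags(text):
    """Restore the original tags once the translation is done."""
    return _ESCAPED.sub(r"<\1>", text)

segment = "This is a piece of <b>bold text."
escaped = escape_tags(segment)
print(escaped)
print(unescape_tags(escaped))
```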

Our rule for SGML, HTML and XML was never to put invalid (not
well-formed) text into our database, so if we had a segment with a
missing open or close tag that was not allowed in the particular DTD
involved, we'd always make it well-formed before storing the text.
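
A minimal Python sketch of that kind of repair, under stated assumptions: it
adds any missing open tags at the start of the segment and any missing close
tags at the end. The function name is hypothetical, the sketch ignores the
DTD entirely (so it does not know which omissions are legal, or that HTML
elements such as <br> never take a close tag), and re-added open tags lose
their attributes. It is not a description of the actual implementation.

```python
import re

# Matches open, close and self-closing tags; group 1 is "/" for close tags.
_TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>")

def balance(segment):
    """Return the segment with unmatched inline tags paired up at its edges."""
    open_stack = []     # tags opened in this segment and not yet closed
    missing_opens = []  # close tags whose open tag lies in an earlier segment
    for match in _TAG.finditer(segment):
        if match.group(0).endswith("/>"):
            continue  # self-closing tags need no partner
        name = match.group(2)
        if match.group(1) != "/":
            open_stack.append(name)
        elif open_stack and open_stack[-1] == name:
            open_stack.pop()
        else:
            missing_opens.append(name)
    # Outermost missing open goes first; innermost unclosed tag closes first.
    prefix = "".join(f"<{n}>" for n in reversed(missing_opens))
    suffix = "".join(f"</{n}>" for n in reversed(open_stack))
    return prefix + segment + suffix

print(balance("This is a piece of <b>bold text."))
print(balance("This is a new sentence that</b> isn't in bold any more."))
```

Each sentence can then be stored, matched and translated on its own without
carrying a dangling tag.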

