Re: Import from Launchpad



Hi Axel,

On Thursday at 23:11, Axel Hecht wrote:

> just hooking this up somewhere in this thread.
>
> I've been watching this thread with interest, as I see that discussion
> coming at me at Mozilla (hi Danilo :-))
>
> My thoughts are somewhat assumption-heavy, so bear with me when I'm wrong.
>
> I think that a good deal of the web translation tools actually offer
> multiple values for each translatable string, not sure about launchpad
> here in particular, but it may not matter.

Indeed, Launchpad does.  I am sure other tools do it too.

> Here's my picture:
>
> After a piece of software went through a webtool, at least through one
> with a low barrier of entry, you end up with something that I call a
> translation cloud. In this picture, I'm mostly dropping the
> change-pattern, and simply look at the result. Each particle in this
> cloud has various meta data, like, author, date of creation, possibly
> "imported from upstream".
>
> The task seems to be to me to extract a localization from this
> translation cloud. Where I use "localization" here in contrast to
> "translation" to mean something that satisfies certain software
> engineering principles, like consistency, correctness, tests
> passed, etc.
>
> That seems to be a data mining thing. Democracy on individual entries
> might be something to bootstrap with, but I would hope that there is
> way more structure hidden in that data. Like, if you'd pick a set of
> the N entries with the most participation, and you'd pick a winner
> localization for each, then you could take the set of authors that got
> M% of those strings right. And then you take all the strings where,
> say K% of those authors agree. Sounds complicated, but really isn't,
> if you drop the numbers to tune. For the 10 most debated strings, pick
> a winning localization. Pick all localized strings that are the same
> from all the authors that had all winners right, and you get a
> possibly good data set.

All this sounds like a really interesting approach.  I suspect it'd
give pretty good results.  The only (and big) downside I see is that
we don't really have that many suggestions for each message.  Another
point to take into consideration is that (free software) translators
usually only work on untranslated strings (it gives them a better
feeling of achievement, at least in my experience), and it takes real
dedication to actually review and proofread translations by others.
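
To make the scheme quoted above concrete, here is a rough Python
sketch of it.  Everything in it is an assumption made for
illustration: the data layout (a dict mapping a string id to a list of
(author, text) pairs), the function names, and the choice to measure
an author only on the benchmark strings they actually attempted.  It
is not how Launchpad does anything, and the "winners" for the most
debated strings are still picked by a human reviewer.

```python
from collections import Counter


def most_debated(suggestions, n):
    """Return the ids of the n strings with the most distinct authors."""
    ranked = sorted(
        suggestions,
        key=lambda sid: len({author for author, _ in suggestions[sid]}),
        reverse=True,
    )
    return ranked[:n]


def trusted_authors(suggestions, winners, m):
    """Authors who matched the reviewer's winner on at least a fraction m
    of the benchmark strings they attempted (assumption: strings an
    author never touched do not count against them)."""
    attempted = Counter()
    correct = Counter()
    for sid, winner in winners.items():
        # dict() keeps only the last suggestion per author for this string.
        for author, text in dict(suggestions.get(sid, [])).items():
            attempted[author] += 1
            if text == winner:
                correct[author] += 1
    return {a for a in attempted if correct[a] / attempted[a] >= m}


def extract_localization(suggestions, trusted, k):
    """For each string, accept the most common text among trusted authors
    if at least a fraction k of them (who suggested anything) agree."""
    result = {}
    for sid, entries in suggestions.items():
        votes = Counter(
            text for author, text in dict(entries).items() if author in trusted
        )
        if not votes:
            continue  # no trusted author touched this string
        text, count = votes.most_common(1)[0]
        if count / sum(votes.values()) >= k:
            result[sid] = text
    return result


# Hypothetical data: placeholder texts, made-up authors.
suggestions = {
    "open": [("ana", "open-A"), ("bob", "open-A"), ("eve", "open-B")],
    "save": [("ana", "save-A"), ("bob", "save-A"), ("eve", "save-B")],
    "quit": [("ana", "quit-A"), ("bob", "quit-A")],
    "help": [("eve", "help-B")],
}
benchmark = most_debated(suggestions, 2)
winners = {"open": "open-A", "save": "save-A"}  # chosen by a reviewer
trusted = trusted_authors(suggestions, winners, m=1.0)
localization = extract_localization(suggestions, trusted, k=1.0)
```

With m and k both at 1.0 this is the "drop the numbers" variant: only
authors who got every winner right count, and a string is accepted
only when all of them agree.  Strings that no trusted author touched
(like "help" here) simply stay out of the result, which is exactly
where the lack of suggestions per message would hurt.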

We can actually do many of these things in Launchpad already, but
because we get a lot of pissed off users (hi GNOME :), I am not
sure I'd have the guts to make this automatic.  At least not yet.

So, what we are doing at the moment is providing 'manual' tools to
assist in that.  I.e. it'd still be a reviewer who'd have to choose a
certain string, but he'll be able to do that using many different
'metrics', such as how well a certain translator has been translating.

Also, a tricky question is when to actually do the 'calculations'.
New translations keep coming in live, and we'd run into the problem of
actually 'bootstrapping' the metrics.  Nothing that can't be worked
out, of course :)

> In the context of this discussion, one valuable form of meta data
> would be "imported from upstream" and to grant a significant
> trust-value to those 'particles' in the translation cloud. That keeps
> random changes out without ruling out improvements.

Indeed, this sounds like a really cool idea :)
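
One way to picture the trust-value idea is as a weighted vote, where a
suggestion imported from upstream simply counts for more than an
ordinary one.  The sketch below is only an illustration: the
Suggestion class, the pick_translation function and the weight of 3.0
are all made up for the example, and are not taken from Launchpad or
any other tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    author: str
    text: str
    from_upstream: bool = False  # the "imported from upstream" metadata


# Hypothetical weights; real values would need tuning.
UPSTREAM_WEIGHT = 3.0
DEFAULT_WEIGHT = 1.0


def pick_translation(suggestions, upstream_weight=UPSTREAM_WEIGHT):
    """Return the text with the highest total weight, or None if empty."""
    scores = defaultdict(float)
    for s in suggestions:
        scores[s.text] += upstream_weight if s.from_upstream else DEFAULT_WEIGHT
    if not scores:
        return None
    # On a tie, the text that was seen first wins.
    return max(scores, key=scores.get)


# A single random change does not displace the upstream translation...
one_change = [
    Suggestion("upstream", "text-U", from_upstream=True),
    Suggestion("someone", "text-R"),
]
# ...but several people agreeing on an improvement can.
improvement = [
    Suggestion("upstream", "text-U", from_upstream=True),
    Suggestion("ana", "text-B"),
    Suggestion("bob", "text-B"),
    Suggestion("cy", "text-B"),
    Suggestion("dee", "text-B"),
]
```

Here pick_translation(one_change) keeps "text-U", while
pick_translation(improvement) switches to "text-B", since four
ordinary votes outweigh one upstream vote at this weight.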

> A different approach would be to not look at the end particles, but
> rather to look at changes. Like, each 'edit' would be a branch, and it
> might be interesting to look at what the version control system
> authors know about their algebras and merging to come up with valuable
> output from low-barrier systems. The fact that we're really dealing
> with a huge amount of not so structured branches makes me favour the
> data mining idea, but then again, what do I know about the algebras
> that the distributed version control system folks have. And what do I
> know if I'd understand what they're saying when they talk about it.
>
> Anyway, I think it's worthwhile to focus on how to gain output out of
> low-barrier systems and create measures of confidence for translated
> strings from them.

Exactly! There are multiple solutions to the problem of determining
quality from low-barrier-to-entry systems, and we want to explore them
before jumping on the lock-everything-up wagon.

> I guess most of the folks that are actually hacking on the tools side
> will be at fosdem, so if this makes sense to you, you might want to
> spoil a beer or two with chatting about this. Sadly, I won't be able
> to join. I'll try next time.

Sure, hope to see you sometime in the future :)

This is also an invitation to my other GNOME fellows: I'll likely be
coming to FOSDEM (I just went to the embassy today; in the unlikely
event that I don't get a visa, I'll have to skip FOSDEM :), so feel
free to find me and we can discuss all things GNOME and all things
Launchpad.

Cheers,
Danilo

