Re: Interesting Post on Tracker

On Thu, Oct 12, 2006 at 02:13:18AM +0000, Kevin Kubasik wrote:
> On 10/11/06, Joe Shaw <joeshaw novell com> wrote:
> >         * The only stemmer provided is English.  The stemmer uses the
> >         same well-known Porter stemming algorithm that is already used
> >         inside Lucene.  Also, the license of the snowball stemmer
> >         appears to be old-style BSD so it would be incompatible with GPL
> >         applications.
> Uh-Oh....

Hi, I'm one of the (two) maintainers and authors of the Snowball system.
I've got nothing to do with Tracker though (other than following its
development in the same way that I've been following Beagle's).

The license of Snowball isn't intended to be old-style BSD; it's new-style.
I've just taken a quick look at our site to check, and I'd agree that it
might be a good idea for us to explicitly put the license text on there
(currently, we just link to a page for the license
text), but I can't see where you'd get the idea that it's the old-style BSD
license.  Could you point out where you got that idea from, so I can fix it?

If there is a problem, anyway, we hold the copyright for all the code, so we
can change the license if necessary.

While I'm here, a quick plug for Snowball: we don't just offer an English
stemmer; we offer stemmers for Danish, Dutch, English, Finnish, French,
German, Hungarian, Italian, Norwegian, Portuguese, Russian, Spanish, and
Swedish.  Oh, and Romanian is in the pipeline.  If Beagle wants to take
advantage of these stemmers, I'd be happy to offer help and advice.  In
particular, if Beagle requires a version of the algorithms in C#, the
Snowball code generator could be modified to generate a C# version without
too much difficulty (it already generates Java and C).  Alternatively, a C#
interface to the C stemmers could be built.  I don't speak C# well
currently, though, so I'd need help.

Also, I should note that the Snowball English stemmer is not the same as
the Porter stemmer - rather, it is an updated version of the Porter
stemmer, with a few rules modified to produce more useful results in
specific common cases.  For example, the Porter stemmer will stem "news" to
"new", whereas the Snowball English stemmer will leave "news" as "news".
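
To make the "news" example concrete, here is a toy Python sketch.  It is
not the real Porter or Snowball English algorithm: it implements only the
plural-stripping rule from the first step of the Porter stemmer, and it
assumes, for illustration, that the updated behaviour comes from a small
list of words left unchanged.  The names strip_plural, INVARIANT and
strip_plural_with_exceptions are made up for this sketch.

```python
# Toy illustration only: NOT the full Porter or Snowball English stemmer.

def strip_plural(word):
    """Porter-style plural stripping: SSES -> SS, IES -> I, SS -> SS, S -> ''."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word

# Hypothetical, abbreviated list of words to leave untouched (an assumption
# made for this sketch, not a copy of any real stemmer's data).
INVARIANT = {"news"}

def strip_plural_with_exceptions(word):
    """Same rule, but words in INVARIANT are returned unchanged."""
    if word in INVARIANT:
        return word
    return strip_plural(word)

print(strip_plural("news"))                  # new
print(strip_plural_with_exceptions("news"))  # news
```

The plain rule treats the final "s" of "news" as a plural ending, while the
variant with the exception list leaves the word alone.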

Anyway, I'll go back to lurking now, unless anyone wants help using
Snowball. ;-)

