Re: [Rhythmbox-devel] Rhythmbox Searches For "AND" vs "&"



On Sat, Aug 14, 2010 at 4:08 PM, Stuart Langridge
<stuart langridge canonical com> wrote:
> Well, the only thing I can find in Ubuntu is liblingua-stopwords-perl, a
> Perl package. However, the list of stopwords isn't large (English ones
> from that Perl package at
> http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/maverick/liblingua-stopwords-perl/maverick/annotate/head:/lib/Lingua/StopWords/EN.pm
> and the other languages are there too), and I suspect that the list of
> ordinary search stopwords wouldn't necessarily apply to music searching
> anyway ("You're", for example, is a common English search stopword,
> but you'd probably want to include it in a music search), so I think it'd be
> reasonable to build and include a Rhythmbox-specific stopword list in
> Rhythmbox rather than depending on something provided in distros. (The
> list could obviously be inspired by existing stopword lists in packages or
> on the web, though.)

I've been looking through the stop words listed in the example you've
provided and only find a few of them useful.

Of the ones listed:
i me my myself we our ours ourselves you your yours yourself
yourselves he him his himself she her hers herself it its
itself they them their theirs themselves what which who whom
this that these those am is are was were be been being have has
had having do does did doing would should could ought i'm
you're he's she's it's we're they're i've you've we've they've
i'd you'd he'd she'd we'd they'd i'll you'll he'll she'll we'll
they'll isn't aren't wasn't weren't hasn't haven't hadn't
doesn't don't didn't won't wouldn't shan't shouldn't can't
cannot couldn't mustn't let's that's who's what's here's
there's when's where's why's how's a an the and but if or
because as until while of at by for with about against between
into through during before after above below to from up down in
out on off over under again further then once here there when
where why how all any both each few more most other some such
no nor not only own same so than too very

..the only ones I find useful are:
it its is are was be i'm you're he's she's it's we're they're
i've you've we've they've i'd you'd he'd she'd we'd they'd
i'll you'll he'll she'll we'll they'll isn't aren't wasn't weren't
hasn't haven't hadn't doesn't don't didn't won't wouldn't
shan't shouldn't can't cannot couldn't mustn't let's that's
who's what's here's there's when's where's why's how's
a an the and but if or as of at by for into to in out so too

I may have missed a few important ones or included some that are not
so important, but I think even the smaller list is too long. For most
of the contractions, we could probably strip the apostrophes from both
the search words and the song details so that quick searches like "im
the...." or "they wont.." will match songs such as "I'm The..." and
"They Won't...".

Stripping the apostrophes would obviously give some false-positive
matches, but I think the end benefits might be greater. This would also
allow us to dramatically reduce the size of the stop word list... making
maintenance easier as well.
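
A rough sketch of the apostrophe-stripping idea, in Python purely for
illustration. This is not Rhythmbox's actual search code; the function
names and the song titles are made up for the example, and handling the
typographic apostrophe as well as the ASCII one is an assumption.

```python
def normalize(text):
    """Lowercase the text and drop apostrophes (ASCII ' and typographic U+2019)."""
    return text.lower().replace("'", "").replace("\u2019", "")


def quick_match(query, title):
    """Substring match after normalizing both the query and the title."""
    return normalize(query) in normalize(title)


# Hypothetical titles, only to show the behaviour.
print(quick_match("im the", "I'm The One"))       # contraction typed without apostrophe
print(quick_match("they wont", "They Won't Go"))
print(quick_match("were", "We're Here"))          # the kind of false positive mentioned above
```

The last call shows the trade-off: once the apostrophe is gone, "were"
and "we're" are indistinguishable.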

Also, are there any suggestions for a way to get community
translations/input so this feature can be supported in all available
languages? Such as a wiki that could be edited or something similar?

If we were to strip apostrophes, the stop word list would be reduced to
something more like the following:
it its is are was be a an the and but if or as of at by for
into to in out so too
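
To make the word-dropping step concrete, here is a minimal sketch in
Python (again only an illustration, not Rhythmbox code). It uses the
reduced list above. Several details are assumptions of the sketch rather
than anything decided in this thread: the names are hypothetical, stop
words are dropped from the query only, punctuation such as "&" is
ignored when splitting into words, and a query made up entirely of stop
words falls back to its original words so it still matches something.

```python
import re

# The reduced stop word list from above.
STOP_WORDS = frozenset(
    "it its is are was be a an the and but if or as of at by for "
    "into to in out so too".split()
)


def words(text):
    """Lowercase, strip apostrophes, and split into alphanumeric words."""
    text = text.lower().replace("'", "").replace("\u2019", "")
    return re.findall(r"\w+", text)


def query_words(query):
    """Drop stop words from the query; keep everything if nothing else is left."""
    all_words = words(query)
    kept = [w for w in all_words if w not in STOP_WORDS]
    return kept or all_words


def matches(query, title):
    """True if every remaining query word appears as a word in the title."""
    title_words = set(words(title))
    return all(w in title_words for w in query_words(query))


# Hypothetical titles, only to show the behaviour.
print(matches("rock and roll", "Rock & Roll"))
print(matches("rock & roll", "Rock and Roll"))
print(matches("rock and roll", "Rock Steady"))
```

With "and" treated as a stop word and "&" ignored as punctuation, the
"AND" vs "&" case from the subject line matches in both directions.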

I've not had time lately to look into the coding that would be
involved in this, but I should be able to either this week or next.
I'd like to put together some sort of plan for getting this done. I
think my first priority will be implementing the code to drop words
successfully. The next step should be fine-tuning the range of the
stop words. Then finally, adding stop word support for other available
languages.

This shouldn't be a difficult patch, but a fine-tuned list may take a
while and I'd like to get as much input on words to include and
exclude as possible. Thanks guys.

Cheers.

-- 
Kyle Baker

