Re: [Rhythmbox-devel] discussion question

Instead of replying to every post one by one, I'll do it in a batch. So here it 

You might want to collect similar names into the same group. You'll run into 
quite a few problems when defining "similar", though. Do you only group strings 
that are identical? Do you only ignore uppercase/lowercase differences? Is 
"R.E.M." similar to "REM"? Is "Britney Spears" the same as "Spears, Britney" or 
even "Spears Britney"?
I spent quite some time reading about and implementing fuzzy matching 
algorithms; I even sent one to this list once.
The biggest problem is that this has to be i18n-safe. So all your 
optimizations must work for people in China, for example, too. And they probably 
think differently about some English-specific optimizations. And I wouldn't want 
to get into doing language-specific stuff.
What I did, however, was rely on Unicode-specific character information to 
simplify a name by stripping or changing characters. This makes it possible, for 
example, to remove punctuation ("R.E.M." => "REM") or make everything uppercase 
(when there is a corresponding uppercase character - German "ß" doesn't have 
one). The advantage is that all of this works within glib, so you don't have to 
put that information into the algorithms.
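
To make the idea concrete, here is a rough sketch of that kind of 
simplification. It is not the code I sent to the list: it uses Python's 
unicodedata module as a stand-in for glib's Unicode functions, and the name 
simplify_name is made up for this example. The only "knowledge" it has is what 
the Unicode character database provides.

```python
import unicodedata


def simplify_name(name: str) -> str:
    """Build a comparison key from a name using only Unicode character data.

    Hypothetical helper for illustration; not an API of any library.
    """
    # Normalize first so equivalent code point sequences compare equal.
    normalized = unicodedata.normalize("NFKC", name)
    kept = []
    for ch in normalized:
        # General categories starting with "P" are punctuation and those
        # starting with "S" are symbols; both are dropped.
        if unicodedata.category(ch)[0] in ("P", "S"):
            continue
        upper = ch.upper()
        # Only use the uppercase form when it is a single character.
        # "ß" has no single-character uppercase here, so it is kept as-is.
        kept.append(upper if len(upper) == 1 else ch)
    # Collapse whitespace runs left behind by removed characters.
    return " ".join("".join(kept).split())


if __name__ == "__main__":
    for example in ("R.E.M.", "The Rough Guide [various]", "Straße"):
        print(example, "=>", simplify_name(example))
```

Two names then land in the same group when their keys are equal, so "R.E.M." 
and "rem" end up together without any English-specific rule.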
So my advice would be: Use the most sophisticated algorithms that are possible 
with the information you get, but don't put more information into the lib. So 
leave out rules for pattern matching (like "$X and $Y" == "$X & $Y" or "$Y, $X" 
== "$X $Y"). And be sure to use stuff that's not language-specific.

You get very, very far with that. All my searches worked satisfactorily, even 
when I misspelled things as badly as I could imagine.
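
As a rough illustration of a language-neutral fuzzy search, the sketch below 
uses the generic sequence matcher from Python's difflib module. That matcher is 
swapped in for the example and is not the algorithm I implemented; the function 
name find_similar and the cutoff of 0.6 are assumptions chosen for the demo.

```python
import difflib


def find_similar(query, names, cutoff=0.6):
    """Return the names that look closest to a possibly misspelled query.

    Hypothetical helper for illustration; the cutoff is an arbitrary choice.
    """
    # Compare case-folded keys, but remember the original spelling of each.
    keys = {}
    for name in names:
        keys.setdefault(name.casefold(), name)
    # get_close_matches ranks candidates by similarity ratio, best first,
    # and drops everything scoring below the cutoff.
    hits = difflib.get_close_matches(
        query.casefold(), list(keys), n=3, cutoff=cutoff
    )
    return [keys[key] for key in hits]


if __name__ == "__main__":
    library = ["Britney Spears", "R.E.M.", "Radiohead"]
    print(find_similar("Britny Speers", library))
```

Nothing in there knows about English; it only compares character sequences, 
so the same code works for any script.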


Quoting Luis Villa <>:

> Should creation of the artist/album list be case-insensitive? I have The
> Rough Guide [Various] and The Rough Guide [various] listed as separate
> artists right now. This is irritating to me, and my gut sense is that RB
> should just 'fix' this for me, but I'm not 100% sure about that. So I
> wanted to post it here for discussion before calling it a bug and
> putting it in the DB.
> Thoughts?
> Luis
