Re: [Rhythmbox-devel] discussion question
- From: Bastien Nocera <hadess hadess net>
- To: in7y118 public uni-hamburg de
- Cc: Luis Villa <louie ximian com>,Rhythmbox Devel <rhythmbox-devel gnome org>
- Subject: Re: [Rhythmbox-devel] discussion question
- Date: 08 Aug 2003 17:26:13 +0100
On Fri, 2003-08-08 at 16:47, in7y118@public.uni-hamburg.de wrote:
> Instead of replying to every post one by one, I'll do it in a batch. So here it
> goes...
>
> You might want to collect similar names into the same group. You'll run into
> quite a few problems when defining "similar", though. Do you just use strings that
> are the same? Do you only do uppercase/lowercase? Is "R.E.M." similar to "REM"?
> Is "Britney Spears" the same as "Spears, Britney" or even "Spears Britney"?
> I spent quite some time reading about and implementing fuzzy matching
> algorithms; I even sent one to this list once.
> The biggest problem is that this has to be i18n-safe. So all your
> optimizations must work for people in China or wherever, too. And they probably
> think differently about some English-specific optimizations. And I wouldn't want
> to get into doing language-specific stuff.
> What I did, however, was rely on Unicode-specific character information to
> simplify a name by stripping/changing characters. This makes it possible, for
> example, to remove signs ("R.E.M." => "REM") or make everything uppercase (when
> there is a corresponding uppercase character - German "ß" doesn't have one).
> The advantage is that all of this works within glib, so you don't have to put
> information into the algorithms.
> So my advice would be: Use the most sophisticated algorithms that are possible
> with the information you get, but don't put more information into the lib. So
> leave out rules for pattern matching (like "$X and $Y" == "$X & $Y" or "$Y, $X"
> == "$X $Y"). And be sure to use stuff that's not language-specific.
>
> You get very, very far with that. All my searches worked satisfactorily, even
> when I wrote stuff as wrong as I could imagine.
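For illustration, the Unicode-driven simplification described above could look roughly like the sketch below. It is written in Python with the standard unicodedata module rather than glib, it is not the code that was sent to the list, and the function name simplify is made up for this example. It only uses per-character Unicode data: characters whose general category is punctuation or symbol are dropped, whitespace is collapsed, and the result is uppercased. Note that Python's str.upper() applies full case mapping, so "ß" becomes "SS" here instead of staying unchanged.

```python
import unicodedata


def simplify(name):
    """Build a comparison key from a name using only Unicode character data.

    Hypothetical helper: no language-specific rules, just general categories.
    """
    kept = []
    for ch in name:
        # General categories starting with "P" (punctuation) or "S" (symbol)
        # are dropped, which turns "R.E.M." into "REM".
        if unicodedata.category(ch)[0] in ("P", "S"):
            continue
        kept.append(ch)
    # Collapse runs of whitespace, then uppercase so that case differences
    # do not matter when two keys are compared.
    return " ".join("".join(kept).split()).upper()


if __name__ == "__main__":
    print(simplify("R.E.M."))                       # REM
    print(simplify("r.e.m.") == simplify("REM"))    # True
```

This deliberately does nothing about word order, so "Spears, Britney" and "Britney Spears" still produce different keys.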
My take on this:
- Hard-code a bunch of easy ones with no false positives.
- Let people add new ones for different languages (Die Toten Hosen ==
Toten Hosen, Die == Toten Hosen ...)
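A minimal sketch of what I mean, in Python and purely as an illustration: BUILTIN_ARTICLES, strip_article and the idea of passing in a user-supplied set are all invented names for this example, not anything that exists in Rhythmbox. The hard-coded set stays tiny, and people add entries such as "die" for their own language.

```python
# Hypothetical built-in rule set: only entries assumed to be safe everywhere.
BUILTIN_ARTICLES = {"the"}


def strip_article(name, extra_articles=()):
    """Drop a leading article ("Die Toten Hosen") or a trailing one
    after a comma ("Toten Hosen, Die"), if it is in the known set."""
    articles = BUILTIN_ARTICLES | {a.lower() for a in extra_articles}
    name = " ".join(name.split())  # normalise whitespace first

    # Trailing form: "Toten Hosen, Die"
    head, sep, tail = name.rpartition(", ")
    if sep and tail.lower() in articles:
        return head

    # Leading form: "Die Toten Hosen"
    first, sep, rest = name.partition(" ")
    if sep and first.lower() in articles:
        return rest

    return name


if __name__ == "__main__":
    user_rules = {"die"}  # added by a German-speaking user
    for n in ("Die Toten Hosen", "Toten Hosen, Die", "Toten Hosen"):
        print(strip_article(n, user_rules))  # each line prints: Toten Hosen
```

Without the user-supplied "die", the first two names are left alone, so nobody gets a rule they did not ask for. A name like "The The" shows why even the easy ones need a second look before being hard-coded.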
--
/Bastien Nocera
http://hadess.net
#2 0x4205a2cc in printf ("Oh my %s\n", preferred_deity) from
/lib/i686/libc.so.6 printf ("Oh my %s\n", preferred_deity);
Segmentation fault