Re: [Tracker] Syntax of searchings

From: "Ulrik Mikaelsson" <ulrik mikaelsson gmail com>
To: "Laurent Aguerreche" <laurent aguerreche free fr>
Cc: tracker-list gnome org
Subject: Re: [Tracker] Syntax of searchings
Date: Wed, 15 Nov 2006 21:32:09 +0100

Currently it is not possible, and that is not normal... I do not know
whether QDBM (which stores file names associated with keywords) can be
set to split a string like "ziegler-nichols" into "ziegler" and "nichols"
automatically for searching, or whether we need to split strings ourselves.

This question was raised before, regarding filenames with dashes and underscores in them. That time, the reply was that in C code, dashes and underscores often have a meaning. I think we'll need to be context-sensitive in this case: regular documents and filenames usually require word-splitting, while source code usually doesn't. (However, in C, the string "difference=alpha-beta" actually contains three interesting lexemes, so even here a dash should not create the lexeme "alpha-beta".)
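To make the context-sensitive idea concrete, here is a minimal sketch in Python. It is not how Tracker or QDBM tokenizes anything. The names PROSE, SOURCE and tokenize are made up for illustration, and it assumes the indexer already knows whether it is looking at a document/filename or at source code.

```python
import re

# Hypothetical content kinds for illustration; not Tracker identifiers.
PROSE = "prose"    # regular documents and filenames
SOURCE = "source"  # C-like source code

# Prose: runs of letters/digits only, so both "-" and "_" act as separators.
_WORD = re.compile(r"[^\W_]+")
# Source: C-style identifiers, so "_" stays inside a lexeme while
# operators such as "-" and "=" still separate lexemes.
_IDENT = re.compile(r"[A-Za-z_]\w*")


def tokenize(text, kind):
    """Split text into searchable lexemes according to its content kind."""
    pattern = _IDENT if kind == SOURCE else _WORD
    return pattern.findall(text)


print(tokenize("ziegler-nichols", PROSE))
print(tokenize("difference=alpha-beta", SOURCE))
print(tokenize("buf_len", SOURCE), tokenize("buf_len", PROSE))
```

With these rules a dash never ends up inside a lexeme in either mode. The only difference is the underscore, which is kept for source code and treated as a separator everywhere else.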

What I also dislike with libstemmer (which aims to "reduce" strings to
radicals, to ignore plurals for instance) is that it does not ignore
accented characters, so if I have a file which contains "éléphant",
then "élephant" or "elephant" will not be found. "éléphant" is the
correct spelling, but it happens very often that French people miss
some accents or add superfluous ones... and it is the same problem in
other languages.

Unfortunately, this is not always applicable. For instance, in Swedish there's a big difference between the words "öst" and "ost", which mean "east" and "cheese", respectively. However, "café" is often spelled "cafe", with the same meaning.

I'm not sure at all how to handle this.
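
One possible direction, purely as a sketch: fold accents by default, but keep a per-language list of letters that must stay distinct. The code below is Python and uses only the standard unicodedata module. The KEEP table, the language codes and the fold function are assumptions for illustration, and it presumes the language of the text is known, which may well not be the case in practice.

```python
import unicodedata

# Hypothetical per-language table of letters that are distinct letters
# rather than accented variants, and so must not be folded.
KEEP = {
    "sv": set("åäö"),
    "fr": set(),
}


def fold(word, lang):
    """Lowercase a word and strip accents, except letters protected for lang."""
    keep = KEEP.get(lang, set())
    out = []
    # NFC first, so each accented letter is a single code point.
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch in keep:
            out.append(ch)
            continue
        # NFD splits the letter into base + combining marks; drop the marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


print(fold("éléphant", "fr"), fold("élephant", "fr"))
print(fold("öst", "sv"), fold("ost", "sv"), fold("café", "sv"))
```

Folding both the indexed word and the query this way would let "élephant" and "elephant" match "éléphant", while "öst" and "ost" stay different in Swedish and "café" still matches "cafe".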

Follow-Ups:
- Re: [Tracker] Syntax of searchings
  - From: Laurent Aguerreche

References:
- [Tracker] Syntax of searchings
  - From: Javier Arantegui
- Re: [Tracker] Syntax of searchings
  - From: Laurent Aguerreche
