Re: [Tracker] Syntax of searchings



Laurent Aguerreche wrote:
On Wednesday 15 November 2006 at 18:05 +0100, Javier Arantegui wrote:
Hi,

I have several documents in which "Ziegler-Nichols" appears. I can find
the documents by searching for "ziegler-nichols", but I cannot find
anything if I search for "ziegler" or "nichols". Is there any way to
find them by searching for "Ziegler"?

I'm using the tracker-search-tool (0.5.1) and tracker (0.5.1)

Currently it is not possible, and that is not the expected behaviour. I do
not know whether QDBM (which stores file names associated with keywords)
can be set to split a string like "ziegler-nichols" into "ziegler" and
"nichols" automatically for searching, or whether we need to split strings
ourselves.

We would need to do this ourselves, as QDBM is just a hash table. I'm not sure we should, though.

Underscores and hyphens are currently not treated as word breaks. It would be possible to do both, that is, index the hyphenated term as well as its individual parts. I will look into this, as we need to do it for filenames anyhow.
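
To make the "index both" idea concrete, here is a rough sketch in Python. It is not Tracker's code (Tracker's indexer is not shown in this thread); the function name index_terms and the tokenizing regex are made up for illustration. It keeps each hyphenated or underscored compound as one term and also emits its parts, so that a search for "ziegler" would match a document containing "Ziegler-Nichols".

```python
import re

# Hypothetical helper, for illustration only: not part of Tracker or QDBM.
def index_terms(text):
    """Return the terms to index for a piece of text.

    Compounds joined by hyphens or underscores are kept whole and are
    also split into their individual parts.
    """
    terms = []
    # A token is a run of letters/digits, optionally joined by '-' or '_'.
    for token in re.findall(r"[^\W_]+(?:[-_][^\W_]+)*", text.lower()):
        terms.append(token)
        parts = re.split(r"[-_]", token)
        if len(parts) > 1:
            # Index the individual parts in addition to the compound.
            terms.extend(parts)
    return terms


print(index_terms("Ziegler-Nichols tuning"))
```

The cost of this approach is a larger index, since every compound adds one entry per part on top of the compound itself.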


What I also dislike about libstemmer (which aims to "reduce" strings to
their stems, to ignore plurals for instance) is that it does not ignore
accented characters, so if I have a file which contains "éléphant",
then "élephant" or "elephant" will not be found. "éléphant" is the
correct spelling, but French speakers very often omit
some accents or add superfluous ones... and the same problem exists in
other languages.


I am surprised that is happening, because we normalize all UTF-8 strings before stemming. Stemming can be turned off, or set to French by setting the language code to "fr" in the config file, but I am not sure the stemmer is the cause.

Obviously, misspelt words or incorrect accents will be problematic, but I am not sure how to get around that.
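
For the accent case specifically, one possible approach is to fold accents away on both the indexed word and the query word before comparing them. The sketch below uses Python's standard unicodedata module; the function name fold_accents is hypothetical, and this is only an assumption about what could work, not a description of what Tracker's normalization currently does. It does nothing for genuinely misspelt words.

```python
import unicodedata

# Hypothetical helper, for illustration only: not part of Tracker.
def fold_accents(word):
    """Lower-case a word and strip its combining accent marks."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", word)
    # Drop the combining marks and keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


# All three spellings fold to the same key.
for spelling in ("éléphant", "élephant", "elephant"):
    print(spelling, "->", fold_accents(spelling))
```

If the folded form is used as the index key, the three spellings above would all find the same document. The trade-off is that words which differ only by an accent become indistinguishable in search.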


--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/



