Re: Stemmed search configuration
- From: Debajyoti Bera <dbera web gmail com>
- To: dashboard-hackers gnome org
- Subject: Re: Stemmed search configuration
- Date: Sat, 22 Mar 2008 09:28:43 -0400
> Is it possible to configure the stemmed search feature for languages
> other than English (e.g. Danish)?
Not yet. The main problem is deciding the language of the data/metadata
for each document. Only a few data sources (some HTML files, probably
emails) specify the language of their data.
Beagle can use a different stemmer for each document, but not for different
metadata fields within a single document. For most documents, only some
data/metadata fields are in a different language and the others are generally
in English. It would be hard to get this right every time, so currently we
just default to English.
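To make the per-document idea concrete, here is a rough sketch of how a stemmer could be picked from a language hint when a data source provides one, falling back to the default otherwise. This is not Beagle code (Beagle is written in C#); it is plain Python for illustration only. The function name choose_stemmer, the STEMMERS_BY_LANGUAGE table and the language codes in it are all assumptions. Only the DEFAULT_STEMMER name and its "English" value come from the actual source.

```python
# Illustrative sketch only; not taken from the Beagle source tree.
DEFAULT_STEMMER = "English"

# Hypothetical mapping from language codes to stemmer names.
STEMMERS_BY_LANGUAGE = {
    "en": "English",
    "da": "Danish",
    "de": "German",
}


def choose_stemmer(declared_language=None):
    """Return a stemmer name for one document.

    declared_language is whatever the data source reports (for example
    an HTML lang attribute or an email header), or None if it reports
    nothing, which is the common case.
    """
    if not declared_language:
        return DEFAULT_STEMMER
    # Reduce values such as "da-DK" or "da_DK" to the primary code "da".
    code = declared_language.strip().lower().replace("_", "-").split("-")[0]
    # Unknown languages also fall back to the default stemmer.
    return STEMMERS_BY_LANGUAGE.get(code, DEFAULT_STEMMER)
```

Note that this picks one stemmer for the whole document, so it still does not help when individual metadata fields are in different languages.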
If you are using 0.3.x and are willing to modify the source, then in
beagled/LuceneCommon.cs change:
DEFAULT_STEMMER = "English";
to
DEFAULT_STEMMER = "Danish";
Beware that this will use the Danish stemmer for every piece of
data/metadata indexed, whatever language it is actually in.
- dBera
--
-----------------------------------------------------
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE / Mandriva / Inspiron-1100