Re: Stemmed search configuration

how to decide the language of the data/metadata for each document.
Probably not easy. But I guess Word/OO documents must have this information embedded; how else could they select the right spellchecker? Maybe Beagle filters should dump HTML (not text). This would allow the text to carry HTML language markup (the lang attribute).
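
To make the idea concrete, here is a rough sketch in Python of how a consumer could read per-element language out of filter-produced HTML. This is not Beagle code: the names LangCollector and split_by_language are made up for illustration, the default language "en" is an assumption, and the sketch expects reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not touch the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LangCollector(HTMLParser):
    """Hypothetical helper: collects (language, text) pairs from HTML."""

    def __init__(self, default_lang="en"):
        super().__init__()
        # One entry per open element: the language in effect inside it.
        self.stack = [default_lang]
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # An element without its own lang attribute inherits from its parent.
        lang = dict(attrs).get("lang")
        self.stack.append(lang or self.stack[-1])

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append((self.stack[-1], text))


def split_by_language(html, default_lang="en"):
    """Return a list of (language, text) pairs found in the HTML string."""
    parser = LangCollector(default_lang)
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Each (language, text) pair could then be handed to the stemmer for that language, assuming the filter that produced the HTML knows the language in the first place.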

Beagle has the means to use a different stemmer for each document
Does this mean the stemmers are used while indexing data, and not while searching data?

For most documents, only some data/metadata fields are in a different language and the others are generally in English
I do not think this is a problem. My guess is that most searches in my organization will be for keywords from the contents of the documents, not the technical metadata. 98% of the searches will be for Danish words.

If you are using 0.3.x ... change ... DEFAULT_STEMMER = "Danish";
Sorry, I have not been able to compile (configure) 0.3.2 on my Fedora FC6 system (only 0.2.x).
I have installed:

But configuration fails:
configure: error: Package requirements (ndesk-dbus-glib-1.0 >= 0.3.0) were not met:

Consider adjusting the PKG_CONFIG_PATH environment variable if you
installed software in a non-standard prefix.

Alternatively, you may set the environment variables NDESK_DBUS_CFLAGS
and NDESK_DBUS_LIBS to avoid the need to call pkg-config.

