Re: Spelling suggestions
- From: Debajyoti Bera <dbera web gmail com>
- To: Lukas Lipka <lukaslipka gmail com>
- Cc: Joe Shaw <joe joeshaw org>, dashboard-hackers <dashboard-hackers gnome org>
- Subject: Re: Spelling suggestions
- Date: Sun, 16 Dec 2007 08:45:39 -0500
> * Lucene only stores stemmed forms of the words (beagle becomes beagl)
>
> We have to figure out a way to unstem the word:
> 1.) Hack the analyzer to get the unstemmed word
> 2.) Traverse through our TextCache and find a word
> which contains the stem part.
> This is what I'll be looking into today/tomorrow.
You might want to check the Highlighter.net package (in Lucene.Net/contrib
on their website). It highlights matched words. Their example uses
StandardAnalyzer, but I wrapped a PorterStemmer around it and asked it to
highlight words with the same stem, and it was able to do it.
One way I had in mind was to create a tokenstream, check whether the
tokentext is the same as the suggested stem and, if so, use
token.startoffset and token.endoffset to extract the actual text. Of course,
it's easier said than done ;-)
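
To make the offset idea concrete, here is a rough sketch in Python rather than against the real Lucene.Net API. Everything in it is an assumption made for illustration: toy_stem is a crude stand-in for a real PorterStemmer, the regex plays the role of the tokenstream, and match.start()/match.end() stand in for token.startoffset/token.endoffset.

```python
import re


def toy_stem(word):
    """Crude stand-in for a real Porter stemmer (illustration only)."""
    word = word.lower()
    for suffix in ("ing", "es", "ed", "s", "e"):
        # Only strip a suffix if at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def unstem(text, stem):
    """Return the distinct surface forms in `text` whose stem equals `stem`.

    Forms are returned in order of first appearance.
    """
    found = []
    for match in re.finditer(r"[A-Za-z]+", text):
        # The match offsets play the role of token start/end offsets.
        start, end = match.start(), match.end()
        if toy_stem(match.group()) == stem:
            surface = text[start:end]  # original, unstemmed text
            if surface not in found:
                found.append(surface)
    return found
```

Calling unstem(cached_text, "beagl") on a TextCache entry would then give back the original spellings that share that stem, which could be offered as suggestions instead of the stem itself.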
> We need to only return the highest relevant suggestions, based on:
> 1.) Term frequency in index
> 2.) Levenshtein distance score
Add to that: there could be multiple indexes, so the results from the
different indexes need to be intelligently merged.
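
A minimal sketch of what the ranking plus merging could look like, again in Python and not tied to any Lucene.Net call. The choices here are assumptions, not a settled design: each index is represented as a plain term-to-frequency dict, frequencies are simply summed across indexes, candidates beyond max_distance are dropped, and ties on distance are broken by higher merged frequency.

```python
from collections import Counter


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


def suggest(word, indexes, max_distance=2, limit=5):
    """Merge term frequencies from several indexes and rank suggestions.

    `indexes` is a list of {term: frequency} dicts (hypothetical shape).
    """
    merged = Counter()
    for term_freqs in indexes:
        merged.update(term_freqs)  # sums frequencies of shared terms

    scored = []
    for term, freq in merged.items():
        if term == word:
            continue
        distance = levenshtein(word, term)
        if distance <= max_distance:
            # Sort key: smaller distance first, then higher frequency.
            scored.append((distance, -freq, term))
    scored.sort()
    return [term for _, _, term in scored[:limit]]
```

Summing raw frequencies favours the larger index, so a real merge would probably want to normalise per index first; the sketch only shows where that decision would sit.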
> Sorry for the exhausting email, and let's make Beagle rock! :-)
Yayyyyyyyyyyyy !!!
- dBera
--
-----------------------------------------------------
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user