Re: [Tracker] libicu & libunistring based parsers (was:Re: libunistring-based parser in libtracker-fts)

On Wed, May 5, 2010 at 20:28, Jamie McCracken
<jamie mccrack googlemail com> wrote:
On Wed, 2010-05-05 at 18:39 +0200, Aleksander Morgado wrote:
So, with this improvement considering ASCII-only words a special case,
libunistring really beats them all.

yeah libunistring looks like good stuff - I must check the source!

I still note you need to apply word filtering rules on words beginning
with numbers or symbols - I'm sure that's easy to do?

Probably words starting with symbols other than underscore can be
avoided. BTW, why not underscore?

we only allowed underscore as some function names start with underscore
in source files
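
A minimal sketch of the rule discussed above, in Python for illustration only (Tracker itself is written in C, and the function name here is hypothetical, not part of Tracker): a word is dropped when it starts with a symbol, except for the underscore. Words starting with a digit are let through here on the assumption that a separate number rule decides their fate.

```python
def passes_symbol_rule(word: str) -> bool:
    """Hypothetical filter: reject words whose first character is a symbol,
    with underscore as the only allowed exception."""
    if not word:
        return False
    first = word[0]
    # Underscore is allowed because identifiers in source files may start with it.
    if first == "_":
        return True
    # str.isalnum() is Unicode-aware: letters and digits pass, symbols do not.
    return first.isalnum()


if __name__ == "__main__":
    for w in ["_init", "parser", "#define", "2010"]:
        print(w, passes_symbol_rule(w))
```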

And regarding filtering numbers, is this something we want to do?
There's a bug report regarding this:

most numbers are junk - especially in source files and would bloat up
the index.

we used to have an option where if a number was longer than x characters
we would accept it (on the grounds it was a telephone number and
therefore actually useful - I'm not sure if this preference is still
available or used)
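
The length-based option described above could look roughly like the following Python sketch. The threshold value and the names are assumptions made for illustration; the thread only says "longer than x characters" and does not give the real preference name or default.

```python
# Hypothetical stand-in for the "x" mentioned above; the real value is not given.
MIN_NUMBER_LENGTH = 6


def keep_number(token: str, min_length: int = MIN_NUMBER_LENGTH) -> bool:
    """Keep an all-digit token only if it is longer than min_length characters."""
    if not token.isdecimal():
        # Not a pure number, so this rule does not apply to it.
        return True
    return len(token) > min_length


if __name__ == "__main__":
    for t in ["42", "2010", "5551234567", "libunistring"]:
        print(t, keep_number(t))
```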

An interesting limitation of that is the convention of writing numbers
like this: (012) 345 6789.
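
To illustrate that limitation, here is a small Python sketch. It assumes the parser splits on spaces and punctuation, so each digit group becomes its own token, and it reuses a hypothetical minimum length of 6 (the thread does not state the real value). Under those assumptions, every group of the phone number is too short to be accepted, even though the digits joined together would pass.

```python
import re

# Hypothetical threshold; the thread does not state the actual value.
MIN_NUMBER_LENGTH = 6


def digit_groups(text: str) -> list[str]:
    """Split text into runs of ASCII digits, roughly as a word breaker would."""
    return re.findall(r"[0-9]+", text)


def accepted_numbers(text: str, min_length: int = MIN_NUMBER_LENGTH) -> list[str]:
    """Return the digit groups that are longer than min_length characters."""
    return [g for g in digit_groups(text) if len(g) > min_length]


if __name__ == "__main__":
    phone = "(012) 345 6789"
    print(digit_groups(phone))      # the separate groups the filter sees
    print(accepted_numbers(phone))  # groups surviving the length rule
    print(accepted_numbers("".join(digit_groups(phone))))  # joined digits
```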
