Re: [Tracker] libicu & libunistring based parsers (was:Re: libunistring-based parser in libtracker-fts)
- From: Tshepang Lekhonkhobe <tshepang gmail com>
- To: jamie mccrack gmail com
- Cc: "Tracker \(devel\)" <tracker-list gnome org>
- Subject: Re: [Tracker] libicu & libunistring based parsers (was:Re: libunistring-based parser in libtracker-fts)
- Date: Thu, 6 May 2010 12:21:51 +0200
On Wed, May 5, 2010 at 20:28, Jamie McCracken
<jamie mccrack googlemail com> wrote:
On Wed, 2010-05-05 at 18:39 +0200, Aleksander Morgado wrote:
So, with this improvement considering ASCII-only words a special case,
libunistring really beats them all.
yeah libunistring looks like good stuff - I must check the source!
I still note you need to apply word filtering rules on words beginning
with numbers or symbols - I'm sure that's easy to do?
Probably words starting with symbols other than underscore can be
avoided. BTW, why not underscore?
we only allowed underscore as some function names start with underscore
in source files
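
The leading-character rule described above could be sketched roughly as follows. This is only an illustration in Python, not Tracker's actual parser code; the function name keep_word is hypothetical, and it assumes words starting with a digit are dropped, which is still an open question in this thread.

```python
def keep_word(word: str) -> bool:
    """Hypothetical filter: keep a word only if it starts with a
    letter or an underscore (e.g. function names such as _init)."""
    if not word:
        return False
    first = word[0]
    # Leading digits and symbols other than '_' are rejected.
    return first.isalpha() or first == "_"


words = ["parser", "_init", "#define", "42abc", "-flag"]
kept = [w for w in words if keep_word(w)]
```

With this rule, "parser" and "_init" are kept while "#define", "42abc" and "-flag" are dropped.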
And regarding filtering numbers, is this something we want to do?
There's a bug report regarding this:
most numbers are junk - especially in source files - and would bloat up
the index
we used to have an option where if a number was longer than x characters
we would accept it (on the grounds it was a telephone number and
therefore actually useful - I'm not sure if this preference is still
available or used)
An interesting limitation of that is the convention of writing numbers
like this: (012) 345 6789.
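
To make the limitation concrete, here is a small Python sketch. It is an assumption-laden illustration, not Tracker's code: the threshold value 6 stands in for the unspecified "x", the names keep_number and tokenize are hypothetical, and the tokenizer simply splits on non-word characters.

```python
import re

MIN_NUMBER_LENGTH = 6  # hypothetical value for the threshold "x"


def tokenize(text: str) -> list[str]:
    """Split text into runs of word characters (assumed word breaking)."""
    return re.findall(r"\w+", text)


def keep_number(token: str, min_length: int = MIN_NUMBER_LENGTH) -> bool:
    """Accept an all-digit token only if it is long enough to
    plausibly be a telephone number."""
    return token.isdigit() and len(token) >= min_length


grouped = tokenize("(012) 345 6789")   # split into three short groups
unbroken = tokenize("0123456789")      # a single ten-digit token

kept_grouped = [t for t in grouped if keep_number(t)]
kept_unbroken = [t for t in unbroken if keep_number(t)]
```

Under these assumptions the unbroken number passes the length check, but the grouped form is split into "012", "345" and "6789", none of which is long enough, so the whole telephone number is filtered out.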