Re: Current TODO



Hi,

On Wed, 2006-02-15 at 23:05 -0500, Kevin Kubasik wrote:
> I know that Spotlight accomplishes a large chunk of this through
> distributed indexes across the disk, and searching them based on
> modification time (more recently modified files searched first), but
> that's based on old forum posts,

We also do this, although our indexes are broken up by backend rather
than by modification time.  I don't think mtime really buys you that
much; just breaking up the indexes more sanely (as I mentioned in my
last email) would help us, I think.

> This one is quite good, a large part of the speed is based on
> searching a simple metadata store and returning those results before
> handling the fulltext index.

We already do this as well, mostly.  A lot of metadata is kept in a
separate index from the full text and is searched along with the
full-text index.  The main problem here is that for AND queries (which
we do by default) you really need the results from the full-text index
to take the intersection, or you will get a lot of false-positive
matches.
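
To illustrate the intersection problem, here is a toy sketch in Python.
It is not our actual Lucene code: the dict-based "indexes", the file
names and the and_query() helper are all made up for the example.  Each
index maps a term to the set of documents containing it.

```python
# Toy indexes (hypothetical data): term -> set of matching documents.
metadata_index = {"report": {"a.txt", "b.txt"}}
fulltext_index = {"budget": {"b.txt", "c.txt"}}


def and_query(terms, metadata, fulltext):
    """Return documents in which every term matches in at least one index."""
    result = None
    for term in terms:
        # A term may be satisfied by either index, so take the union first...
        hits = metadata.get(term, set()) | fulltext.get(term, set())
        # ...then intersect across terms, because the query is an AND.
        result = hits if result is None else result & hits
    return result or set()


# Answering from the metadata index alone would return both a.txt and
# b.txt for "report", but a.txt never mentions "budget": a false positive.
metadata_only = metadata_index.get("report", set())
matches = and_query(["report", "budget"], metadata_index, fulltext_index)
```

Only b.txt survives the intersection, which is why the metadata hits
cannot safely be returned before the full-text results are known.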

We're already using the lowest-level API Lucene has to offer for this,
so our matching is just about as fast as we can get it.  The bulk of the
time at this point is spent extracting data from the matches and sending
it over the wire.  Maybe we could have a "fast path" which extracts and
returns only URIs, with clients making a round trip to get more data
(this is how we do snippets), but we'd need to try it to be sure.
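
A rough sketch of what such a fast path might look like, again as toy
Python rather than anything we have implemented.  The ToyDaemon class,
its search_uris() and fetch() methods and the Hit fields are all
hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    uri: str
    title: str
    snippet: str


class ToyDaemon:
    """Stand-in for the search daemon; holds a few hits in memory."""

    def __init__(self, hits):
        self._hits = {hit.uri: hit for hit in hits}

    def search_uris(self, word):
        # Fast path: return only the URIs of matching hits, no other data.
        return [uri for uri, hit in self._hits.items()
                if word in hit.snippet.lower()]

    def fetch(self, uri):
        # Second round trip: full data for one URI the client cares about.
        return self._hits[uri]


daemon = ToyDaemon([
    Hit("file:///a.txt", "A", "budget notes"),
    Hit("file:///b.txt", "B", "holiday plans"),
    Hit("file:///c.txt", "C", "Budget 2006"),
])

uris = daemon.search_uris("budget")   # cheap: URIs only
first = daemon.fetch(uris[0])         # extra data only for what is shown
```

The tradeoff is one extra round trip per displayed hit in exchange for a
much smaller first response, so whether it is a win would depend on how
many hits a client actually displays.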

Joe



