Re: Current TODO


My belated response:

On Wed, 2006-02-15 at 21:14 -0500, Kevin Kubasik wrote:
> I was wondering, do we have a vague standing TODO consolidated? There
> are snippets floating around the wiki, and random smatterings of TODO
> comments in the code, but no real centralized goal. What is our
> current focus? Is it bugfixes? Code cleanup? The new interface seems
> to be making substantial progress, but what are our plans? Should we
> perhaps get a rough roadmap worked out? It seems that the open beagle
> API has encouraged enough interesting development to take place and
> provide nice application integration. While extracting more contextual
> info from files and making that data readily available is obviously a
> standing task, I guess I was just wondering where the current push is?

My focus right now, as an employee of Novell, is to get Beagle as ready
as possible for the NLD 10 release.  Primarily this means fixing bugs as
they're reported; we've been doing betas for a little while now so bugs
are coming in and that's what I've been focusing most of my time on.

When I'm not doing that, I am trying to do various code cleanups, review
patches, fix upstream bugs, etc.

> If we're focusing more on internals cleanup and the like (as opposed to
> any new/major features) might I suggest we work with optimizations
> for quick/live searching? Something like multiple indexes optimized
> for different roots, with database query decisions happening in beagle
> as opposed to lucene.

I have some ideas for doing quick searches by accessing the indexes
directly on disk rather than going through the daemon.  It'll only work
in C# and it won't support things like live queries, but it makes sense
for doing things like inline autocompletion.

Another optimization I am considering is moving from having one Lucene
index per backend to having a fixed number -- say, 8 -- and evenly
distributing into those indexes by hashing the URI.  The main reason is
that people's data tends to be heavily skewed toward some backends.

I am a bit of a pathological example, but I have roughly 300k emails,
150k files, and 5k gaim logs.  Beyond that I have fewer than 100 items,
and in many backends (mainly the KDE ones) zero.  In searches that match
against both gaim logs and emails, the gaim logs will be returned much
faster, since their index is so much smaller.  By evenly distributing the documents across
indexes, searches in general will be faster.  (Although it is possible
with some searches that the first hits returned will be slower, I think
it's a net win.)
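
As a rough illustration of the hashing idea above (not Beagle code, which is
C#), here is a minimal Python sketch.  The function names, the URI format, and
the choice of MD5 as the hash are all assumptions made for the example; the
only things taken from the text are the fixed count of 8 indexes and the
skewed per-backend item counts, scaled down by 100 so it runs quickly.

```python
import hashlib
from collections import Counter

NUM_INDEXES = 8  # the "say, 8" from the text above


def index_for_uri(uri: str, num_indexes: int = NUM_INDEXES) -> int:
    """Map a URI to an index number in [0, num_indexes).

    A stable digest is used rather than Python's built-in hash(), which is
    randomized per process and so would send the same URI to a different
    index on each run.  MD5 is an arbitrary choice for this sketch.
    """
    digest = hashlib.md5(uri.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_indexes


def distribute(counts_per_backend: dict) -> Counter:
    """Count how many items land in each index for made-up URIs."""
    per_index = Counter()
    for backend, count in counts_per_backend.items():
        for i in range(count):
            # Hypothetical URI scheme, for illustration only.
            uri = f"{backend}:///item/{i}"
            per_index[index_for_uri(uri)] += 1
    return per_index


if __name__ == "__main__":
    # Skewed corpus from the text, divided by 100.
    corpus = {"email": 3000, "file": 1500, "gaim": 50}
    for index, size in sorted(distribute(corpus).items()):
        print(f"index {index}: {size} items")
```

With one index per backend, the three index sizes in this scaled-down corpus
would be 3000, 1500, and 50; with hashing, each of the 8 indexes should end up
holding roughly an eighth of the total regardless of which backend the items
came from.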

These are both pretty big changes that will require some time to really
sit down and bang them out.  The former can probably be done without
intruding too much on the code base and can be done gradually, but the
latter definitely changes things in a big way.  Really, I need until
after NLD ships to look into it.

Beyond that, I think that a reliable network service would be nice; we
need better integration with Evolution so we can do things like snippets
for emails; we need to figure out what we want to do with metadata
generally; and I would really, *really* like to get back to doing
Dashboard development and some other data relationship tools.

