Re: [Tracker] SQLite databases

From: Philip Van Hoof <spam pvanhoof be>
To: George Farris <farrisg cc viu ca>
Cc: tracker-list gnome org
Subject: Re: [Tracker] SQLite databases
Date: Fri, 21 Jan 2011 14:20:14 +0100

On Thu, 2011-01-20 at 11:15 -0800, George Farris wrote:

On Thu, 2011-01-20 at 18:13 +0000, Martyn Russell wrote:

On 20/01/11 16:20, George Farris wrote:


Hi George,

Is there some reason Tracker is set up to talk to Evolution for indexing
email?  Email indexing hasn't really worked well since the days before
Evo used SQLite.

I would really love to have my email indexed and not just the subject
but the entire thing.


We can do the entire thing, but that's really close to cloning your 
email and that decision splits the team currently. It also is quite a 
performance drain on Evolution and your machine to index all the content 
too. Philip has the details :)

Is there anything that can be done here out of curiosity?


Do you have any evidence to suggest that SQLite is not good enough, if 
so, do you have a proposed replacement?


No I think SQLite is probably fine. In the past before Evo changed to
SQLite both Beagle, and I think Tracker, could index email and one could
search for content inside the email, which of course is exactly what one
wants to do.

I'm just trying to find a way back to what we had 3 years ago.  It's one
of those crucial things for an enterprise desktop function.



Back then both Beagle and Tracker, running outside Evolution's process,
opened the so-called "summary" files that Camel wrote. They read these
without first locking the files, but for read-only access this usually
worked fine.

Nonetheless, multi-process non-locked access to the same file: not nice.

Ever since Evolution switched to SQLite this isn't easily possible
anymore:

        Evolution uses very large transactions, and while it holds
        such a transaction other processes connecting to the same
        SQLite database are locked out.

If Evolution started using WAL journaling, other processes could open a
second, read-only connection to what are now called the summary.db
database files. Tracker's tracker-store also uses WAL, and that's what
enables libtracker-sparql's direct-access mode.
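
To illustrate the WAL point, here is a minimal sketch using Python's
sqlite3 module. It is not Evolution's or Tracker's code: the "messages"
table and its columns are made up for the example and are not the real
summary.db schema. It only shows that, in WAL mode, a second read-only
connection can still read committed rows while a writer holds an open
transaction.

```python
import os
import sqlite3
import tempfile


def open_writer(path):
    # isolation_level=None: transactions are controlled explicitly
    # with BEGIN/COMMIT instead of Python's implicit handling.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def open_reader(path):
    # mode=ro asks SQLite for a read-only connection to the same file.
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True, timeout=1.0)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "summary.db")
    writer = open_writer(path)
    # Hypothetical schema, for illustration only.
    writer.execute("CREATE TABLE messages (uid TEXT PRIMARY KEY, subject TEXT)")
    writer.execute("INSERT INTO messages VALUES ('1', 'committed')")

    # Simulate a long-running write transaction that is still open.
    writer.execute("BEGIN IMMEDIATE")
    writer.execute("INSERT INTO messages VALUES ('2', 'uncommitted')")

    reader = open_reader(path)
    query = "SELECT subject FROM messages ORDER BY uid"
    during = [row[0] for row in reader.execute(query)]

    writer.execute("COMMIT")
    after = [row[0] for row in reader.execute(query)]

    reader.close()
    writer.close()
    return during, after
```

While the writer's transaction is open the reader is not blocked and
sees only the committed row; after the commit it sees both.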

Meanwhile we made a plugin for Evolution that runs inside Evolution's
process and passes all the metadata from there to tracker-store. It does
not, however, pass the body data of E-mails, nor the BODYSTRUCTURE tree
info of E-mails (which includes attachments' metadata).

We think generally that this is a better architecture than having
multiple processes connect to Evolution's SQLite databases.

This is also because Evolution makes changes to E-mails (like flag
changes) that would be hard to propagate to Tracker in real time if we
read the SQLite database from our own process.

The current search function in Evo doesn't work very well and in fact
crashes Evo from time to time.


Those crashes should be fixed in recent 0.9 releases. In 0.8 the crashes
are a known problem, but because dbus-glib is the DBus library used
there, it's hard to fix those problems (dbus-glib not being thread safe
and all that).

In master and more recent 0.9 releases all of this has been replaced by
the use of libtracker-sparql and GDBus (which are thread-safe(r)).

Evolution uses threads heavily, which makes it hard to avoid using
threads when developing plugins for it (ones that do something with the
E-mails and folders).

I should add, we are not of the same opinion ;)


So then you have full email indexing? I've followed all the instructions
but to no avail and this is with the latest Evolution (Ubuntu 10.10) and
Tracker from both the repository and compiled from the tracker site.


There is commented-out code in the Evolution plugin that would do this.

What needs to be modified is that this code has to run per E-mail in a
low priority queue.

Patches doing this and enabling the commented-out code are welcome once
you have tested them very well in high-stress situations (like having a
folder with hundreds of thousands of E-mails with a lot of MIME parts).
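
As a rough idea of what "per E-mail in a low priority queue" means, here
is a small Python sketch. It is not the plugin's code (the plugin is
written in C against Evolution's APIs); the WorkQueue class and its
methods are hypothetical. The priority numbers follow GLib's convention
that a larger number is less urgent.

```python
import heapq
import itertools

PRIORITY_DEFAULT = 0   # e.g. pushing metadata to the store
PRIORITY_LOW = 300     # e.g. heavy per-E-mail body indexing


class WorkQueue:
    """Runs one queued job at a time, most urgent first (hypothetical)."""

    def __init__(self):
        self._heap = []
        # The counter keeps jobs of equal priority in submission order
        # and ensures the heap never has to compare the callables.
        self._counter = itertools.count()

    def push(self, priority, func, *args):
        heapq.heappush(self._heap, (priority, next(self._counter), func, args))

    def run_one(self):
        """Run the most urgent job; return False when the queue is empty."""
        if not self._heap:
            return False
        _, _, func, args = heapq.heappop(self._heap)
        func(*args)
        return True
```

Each E-mail becomes its own small job, so a folder with a huge number of
E-mails never turns into one long blocking operation, and any job queued
at PRIORITY_DEFAULT runs before the remaining PRIORITY_LOW ones.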

I have adapted Evolution a year or so ago to also store BODYSTRUCTURE in
its summary.db databases for each E-mail. I can give you a BODYSTRUCTURE
parser (Evolution doesn't have an easily usable one) that you could use
to parse this in-db text field and then convert it to RDF and add it to
the metadata that Evolution passes to tracker-store over IPC.
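
To give an idea of the parsing step, here is a minimal Python sketch. It
is not the parser offered above. It assumes the stored text uses the
IMAP-style parenthesized BODYSTRUCTURE syntax (RFC 3501), it ignores
IMAP literals, and it treats message/rfc822 parts as plain leaves. The
function names are made up, and the conversion to RDF is left out.

```python
import re

# "(" or ")" or a quoted string (group 1) or a bare atom (group 2).
_TOKEN = re.compile(r'\(|\)|"((?:[^"\\]|\\.)*)"|([^\s()"]+)')


def parse_bodystructure(text):
    """Parse the parenthesized text into nested lists; NIL becomes None."""
    stack = [[]]
    for match in _TOKEN.finditer(text):
        token = match.group(0)
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) < 2:
                raise ValueError("unbalanced parentheses")
            finished = stack.pop()
            stack[-1].append(finished)
        elif match.group(1) is not None:
            # Quoted string: undo backslash escapes.
            stack[-1].append(re.sub(r"\\(.)", r"\1", match.group(1)))
        else:
            atom = match.group(2)
            stack[-1].append(None if atom.upper() == "NIL" else atom)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one parenthesized structure")
    return stack[0][0]


def iter_leaf_parts(node, prefix=""):
    """Yield (part number, MIME type, parameter dict) for each leaf part."""
    if isinstance(node[0], list):
        # Multipart: child parts come first, then the subtype string.
        index = 0
        for child in node:
            if not isinstance(child, list):
                break
            index += 1
            yield from iter_leaf_parts(child, f"{prefix}{index}.")
    else:
        raw = node[2] or []
        params = {raw[i].lower(): raw[i + 1] for i in range(0, len(raw) - 1, 2)}
        yield prefix.rstrip(".") or "1", f"{node[0]}/{node[1]}".lower(), params
```

For a multipart/mixed E-mail with a text part and a PDF attachment this
yields one entry per part, with the attachment's file name available
from its "name" parameter when the sender supplied one.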

If you do this BODYSTRUCTURE part (which isn't nearly as heavy as the
commented-out code that I mentioned earlier) you'd have 95% of all the
things you want (with the exception of the actual text/plain content of
the E-mails).


Cheers,

Philip






-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be
