Re: [Tracker] [Evolution] Beagle and Tracker, letting Evolution feed those beasts RDF triples instead



On Tue, 2008-12-09 at 18:00 +0530, Sankar wrote:

Hey Sankar,

I'm writing a plugin that will implement the "Manager" class as
described here. Tracker will then act as a "Registrar".

http://live.gnome.org/Evolution/Metadata

I will be using camel-db.h, as you suggested on IRC, to implement the
features in a well-performing way (direct SQLite access).

I will start working on this plugin tomorrow or next week. At the same
time I will be implementing support for it in Tracker, which will serve
as a prototype for other metadata engines.

I hope to inspire people from the Evolution team, and from the different
metadata engines, to comment on the proposed D-Bus API.

Let's try to get this valuable metadata out of those damn E-mail clients
and let's try to get it right this time. Not ad-hoc, but right.

If the namespace should be moved from org.gnome to org.freedesktop we
can of course do this afterwards. The "metadata.Manager" part would
also have to be given a better name. But in the end the implementation
would not be affected much by such renames. Meanwhile we can prototype
it in GNOME's Evolution D-Bus namespace.

The reason for all this prototyping is that we wouldn't like to release
a Tracker that doesn't support Evolution's new summary format.

This time we're set on getting it right, rather than just hacking around
the new summary format by fixing code that wrongly interpreted
Evolution's cache by itself instead of letting Evolution tell us about
it ...

* Well we could do this, but really ... let's just get it right now that
  I can spend time on this. At least that's my point of view on this.

* Other apps reading Evolution's caches externally is never going to be
  generic for all E-mail clients, and is not really right. We would have
  to care about file locking and (now that it's SQLite based) about
  transactions being held by Evolution, and about the possibility of
  Evolution changing the database schema.

* It's just not very nice to do it that way in my opinion: it adds an
  unasked-for burden on the Evolution team too: having to negotiate with
  us whenever you want to change the schema of the database. Otherwise
  you will break a lot of people's desktops unannounced. Evolution would
  need to provide a mechanism that tells us the version of the schema,
  for example. And we would have to implement things in Tracker that
  deal with every version of Evolution's cache.

  One big spaghetti mess distributed over multiple projects.

  So, let's just do it right.


On Mon, 2008-12-08 at 18:59 +0100, Philip Van Hoof wrote:
All metadata engines are nowadays working on a way to have their
metadata fed by external applications.
Such APIs come down to storing RDF triples. An RDF triple comes down to
a URI, a property and a value.
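
To make that shape concrete, here is a minimal Python sketch of a
message flattened into (URI, property, value) triples. The URI scheme
and the property names are hypothetical, chosen only for illustration;
they are not taken from Tracker, Beagle or the proposed D-Bus API.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # name of the property
    value: str      # value of the property


def triples_for_message(uri, fields):
    """Flatten one message's fields into (URI, property, value) triples."""
    return [Triple(uri, prop, str(val)) for prop, val in fields.items()]


# Hypothetical URI and property names, for illustration only.
triples = triples_for_message(
    "email://account/INBOX/1",
    {"subject": "Hello", "from": "someone@example.org"},
)

for t in triples:
    print(t.subject, t.predicate, t.value)
```

Every fact about a message becomes one row, so a store only needs to
understand this single three-column shape.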

For example (in Turtle format, which is SPARQL's inline format and the
W3C's typical RDF storage format):

We'd like to make an Evolution plugin that does this for Tracker.

Obviously it would be just as easy to let software like Beagle become an
implementer of "prox"'s InsertRDFTriples, so that Beagle is supported
with the same code and the same Evolution plugin.

I just don't know which EPlugin hooks I should use. Iterating over all
accounts, for each account all folders, and for each folder all
CamelMessageInfo instances is trivial, and I know how to do this.

What I don't know is which hooks are reliable for:

  * Application started

org.gnome.evolution.shell.events:1.0 - es-event.c - 

sample plugin:
groupwise-account-setup/org-gnome-gw-account-setup.eplug.xml 


  * Account added

org.gnome.evolution.mail.config:1.0 

sample plugin:
groupwise-account-setup/org-gnome-gw-account-setup.eplug.xml 

For account-added: id = org.gnome.evolution.mail.config.accountDruid
For account-edited: id = org.gnome.evolution.mail.config.accountEditor

  * Account removed

You may have to write a new hook

  * Folder created
  * Folder deleted
  * Folder moved
  * Message deleted (expunged)
  * Message flagged for removal 
  * Message flagged as Read and as Unread
  * Message flagged (generic)
  * Message moved (ie. deleted + created)
  * New message received
    * Full message 
    * Just the ENVELOPE


If you try to update your metadata for each of the above operations, it
may be overkill in terms of performance (and I believe it means more
disk access as well, for updating your metadata store). You can add a
new hook that fires whenever a change is made to the summary DB and
listen to that. All the above changes eventually have to reach the
summary DB to be valid.
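
One way to read this suggestion is as a single listener that collects
changes and hands them to the metadata store in one batch. The Python
sketch below only illustrates that idea: ChangeBatcher, the event names
and the message keys are hypothetical and are not part of Evolution's
EPlugin API or of Camel.

```python
class ChangeBatcher:
    """Collect per-message change events and deliver them as one batch."""

    def __init__(self, sink):
        self._sink = sink      # callable that receives {key: event}
        self._pending = {}     # message key -> most recent event

    def notify(self, key, event):
        # Simplification: a later event for the same message replaces
        # the earlier one, so only the most recent event is delivered.
        self._pending[key] = event

    def flush(self):
        """Deliver everything collected so far in a single call."""
        if not self._pending:
            return
        batch, self._pending = self._pending, {}
        self._sink(batch)


# Hypothetical usage: three notifications, one write to the store.
received = []
batcher = ChangeBatcher(received.append)
batcher.notify("INBOX/1", "flagged")
batcher.notify("INBOX/1", "read")
batcher.notify("INBOX/2", "expunged")
batcher.flush()
print(received)
```

The metadata store is touched once per flush instead of once per
operation, which is the disk-access saving described above.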


However, I personally believe:

More and more applications are using SQLite (Firefox and Evolution, my
two most used apps, among them). So it may be a better idea to map the
tables in an SQLite database directly into the search applications'
data stores (Beagle, Tracker etc.) instead of depending on the
applications to provide up-to-date data.

When we implemented the on-disk summary for Evolution, we removed the
meta-summary code (used by Beagle). We had to provide a way for Beagle
and Tracker to learn about modified or new mails, so that they could
(re)index them. Some suggested adding a DATETIME field holding the time
each record was created or last modified. However, in addition to
bloating the database, this does not provide any information about
deleted records.

If, inside the SQLite DB, we have a special table comprising the table
name, the primary keys of the last N records modified or added, and the
time they were added, then any search application can make use of this
and update its Lucene (or whatever) data.
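
A minimal sketch of such a special table, using Python's sqlite3 module
and SQLite triggers. All table, column and trigger names here are
hypothetical and are not Evolution's actual summary schema; the
"operation" column and the delete trigger are an addition of this sketch
to cover the deleted-records problem mentioned earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for an application table (hypothetical schema).
CREATE TABLE messages (uid TEXT PRIMARY KEY, subject TEXT);

-- The special table: which record of which table changed, and when.
CREATE TABLE change_log (
    seq        INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT NOT NULL,
    record_key TEXT NOT NULL,
    operation  TEXT NOT NULL,
    changed_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))
);

CREATE TRIGGER messages_ins AFTER INSERT ON messages BEGIN
    INSERT INTO change_log (table_name, record_key, operation)
    VALUES ('messages', NEW.uid, 'added');
END;
CREATE TRIGGER messages_upd AFTER UPDATE ON messages BEGIN
    INSERT INTO change_log (table_name, record_key, operation)
    VALUES ('messages', NEW.uid, 'modified');
END;
CREATE TRIGGER messages_del AFTER DELETE ON messages BEGIN
    INSERT INTO change_log (table_name, record_key, operation)
    VALUES ('messages', OLD.uid, 'deleted');
END;
""")


def changes_since(conn, last_seq):
    """What an indexer would poll: entries newer than the last one seen."""
    return conn.execute(
        "SELECT seq, table_name, record_key, operation FROM change_log "
        "WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()


def trim_log(conn, keep_last_n):
    """Keep only the last N entries so the log does not grow unbounded."""
    conn.execute(
        "DELETE FROM change_log "
        "WHERE seq <= (SELECT MAX(seq) FROM change_log) - ?",
        (keep_last_n,),
    )


conn.execute("INSERT INTO messages VALUES ('1', 'Hello')")
conn.execute("UPDATE messages SET subject = 'Hi' WHERE uid = '1'")
conn.execute("DELETE FROM messages WHERE uid = '1'")
log = changes_since(conn, 0)
print(log)
```

An indexer would remember the highest seq it has processed and ask only
for newer entries. An indexer that falls more than N entries behind
would have to fall back to a full rescan.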

It may not be the neatest approach, but what I want to say is: instead
of depending on the end-user applications (which use SQLite) to hand
over data, search applications should be able to get the data from the
DB itself. This also provides additional benefits, like creating or
updating search indices when the machine is idle instead of choking the
applications while they are running.

My 0.2 EUROes ;-)

--
Sankar


-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be



