Re: [Tracker] [Evolution-hackers] [Evolution] Beagle and Tracker, letting Evolution feed those beasts RDF triples instead



Hi Philip,

On Wed, 2008-12-10 at 12:49 +0100, Philip Van Hoof wrote:
    What does the lifecycle for the data in that Unset store look like?

I think the LifeCycle is best described by this document:

http://live.gnome.org/MetadataOnRemovableDevices

It specifies a metadata cache format for removable devices in Turtle
format. 

        I had not read that before; I have just read it - and, as you say, here is how
things are removed:

For your information when reading the document: The removal of a
resource has a special notation using blank resources <> <>, and the
removal of a predicate (of a field of a resource) uses the notation
<pfx:predicate> <>.

        Sure - so, that is fine - it's a representational detail of how
removals are stored. My concern is not that we can't represent removals
well - but that the life-cycle of that removal information is undefined.

        Say, e.g., we install Beagle and Tracker - but we never run Beagle. Then
we have two parties that have registered an interest in changes. If we
run Beagle only every year or so - we need to know all mails that were
deleted since a year ago. Unfortunately, perhaps we never run it again.
Does that mean we endlessly accumulate in some monster journal a huge
list of 'UnSets'?

  For a cache it's important to know the "modified" timestamp so that
  you know whether your copy of the metadata is most recent, or the
  cache's copy of the resource is most recent.

        Sure - I buy the timestamp thing; that's all great.

- When a resource got deleted then the RDF store wants to know about
  this as soon as possible. Asynchronously (like if the RDF store,
  being a subscriber, joins the subscription after the deletion took
  place) this also counts: as soon as possible. Preferably immediately
  after the subscription.

        Sure - so my problem is the life-cycle of the store of deletion
information: how long do we grow that list for, if people eg. turn off
the search client after finding it chews more resource than they had
hoped on their small machine :-)

  With IMAP there's a trick that you can do: you can assume that a hole
  in the UIDSET meant that some sort of deleting occurred.

        Sounds interesting.
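
For illustration, the UID-hole idea could be sketched roughly like this in
Python. The function names and the plain integer lists are assumptions made
for the example, not anything Camel or an IMAP library provides. Note also
that IMAP servers are not required to assign contiguous UIDs, so a hole is
only a hint that a deletion may have happened, and it stops being meaningful
if the folder's UIDVALIDITY changes.

```python
def uid_holes(server_uids):
    """Return the UIDs missing between consecutive UIDs the server reports."""
    uids = sorted(set(server_uids))
    holes = []
    for prev, cur in zip(uids, uids[1:]):
        # Everything strictly between two neighbouring UIDs is a hole.
        holes.extend(range(prev + 1, cur))
    return holes


def deleted_candidates(indexed_uids, server_uids):
    """Return UIDs the indexer knows about that now fall into a hole."""
    holes = set(uid_holes(server_uids))
    return sorted(uid for uid in indexed_uids if uid in holes)
```

An indexer that remembers which UIDs it has indexed could then call
deleted_candidates() after a resync and drop whatever comes back, without
needing a deletion log for that folder at all.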

I think, anyway, that it would make sense for Evolution to start doing
two things in the CamelDB:

        Agreed.

  * Log all deletions (just the UID should suffice). If the service
    reuses UIDs, then upon actual reuse of a UID, that UID's deletion
    entry should be removed from the log. Otherwise whoever depends on
    this log to learn about deletions will lose the new E-mail that
    now carries the reused UID.

        So there is at least some bound to the growth of the deleted UID
log ;-) which is the size / likelihood of re-use in the UID space.
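
A minimal sketch of such a log, assuming an in-memory dict rather than
the real CamelDB tables (the class and method names are hypothetical, made up
for this example). Each deletion gets a sequence number so that a subscriber
can ask for everything deleted since it last looked, and a reused UID drops
its old entry.

```python
class DeletionLog:
    """Toy deletion log keyed by UID (hypothetical, not the CamelDB schema)."""

    def __init__(self):
        self._deleted = {}  # uid -> sequence number of the deletion
        self._seq = 0

    def log_deletion(self, uid):
        """Record that uid was deleted; return the new sequence number."""
        self._seq += 1
        self._deleted[uid] = self._seq
        return self._seq

    def uid_reused(self, uid):
        """The service handed out uid again, so forget its old deletion."""
        self._deleted.pop(uid, None)

    def deleted_since(self, seq):
        """Return UIDs deleted after sequence number seq, in sorted order."""
        return sorted(u for u, s in self._deleted.items() if s > seq)
```

The log can never hold more entries than there are distinct UIDs, which is
the bound mentioned above, but for a large UID space that bound is not much
comfort.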

        It's hard to think of solutions that are that satisfying; but - perhaps
something like cropping the deletion log-size at a percentage of stored
mail size, with some "log overflow" type message to flag that; or having
some arbitrary size bound on it, or more carefully disabling logging
when search services are disabled, or ... having only a single client,
or warning the user that they should run their search service some more,
or perhaps even coupling the indexing piece more closely to the mailer
itself somehow.
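
To make the "arbitrary size bound plus overflow flag" option concrete, here
is a rough Python sketch. Everything in it is an assumption for illustration:
the class name, the max_entries parameter (which could equally be derived
from a percentage of the stored mail), and the single-consumer drain() call.
When the flag comes back set, the consumer would have to fall back to a full
resync, since some deletions were dropped.

```python
from collections import OrderedDict


class BoundedDeletionLog:
    """Deletion log capped at max_entries; the oldest entries are dropped first."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self.overflowed = False
        self._uids = OrderedDict()

    def log_deletion(self, uid):
        """Record a deleted UID, evicting the oldest entry if over the cap."""
        self._uids[uid] = True
        self._uids.move_to_end(uid)
        while len(self._uids) > self.max_entries:
            self._uids.popitem(last=False)
            self.overflowed = True  # some deletions are no longer in the log

    def drain(self):
        """Return (uids, overflowed) for a single consumer and reset the log."""
        result = (list(self._uids), self.overflowed)
        self._uids.clear()
        self.overflowed = False
        return result
```

This only works as written with a single client; with several subscribers
each would need its own read position, which brings the growth problem
straight back.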

        HTH,

                Michael.

-- 
 michael meeks novell com  <><, Pseudo Engineer, itinerant idiot



