Re: [Tracker] [Evolution-hackers] [Evolution] Beagle and Tracker, letting Evolution feed those beasts RDF triples instead



On Wed, 2008-12-10 at 11:12 +0000, Michael Meeks wrote:
Hi Philip,

On Tue, 2008-12-09 at 19:59 +0100, Philip Van Hoof wrote:
http://live.gnome.org/Evolution/Metadata

For early visitors of that page, refresh because I have added/changed
quite a lot of it already.

      Looks really good.

      The only thing that I don't quite understand (the perennial problem
with asynchronous interfaces), is the memory issue: it seems we need to
store all Unset information on deleted mails somewhere [ unless you are
a womble like me that keeps ~all mail forever ;-].

      What does the lifecycle for the data in that Unset store look like ?
[ I assume that as/when you re-connect to the service you're as much
likely to get an UnsetMany as a SetMany ]. What if that data starts to
grow larger than the remaining data it describes ? ;-) [ depending on
how we do Junk mail filtering of course that might be quite a common
occurrence for some ].

I think the LifeCycle is best described by this document:

http://live.gnome.org/MetadataOnRemovableDevices

It specifies a metadata cache format for removable devices in Turtle
format. 

For your information when reading the document: the removal of a
resource has a special notation using blank resources <> <>, and the
removal of a predicate (of a field of a resource) uses the notation
<pfx:predicate> <>.

Although cached metadata on a removable device is not the exact same
use-case, the life-cycle of what the RDF store (or the metadata engine)
wants is the same:

- When a new resource is created or one of its predicates (one of its
  fields) is being updated, it just wants to know about these updates or
  creates. An update is the same as a create if the resource didn't
  exist before.

  For a cache it's important to know the "modified" timestamp so that
  you know whether your copy of the metadata is the most recent, or
  whether the cache's copy for the resource is the most recent.

  For Evolution (for E-mail clients) we can simplify this as "whenever a
  Set or a SetMany happens, we assume time() to be that date". That's
  because we can assume the E-mail client to have top-most priority in
  all cases (being the benevolent dictator about metadata about E-mails,
  it knows best what we should swallow and when we should swallow its
  updates - we should not make up our own minds and decisions about it)

- When a resource gets deleted, the RDF store wants to know about
  this as soon as possible. This also holds in the asynchronous case
  (for example when the RDF store, being a subscriber, joins the
  subscription after the deletion took place): as soon as possible,
  preferably immediately after subscribing.

  Right now I don't think Evolution is keeping state about deleted UIDs.

  With IMAP there's a trick that you can do: you can assume that a hole
  in the UIDSET means that some sort of deletion occurred. That's
  because IMAP more or less specifies that the server can't reuse UIDs
  (some IMAP servers might not respect this, and those are also broken
  in Evolution afaik, or at least require a workaround that makes
  Evolution basically perform like a POP client for IMAP when
  synchronizing).

  With POP I don't think you can make any such assumptions.

- Removing a predicate from a resource (a field of a resource) isn't
  needed for E-mail. Luckily E-mail is a mostly read-only storage, with
  the exception of fields like <nmo:isRead>. If we want to support
  removing a flag or a custom flag at some point, we might need to add
  something to the API to indicate the removal of a field of a resource.

  For example, the CC or the TO list of an E-mail cannot change,
  because E-mails, once stored, are read-only in that respect.
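
The IMAP observation in the list above (a hole in the UID set hints at a
deletion) can be sketched in a few lines of Python. This is only an
illustration, not Evolution or Camel code: the function name
find_uid_holes is hypothetical, and the sketch assumes that every UID
between the lowest and highest one reported by the server was once
assigned to a message. IMAP UIDs are not required to be contiguous, so a
hole is a hint that something was deleted, not proof.

```python
def find_uid_holes(server_uids):
    """Return the UIDs missing between the lowest and highest reported UID.

    Assumption (not from the original post): every UID in that range was
    once handed out, so each gap corresponds to a deleted message.
    """
    present = set(server_uids)
    if not present:
        return []
    lowest, highest = min(present), max(present)
    # Walk the full range and keep whatever the server no longer reports.
    return [uid for uid in range(lowest, highest + 1) if uid not in present]
```

A subscriber that joins late could feed such holes into an UnsetMany,
provided the server really never reuses UIDs.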


I think, anyway, that it would make sense for Evolution to start doing
two things in the CamelDB:

  * Log all deletions (just the UID should suffice). If the service
    reuses UIDs then, upon effective reuse of a UID, that UID's
    deletion entry should be removed from the log. Otherwise whoever
    depends on this log for knowing about effective deletions loses
    the E-mail that now uses that UID.

  * Record the timestamp for each record in the summary table. This
    timestamp would store the time() when the record got added, and
    maybe (preferably separately) the time() when the E-mail's flags
    were last changed.

With those two additions to the schema of the CamelDB it would I think
be possible to make a plugin that implements the service as proposed on
the wiki page.
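
To make the two additions concrete, here is a minimal sketch using
Python's sqlite3 module. It is not the CamelDB schema: the table names
(summary, deletion_log), the column names and all function names are
hypothetical. It also stores a timestamp with each logged deletion, which
goes beyond "just the UID", so that a late subscriber can ask for
everything that changed since a given time.

```python
import sqlite3
import time

def open_store(path=":memory:"):
    """Create the two hypothetical tables: a summary and a deletion log."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS summary (
            uid TEXT PRIMARY KEY,
            flags INTEGER NOT NULL DEFAULT 0,
            created INTEGER NOT NULL,   -- time() when the record was added
            modified INTEGER NOT NULL   -- time() of the last flag change
        );
        CREATE TABLE IF NOT EXISTS deletion_log (
            uid TEXT PRIMARY KEY,
            deleted INTEGER NOT NULL    -- assumption: not in the proposal
        );
    """)
    return db

def _now(now):
    return int(time.time()) if now is None else now

def add_message(db, uid, flags=0, now=None):
    now = _now(now)
    with db:
        # A reused UID must not stay in the log, or the new mail looks deleted.
        db.execute("DELETE FROM deletion_log WHERE uid = ?", (uid,))
        db.execute(
            "INSERT INTO summary (uid, flags, created, modified) VALUES (?, ?, ?, ?)",
            (uid, flags, now, now),
        )

def set_flags(db, uid, flags, now=None):
    with db:
        db.execute(
            "UPDATE summary SET flags = ?, modified = ? WHERE uid = ?",
            (flags, _now(now), uid),
        )

def delete_message(db, uid, now=None):
    with db:
        db.execute("DELETE FROM summary WHERE uid = ?", (uid,))
        db.execute(
            "INSERT OR REPLACE INTO deletion_log (uid, deleted) VALUES (?, ?)",
            (uid, _now(now)),
        )

def changes_since(db, since):
    """Return (updated UIDs, removed UIDs) newer than the given timestamp."""
    updated = [r[0] for r in db.execute(
        "SELECT uid FROM summary WHERE modified > ? ORDER BY uid", (since,))]
    removed = [r[0] for r in db.execute(
        "SELECT uid FROM deletion_log WHERE deleted > ? ORDER BY uid", (since,))]
    return updated, removed
```

A plugin could then answer a reconnecting subscriber by turning the first
list into a SetMany and the second into an UnsetMany.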

Matthew Barnes replied on IRC that we should start storing those
timestamps anyhow. I also think it's a good idea. I was planning to
discuss this with psankar and srag too.

If we change the schema then we will also need to implement a
migration path from the old schema to the new one.

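Using a temporary table you can simulate MySQL's ALTER TABLE in SQLite.

BEGIN TRANSACTION;

-- Copy the existing rows aside into a temporary table.
CREATE TEMPORARY TABLE temp_table AS SELECT * FROM orig_table;

DROP TABLE orig_table;

CREATE TABLE orig_table (
  ...
  created datetime,
  modified datetime
);

-- Refill the new table; both new columns get the current Unix time.
INSERT INTO orig_table
  SELECT *, strftime('%s', 'now'), strftime('%s', 'now') FROM temp_table;

DROP TABLE temp_table;

COMMIT;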
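
The migration above can be tried out from Python's sqlite3 module. The
following is a sketch under assumptions, not the real CamelDB migration:
the table name summary, its columns uid and flags, and the function name
migrate_summary are hypothetical stand-ins for the elided schema. The
connection has to be opened with isolation_level=None so that the
explicit BEGIN and COMMIT are the only transaction control in play.

```python
import sqlite3
import time

def migrate_summary(db, now=None):
    """Rebuild the hypothetical summary table with created/modified columns.

    The connection must be opened with isolation_level=None.
    """
    now = int(time.time()) if now is None else now
    db.execute("BEGIN")
    try:
        # Copy the old rows aside into a temporary table.
        db.execute(
            "CREATE TEMPORARY TABLE summary_backup AS "
            "SELECT uid, flags FROM summary"
        )
        db.execute("DROP TABLE summary")
        db.execute(
            "CREATE TABLE summary ("
            " uid TEXT PRIMARY KEY,"
            " flags INTEGER,"
            " created INTEGER,"
            " modified INTEGER)"
        )
        # Old rows have no history, so both timestamps start at 'now'.
        db.execute(
            "INSERT INTO summary (uid, flags, created, modified) "
            "SELECT uid, flags, ?, ? FROM summary_backup",
            (now, now),
        )
        db.execute("DROP TABLE summary_backup")
        db.execute("COMMIT")
    except Exception:
        # Schema changes are transactional in SQLite, so this restores the old table.
        db.execute("ROLLBACK")
        raise
```

Because the whole rebuild runs in one transaction, a failure halfway
leaves the old table in place.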


-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be



