Re: [Tracker] [Evolution-hackers] [Evolution] Beagle and Tracker, letting Evolution feed those beasts RDF triples instead
- From: Philip Van Hoof <spam pvanhoof be>
- To: michael meeks novell com
- Cc: Sankar <psankar novell com>, Evolution Hackers <evolution-hackers gnome org>, Tracker mailing list <tracker-list gnome org>
- Subject: Re: [Tracker] [Evolution-hackers] [Evolution] Beagle and Tracker, letting Evolution feed those beasts RDF triples instead
- Date: Wed, 10 Dec 2008 12:49:53 +0100
On Wed, 2008-12-10 at 11:12 +0000, Michael Meeks wrote:
> Hi Philip,
>
> On Tue, 2008-12-09 at 19:59 +0100, Philip Van Hoof wrote:
> > http://live.gnome.org/Evolution/Metadata
> >
> > For early visitors of that page, refresh because I have added/changed
> > quite a lot of it already.
>
> Looks really good.
>
> The only thing that I don't quite understand (the perennial problem
> with asynchronous interfaces), is the memory issue: it seems we need to
> store all Unset information on deleted mails somewhere [ unless you are
> a womble like me that keeps ~all mail forever ;-].
>
> What does the lifecycle for the data in that Unset store look like ?
> [ I assume that as/when you re-connect to the service you're as much
> likely to get an UnsetMany as a SetMany ]. What if that data starts to
> grow larger than the remaining data it describes ? ;-) [ depending on
> how we do Junk mail filtering of course that might be quite a common
> occurrence for some ].
I think the lifecycle is best described by this document:
http://live.gnome.org/MetadataOnRemovableDevices
It specifies a metadata cache format for removable devices, written in
Turtle.
For your information when reading the document: the removal of a
resource has a special notation using blank resources, <> <>, and the
removal of a predicate (a field of a resource) uses the notation
<pfx:predicate> <>.
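As an illustration only, here is how a producer might emit the three kinds of change in that notation. It assumes each change is written as one subject/predicate/object statement, subject first, ending in " ."; the helper names and the resource URI are hypothetical, and the exact syntax should be checked against the MetadataOnRemovableDevices page.

```python
def set_statement(subject: str, predicate: str, obj: str) -> str:
    """A create or an update: an ordinary statement."""
    return f"<{subject}> {predicate} {obj} ."


def remove_predicate(subject: str, predicate: str) -> str:
    """Removal of one field: the predicate followed by a blank object."""
    return f"<{subject}> {predicate} <> ."


def remove_resource(subject: str) -> str:
    """Removal of the whole resource: blank predicate and blank object."""
    return f"<{subject}> <> <> ."


if __name__ == "__main__":
    uri = "email://local@local/INBOX;uid=42"  # hypothetical resource URI
    print(set_statement(uri, "<nmo:isRead>", '"true"'))
    print(remove_predicate(uri, "<nmo:isRead>"))
    print(remove_resource(uri))
```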
Although cached metadata on a removable device is not exactly the same
use case, the RDF store (or the metadata engine) wants the same
lifecycle:
- When a new resource is created or one of its predicates (one of its
fields) is updated, the store just wants to know about these creations
and updates. An update is the same as a create if the resource didn't
exist before.
For a cache it's important to know the "modified" timestamp, so that
you know whether your own copy of the metadata or the cache's copy of
the resource is the most recent.
For Evolution (and E-mail clients in general) we can simplify this to
"whenever a Set or SetMany happens, take time() as that date". That's
because we can assume the E-mail client always has top priority: as
the benevolent dictator of metadata about E-mails, it knows best what
we should swallow and when we should swallow its updates. We should
not make up our own minds about it.
- When a resource is deleted, the RDF store wants to know about it as
soon as possible. That also holds in the asynchronous case (for
example, when the RDF store, as a subscriber, joins the subscription
after the deletion took place): it should learn about it as soon as
possible, preferably immediately after subscribing.
Right now I don't think Evolution is keeping state about deleted UIDs.
With IMAP there's a trick you can use: you can assume that a hole in
the UID set means that some deletion occurred. That's because IMAP is
(roughly) specified such that the server can't reuse UIDs. Some IMAP
servers might not respect this, and those are also broken in Evolution
AFAIK, or at least require a workaround that makes Evolution behave
basically like a POP client when synchronizing.
With POP I don't think you can make any such assumption.
- Removing a predicate from a resource (a field of a resource) isn't
needed for E-mail. Luckily, E-mail is mostly read-only storage, with
the exception of fields like <nmo:isRead>. If at some point we want to
support removing a flag or a custom flag, we might need to add
something to the API to indicate the removal of a field of a resource.
For example, the CC or TO list of an E-mail cannot change, because
E-mails, once stored, are read-only in that respect.
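The lifecycle in the list above could be modeled roughly as follows. This is a minimal sketch, not Tracker's or Evolution's API: MetadataStore, set_many, unset and imap_deleted_uids are hypothetical names, per-resource (rather than per-field) timestamps are a simplification, and a real store would hold RDF triples rather than Python dicts.

```python
import time


class MetadataStore:
    """Hypothetical subscriber-side store: uri -> {"fields", "modified"}."""

    def __init__(self):
        self.resources = {}

    def set_many(self, uri, fields, modified=None):
        """Create or update a resource; an update of an unknown uri is a create.

        modified=None models an update coming from Evolution, which is
        authoritative: it is stamped with time.time() and always accepted.
        An explicit timestamp (e.g. from a removable-device cache) only
        wins if it is newer than the copy we already have.
        """
        current = self.resources.get(uri)
        if modified is None:
            modified = time.time()
        elif current is not None and modified <= current["modified"]:
            return False  # our copy is more recent; ignore the stale one
        if current is None:
            current = self.resources[uri] = {"fields": {}, "modified": modified}
        current["fields"].update(fields)
        current["modified"] = modified
        return True

    def unset(self, uri):
        """Forget a deleted resource as soon as we learn about it."""
        return self.resources.pop(uri, None) is not None


def imap_deleted_uids(known_uids, server_uids):
    """UIDs we had cached that the server no longer lists.

    Only valid because IMAP servers must not reuse UIDs (strictly, within
    one UIDVALIDITY); no such shortcut exists for POP.
    """
    return set(known_uids) - set(server_uids)
```

Each UID returned by imap_deleted_uids would then become an unset call (an UnsetMany, in the wiki page's terms) for the matching resource.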
I think, anyway, that it would make sense for Evolution to start doing
two things in the CamelDB:
* Log all deletions (just the UID should suffice). If the service
reuses UIDs, then when a UID actually gets reused, its deletion entry
should be removed from the log. Otherwise, whoever depends on this log
to learn about actual deletions would lose the new E-mail.
* Record a timestamp for each record in the summary table. It would
store the time() when the record got added and maybe also, preferably
in a separate column, the time() when the E-mail's flags were last
changed.
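A minimal sqlite3 sketch of those two schema additions, assuming hypothetical summary and deletions tables (CamelDB's real summary tables have many more columns and different names). Integer Unix timestamps stand in for time(); the now parameter exists only to make the behavior easy to test.

```python
import sqlite3
import time

# Hypothetical schema; the real CamelDB summary tables look different.
SCHEMA = """
CREATE TABLE IF NOT EXISTS summary (
    uid            TEXT PRIMARY KEY,
    flags          INTEGER NOT NULL DEFAULT 0,
    created        INTEGER NOT NULL,  -- time() when the record was added
    flags_modified INTEGER NOT NULL   -- time() of the last flag change
);
CREATE TABLE IF NOT EXISTS deletions (
    uid     TEXT PRIMARY KEY,
    deleted INTEGER NOT NULL          -- time() of the deletion
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_message(conn, uid, flags=0, now=None):
    now = int(time.time()) if now is None else now
    with conn:  # commits on success
        # If the service reused this UID, its old deletion no longer applies;
        # otherwise subscribers would treat the new message as deleted.
        conn.execute("DELETE FROM deletions WHERE uid = ?", (uid,))
        conn.execute(
            "INSERT OR REPLACE INTO summary (uid, flags, created, flags_modified) "
            "VALUES (?, ?, ?, ?)",
            (uid, flags, now, now))


def set_flags(conn, uid, flags, now=None):
    now = int(time.time()) if now is None else now
    with conn:
        conn.execute(
            "UPDATE summary SET flags = ?, flags_modified = ? WHERE uid = ?",
            (flags, now, uid))


def delete_message(conn, uid, now=None):
    now = int(time.time()) if now is None else now
    with conn:
        conn.execute("DELETE FROM summary WHERE uid = ?", (uid,))
        # The UID (plus when it happened) is all a late subscriber needs.
        conn.execute(
            "INSERT OR REPLACE INTO deletions (uid, deleted) VALUES (?, ?)",
            (uid, now))
```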
With those two additions to the CamelDB schema, I think it would be
possible to make a plugin that implements the service proposed on the
wiki page.
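With such a schema, the plugin's answer to a subscriber that joins late (the asynchronous case discussed above) could come down to two queries. This sketch assumes hypothetical summary and deletions tables with created, flags_modified and deleted timestamp columns; changes_since and its mapping onto SetMany/UnsetMany are assumptions, not the API proposed on the wiki page.

```python
import sqlite3


def changes_since(conn, since):
    """Collect what a subscriber that last synced at `since` has missed.

    Returns (set_many, unset_many): summary rows created, or whose flags
    changed, after `since`, and the UIDs deleted after `since`.
    """
    set_many = conn.execute(
        "SELECT uid, flags FROM summary "
        "WHERE created > ? OR flags_modified > ? ORDER BY uid",
        (since, since)).fetchall()
    unset_many = [uid for (uid,) in conn.execute(
        "SELECT uid FROM deletions WHERE deleted > ? ORDER BY uid", (since,))]
    return set_many, unset_many


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE summary (uid TEXT PRIMARY KEY, flags INTEGER,
                              created INTEGER, flags_modified INTEGER);
        CREATE TABLE deletions (uid TEXT PRIMARY KEY, deleted INTEGER);
        INSERT INTO summary VALUES ('1', 0, 10, 10), ('2', 1, 10, 50),
                                   ('3', 0, 60, 60);
        INSERT INTO deletions VALUES ('4', 20), ('5', 70);
    """)
    print(changes_since(conn, 40))  # -> ([('2', 1), ('3', 0)], ['5'])
```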
Matthew Barnes replied on IRC that we should start storing those
timestamps anyhow. I also think it's a good idea. I was planning to
discuss this with psankar and srag too.
If we change the schema, we will also need to implement a migration
path from the old schema to the new one.
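Using a temporary table you can simulate MySQL's ALTER TABLE in SQLite:
BEGIN TRANSACTION;
CREATE TEMPORARY TABLE backup_table AS SELECT * FROM orig_table;
DROP TABLE orig_table;
CREATE TABLE orig_table (
...
created INTEGER,
modified INTEGER
);
-- SQLite's time() returns only HH:MM:SS; strftime('%s', 'now') gives
-- Unix time, like time() in C
INSERT INTO orig_table
SELECT *, strftime('%s', 'now'), strftime('%s', 'now') FROM backup_table;
DROP TABLE backup_table;
COMMIT;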
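A runnable sketch of that migration through Python's sqlite3 module, copying the rows through a temporary backup table. The two-column summary table (uid, flags) is hypothetical and stands in for the elided columns; migrate is an illustrative name.

```python
import sqlite3


def migrate(conn):
    """Add created/modified columns to a hypothetical 'summary' table.

    Rows are copied out to a temporary table, the table is recreated with
    the new columns, and the rows are copied back stamped with Unix time.
    """
    conn.executescript("""
        BEGIN TRANSACTION;
        CREATE TEMPORARY TABLE summary_backup AS SELECT * FROM summary;
        DROP TABLE summary;
        CREATE TABLE summary (
            uid      TEXT PRIMARY KEY,
            flags    INTEGER,
            created  INTEGER,
            modified INTEGER
        );
        INSERT INTO summary
            SELECT *, strftime('%s', 'now'), strftime('%s', 'now')
            FROM summary_backup;
        DROP TABLE summary_backup;
        COMMIT;
    """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE summary (uid TEXT PRIMARY KEY, flags INTEGER)")
    conn.execute("INSERT INTO summary VALUES ('1', 3)")
    conn.commit()
    migrate(conn)
    print(conn.execute("SELECT * FROM summary").fetchall())
```

When a change only appends columns, SQLite's own ALTER TABLE ... ADD COLUMN also works; copying through a backup table is the general fallback for changes SQLite's ALTER TABLE cannot express.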
--
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://pvanhoof.be/blog
http://codeminded.be