Re: [Evolution-hackers] UID in vCard

From: Patrick Ohly <patrick ohly gmx de>
To: hilberg kernelconcepts de
Cc: evolution-hackers gnome org
Subject: Re: [Evolution-hackers] UID in vCard
Date: Wed, 16 Nov 2011 15:55:54 +0100

Hello!

Before I reply to Christian, let me elaborate a bit more on why
overwriting the UID is bad for syncing. I thought it was obvious, but
probably not ;-}

What I have in mind is a system where contact data moves freely and
ad-hoc between peers, without being forced to go through a central
server, stick to a fixed topology or use a specific database technology.
This cannot be done with today's technology because avoiding duplicates
becomes unreliable.

CouchDB comes close, but is a closed system. You also cannot create two
independent CouchDB instances and then merge them (at least that is my
understanding - I might be wrong on that particular aspect).

A creator-assigned global UID would be the right way of solving this.
Storages or systems which do not preserve such a UID cannot
participate in that system. I'd like to make sure that the EDS file
backend can be part of it, even if it is just for experiments.
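
As a rough illustration of "creator-assigned", here is a minimal Python
sketch. The function name ensure_uid and the urn:uuid: prefix are
assumptions made for this example; nothing here is EDS or libebook API.
The creating side adds a UUID-based UID once, and every later hop leaves
an existing UID alone.

```python
import uuid


def ensure_uid(vcard: str) -> str:
    """Return the vCard with a UID property, keeping an existing UID untouched.

    Simplified on purpose: folded lines, property groups ("item1.UID")
    and parameters on the UID property are not handled.
    """
    lines = vcard.strip().splitlines()
    for line in lines:
        # vCard property names are case-insensitive.
        if line.upper().startswith("UID:"):
            return "\r\n".join(lines) + "\r\n"

    # No UID yet: this side acts as the creator and assigns a global one.
    for index, line in enumerate(lines):
        if line.upper() == "END:VCARD":
            lines.insert(index, "UID:urn:uuid:" + str(uuid.uuid4()))
            return "\r\n".join(lines) + "\r\n"
    raise ValueError("not a vCard: END:VCARD is missing")
```

Calling ensure_uid() a second time on its own output returns the same
text, which is the property that matters for the system described above.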

For more details, see the TODOs on "ad-hoc synchronization" in
http://syncevolution.org/blogs/pohly/2011/state-union-version-12

On Tue, 2011-11-15 at 15:01 +0100, Christian Hilberg wrote:
> On Tuesday, 15 November 2011, at 11:03:24, Patrick Ohly wrote:
> > On Tue, 2011-11-15 at 10:50 +0100, Christian Hilberg wrote:
> > [...] 
> > > Just adding a few bits on how the situation is for the Kolab groupware server.
> > > 
> > > The evolution-kolab backend cannot ask the Kolab server for a UID (since there
> > > is no API for that) nor does the server enforce certain UIDs on a PIM object (but,
> > > of course, that there be one). The only requirement is that the PIM object's UID
> > > be unique in a given PIM folder. If the UID is globally unique, all the better.
> > 
> > That's the same situation as with the file backend then: a client could
> > decide to set a UID in the vCard before creating the contact, and the
> > Kolab backend+server would use that UID instead of creating their own.
> > Good :-)
> 
> If a new UID is to be created, it is the responsibility of the Kolab client to
> assign one. The Kolab server itself is unaware of object UIDs and will not touch
> them (no read/write/anything).

With "client", I meant an EDS client here (= the application using
libebook). That there is a Kolab client and server involved is of course
important for you, but not so much for a user of the abstract libebook
API ;-)

> > How does the backend work at the moment? Does it always overwrite an
> > existing UID like the file backend does or does it already work as I
> > proposed?
> 
> Existing UIDs are (and must be) preserved. This is a requirement for Kolab client
> interop, since they all rely on the object's UID to reference it (especially regarding
> changes to the contents of the PIM object). In Kolab, there is no way to
> correctly reference an object other than using its UID.
> 
> > If it does, do you throw a
> > E_BOOK_ERROR_CONTACT_ID_ALREADY_EXISTS when the existing UID is not
> > unique?
> 
> Eeewww. :-) evolution-kolab presently sits on Evo/EDS 2.30.3 (which some like
> to call "just plain old" here =). No error message in that case. If the UID
> already exists, it gets rewritten. It was a tradeoff here - an existing Kolab
> object and its UID supersedes a new one. Imported objects are regarded as
> "new" (being assigned a new UID), should the UID they carry already exist in
> the given PIM folder. The original UID would do no good in the Kolab context
> if another object with that same UID already exists, since other groupware
> clients do have an idea about the object this UID refers to already. Trying
> to find out whether the imported object could actually be an update for an
> already existing one seemed too complex and out-of-scope for the initial
> evolution-kolab implementation. We're now porting to current Evo/EDS git master,
> but I would still keep the current implementation unchanged when it comes
> to how to interpret UIDs of imported objects.

That is the whole point of this mail thread: a vCard UID may have a
meaning outside of the storage in which it currently exists. EDS cannot
know whether that is the case. Currently it assumes that the UID has no
meaning and throws it away when adding contacts.

Consider sending around a vCard with a globally unique UID. A user might
import that vCard once into Kolab using Kolab client A. Then when he
tries again with Kolab client B, the client can warn him reliably that
this particular contact was already imported.
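
The import scenario can be sketched as follows, assuming a storage that
keeps the incoming UID and rejects a second contact with the same one.
ContactStore, ContactIdAlreadyExists and import_contact are hypothetical
names for this illustration only; the exception merely plays the role
that E_BOOK_ERROR_CONTACT_ID_ALREADY_EXISTS would play in libebook.

```python
class ContactIdAlreadyExists(Exception):
    """Hypothetical stand-in for E_BOOK_ERROR_CONTACT_ID_ALREADY_EXISTS."""


class ContactStore:
    """Toy storage shared by several clients; it never rewrites a UID."""

    def __init__(self):
        self._contacts = {}

    def add(self, uid: str, vcard: str) -> None:
        if uid in self._contacts:
            raise ContactIdAlreadyExists(uid)
        self._contacts[uid] = vcard


def import_contact(store: ContactStore, uid: str, vcard: str) -> str:
    """What a client could do on import: add, or warn about a repeat."""
    try:
        store.add(uid, vcard)
        return "imported"
    except ContactIdAlreadyExists:
        return "already imported"
```

With this, the first import through client A reports "imported" and the
second attempt through client B reports "already imported". A storage
that replaces the UID on add cannot give that answer.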

> > > PIM objects already residing on the Kolab server do carry a UID, created by the
> > > client which created the object (evolution-kolab, Kontact, Horde, Outlook with
> > > a Kolab connector, Thunderbird via Lightning/SyncKolab, ...).
> > 
> > Do you attempt to make the UID globally unique, for example by using a
> > UUID?
> 
> In our current implementation, the UID will be unique to the Kolab server at hand.
> Since the Kolab server does not impose specific restrictions on the format of the
> UID, evolution-kolab could change the UID generation code (we currently use E-D-S
> infrastructure for this) to generate UUIDs. However, other Kolab clients are free
> to follow their very own scheme of UID formatting (they may well decide that UIDs
> unique for a given folder only are unique enough). I'm not clear whether the new
> Kolab format specification, which is in the making right now, would enforce a
> UID to be globally unique. Older clients would not follow that scheme, so if you
> cannot *rely* on the UID being globally unique, you do not gain anything. The
> Kolab philosophy is to offload almost _everything_ to the clients for maximum
> scalability and minimum server load, accepting the fact that the server cannot
> really enforce anything. The Kolab server itself is, as I said, fully unaware of
> the PIM data. It stores all PIM objects as emails in IMAP folders. Hence, it will
> happily accept a client writing multiple PIM emails onto the server and into one
> PIM folder, all carrying the same UID. It is really all in the hands of the clients.
> If a new Kolab server will not enforce UIDs to be UUIDs (and it very certainly won't),
> then your gain is zero if you implement UUIDs in one client only.

True. Because vCard UIDs are not guaranteed to be globally unique in the
general case, they cannot be relied upon completely. But even in today's
world they provide a good hint already. I've seen pretty long UIDs in
the wild. Evolution is a bit on the short side here.

> > > When it comes to importing a PIM object, it is not possible to retain its
> > > UID in the cases where the same UID exists on the server already for another PIM
> > > object (unlikely, but possible, since Kolab object UIDs are not required to be
> > > globally unique). As long as we're in offline mode, we may at first succeed retaining
> > > the object's UID, but when going online and syncing with the server, we find that
> > > a new UID must be set on the object.
> > 
> > What happens during syncing? Do you resolve the add<->add conflict by
> > duplicating the item, merging them or discarding one copy?
> 
> This is a configuration option the user has. Kontact, as a reference client for Kolab,
> will ask you in all events of synchronization conflicts. In Evo/EDS 2.30, we did not
> have the infrastructure needed to query Kolab-specific user input from Evo, so the whole
> thing is non-interactive. For each PIM folder, you can configure the backends to use
> one of the following strategies:
> * use the "newer" PIM object (relies on timestamps - since these are set by the clients,
>   not the Kolab server, it only works if the client's clocks are synced)
> * use the client's object (overwrites the one on the server)
> * use the server's object (discards the client's changes)
> * create a duplicate
> These strategies apply if an object is already known to at least 2 clients and
> is changed by both at the same time. If with "add<->add conflict" you mean that two clients
> are adding *new* PIM objects, and both mean the same (say, two people adding an object for
> the same event in some shared calendar folder), then there is no automatism to resolve this.
> The result is that two PIM objects have been added. That's it. Any automatism to guess whether
> the two clients *really* *really* mean the same thing (and possibly merge the contents of the
> two objects into one) would be horribly complex and therefore would most probably fail to do
> The Right Thing exactly because of that.

The Synthesis engine has some pretty good support for exactly this
problem. For the most common scenario (exact same iCalendar invitation
or vCard imported twice) it can detect the content is the same and avoid
duplicating the items. If less sure, it could fall back to duplicating
the items.
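
To make the "exact same item imported twice" case concrete, here is a
much simplified Python sketch. It is not how the Synthesis engine works
internally; it only assumes that two vCards count as the same content
when all properties except UID, REV and PRODID match. The names
fingerprint and add_or_skip are made up for this example.

```python
import hashlib

# Properties that commonly differ between two copies of the same contact.
IGNORED = ("UID", "REV", "PRODID")


def fingerprint(vcard: str) -> str:
    """Hash the vCard content, ignoring volatile properties and line order."""
    props = []
    for line in vcard.strip().splitlines():
        # "TEL;TYPE=CELL:123" -> "TEL"; groups and folding are not handled.
        name = line.split(":", 1)[0].split(";", 1)[0].upper()
        if name not in IGNORED:
            props.append(line.strip())
    return hashlib.sha256("\n".join(sorted(props)).encode("utf-8")).hexdigest()


def add_or_skip(items: list, vcard: str) -> bool:
    """Append vcard unless identical content is present; True if added."""
    wanted = fingerprint(vcard)
    if any(fingerprint(existing) == wanted for existing in items):
        return False
    items.append(vcard)
    return True
```

Anything that differs in a real property is still added as a second
item, which corresponds to the "fall back to duplicating" case.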

> If people find they have both added a PIM object for
> the same thing into a shared folder, let them get into conversation, clarify things and remove
> one of the objects after achieving consensus.

If such conflicts are rare, that's a good and pragmatic solution. But in
the ad-hoc system mentioned above it could be very common (device A has
all data, sends to B and C, then B and C sync => all items exist twice).
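
The A/B/C scenario can be played through with a small Python model. It
is a toy under stated assumptions: the sync matches items by UID only,
and receive() and sync() are invented for this illustration.

```python
def receive(items: dict, prefix: str, keep_uid: bool) -> dict:
    """Copy contacts to another device, keeping or replacing their UIDs."""
    if keep_uid:
        return dict(items)
    # The storage throws the incoming UID away and assigns a local one.
    return {f"{prefix}-{n}": contact
            for n, contact in enumerate(items.values(), 1)}


def sync(left: dict, right: dict) -> dict:
    """Two-way sync that matches items by UID only."""
    merged = dict(left)
    merged.update(right)
    return merged


device_a = {"uuid-1": "Alice", "uuid-2": "Bob"}
for keep_uid in (False, True):
    device_b = receive(device_a, "b", keep_uid)
    device_c = receive(device_a, "c", keep_uid)
    print(keep_uid, len(sync(device_b, device_c)))
# prints "False 4" and then "True 2"
```

With storage-local UIDs, B and C end up with four items for two
contacts; with preserved UIDs the sync sees the same two items on both
sides.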

-- 
Bye, Patrick Ohly
--  
Patrick Ohly gmx de
http://www.estamos.de/