Re: [Evolution-hackers] UID in vCard



Hi Patrick,

On Wednesday, 16 November 2011, at 15:55:54, Patrick Ohly wrote:
> Hello!
> [...]
> On Tue, 2011-11-15 at 15:01 +0100, Christian Hilberg wrote:
> > If a new UID is to be created, it is the responsibility of the Kolab client to
> > assign one. The Kolab server itself is unaware of object UIDs and will not touch
> > them (no read/write/anything).
> With "client", I meant an EDS client here (= the application using
> libebook). That there is a Kolab client and server involved is of course
> important for you, but not so much for a user of the abstract libebook
> API ;-)

While the E-D-S client (like Evo) might not care whether a Kolab backend
is being used, there is still one thing you may wish to consider here.
We could of course map between Kolab PIM object UIDs and E-D-S UUIDs in our
backend. The E-D-S client (like Evo) would then see UUIDs and be happy with it.
Now imagine someone in Evo exports a PIM object which originates from a Kolab
server. Evo would write the E-D-S UUID into that PIM object. Now, by some nice
round-trip, this exported PIM object would reach a user of the Kolab server
it originated from. By importing the object into the Kolab server, we would then
generate a dupe, since the Kolab UID for a PIM object *is* the iCal/vCard UID
of an object stored. While this would have the potential of being a self-healing
process over time (a PIM object duplication is detected, the one whose UID
is not a UUID is deleted), it would take a very long time for the healing to be
completed, and only then could you really rely on the guarantees a UUID makes.
  To give you a hint about the numbers we're talking about, it is not uncommon for
a Kolab deployment to host thousands of contacts and incident types, potentially
shared among hundreds of users with various clients.
  Implementing UUIDs in our backend, as I said, is not an issue. The issue is
more like "what do you gain if you cannot rely on the UID you see to be really
globally unique". In the Kolab context (that's why I'm talking about it here), a mapping
between Kolab-UID and E-D-S-UUID would not help you in PIM data exchange and sync
interplay.

> > > How does the backend work at the moment? Does it always overwrite an
> > > existing UID like the file backend does or does it already work as I
> > > proposed?
> > 
> > Existing UIDs are (and must be) preserved. This is a requirement for Kolab client
> > interop, since they all rely on the object's UID to reference it (especially regarding
> > changes to the contents of the PIM object). In Kolab, there is no way to
> > correctly reference an object other than using its UID.
> > 
> > > If it does, do you throw a
> > > E_BOOK_ERROR_CONTACT_ID_ALREADY_EXISTS when the existing UID is not
> > > unique?
> > 
> > Eeewww. :-) evolution-kolab presently sits on Evo/EDS 2.30.3 (which some like
> > to call "just plain old" here =). No error message in that case. If the UID
> > already exists, it gets rewritten. It was a tradeoff here - an existing Kolab
> > object and its UID supersede a new one. Imported objects are regarded as
> > "new" (being assigned a new UID), should the UID they carry already exist in
> > the given PIM folder. The original UID would do no good in the Kolab context
> > if another object with that same UID already exists, since other groupware
> > clients do have an idea about the object this UID refers to already. Trying
> > to find out whether the imported object could actually be an update for an
> > already existing one seemed too complex and out-of-scope for the initial
> > evolution-kolab implementation. We're now porting to current Evo/EDS git master,
> > but I would still keep the current implementation unchanged when it comes
> > to how to interpret UIDs of imported objects.
> 
> That is the whole point of this mail thread: a vCard UID may have a
> meaning outside of the storage in which it currently exists. EDS cannot
> know whether that is the case. Currently it assumes that the UID has no
> meaning and throws it away when adding contacts.

Not globally true. The file backend may do so, but it is the backend implementation
that decides whether to rewrite a UID or not. E-D-S cannot decide that, since it does
not know what a given backend is dealing with. For the evolution-kolab backend,
only those UIDs are rewritten which already exist on the Kolab server. UUIDs
would, as a matter of fact, be helpful here. There is *some* risk, however, that
by chance you generate a UUID for a PIM object and find that even this one
exists on the server, because out of sheer randomness, a UUID-unaware Kolab client
has decided that exactly *this* would be the UID string it would want to use today.
Granted, this is a corner case and it should occur so rarely that manual intervention
would be acceptable.
  My point is that you might very well be able to make good use of UUIDs if you
implemented a fully new system, which would force all clients to use UUIDs and
all servers dealing with objects to check that the UID really is a UUID. As long as this
is not the case, you will need to deal with UIDs which are not UUIDs, which of course
is as much a pain as a reality.
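
To make the import policy described above concrete, here is a minimal sketch in
Python. It is not the evolution-kolab code (which is C on top of E-D-S); the
function name import_object, the dict-based folder and the "UID" key are
assumptions made purely for illustration. It keeps an incoming UID unless that
UID already exists in the target folder, in which case the imported object is
treated as new and gets a fresh UUID.

```python
import uuid


def import_object(folder, obj):
    """Store obj in folder (a dict mapping UID -> object) and return its UID.

    Hypothetical helper: an existing object always wins, so an imported
    object whose UID is already taken (or missing) is assigned a new one.
    """
    uid = obj.get("UID")
    if not uid or uid in folder:
        uid = str(uuid.uuid4())
        # Corner case from the text: even a fresh UUID could already be
        # present because some client picked that very string. Retry.
        while uid in folder:
            uid = str(uuid.uuid4())
    folder[uid] = dict(obj, UID=uid)
    return uid
```

Importing the same UID twice into one folder therefore yields two objects, the
second one under a newly generated UID.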

> Consider sending around vCard with globally unique UID. A user might
> import that vCard once into Kolab using Kolab client A. Then when he
> tries again with Kolab client B, the client can warn him reliably that
> this particular contact was already imported.

You can rely on that if and only if you are dealing with UUIDs exclusively.
As long as you cannot assure that all UIDs used are UUIDs (mainly because you have
objects which have been sitting around for 5 years already - not uncommon in the Kolab world),
a matching UUID gives you a good hint, but it would not free you from the ever-failing
fuzzy matching of PIM objects to tell whether we have a match or not.
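
As a rough illustration of "a good hint, but no proof", the following Python
sketch grades a UID match by whether the UID is at least shaped like a UUID.
The function names and the three labels are made up for this example; nothing
like this exists in E-D-S or evolution-kolab as far as I know.

```python
import uuid


def looks_like_uuid(uid):
    """True if uid parses as a UUID string (this says nothing about its origin)."""
    try:
        uuid.UUID(uid)
        return True
    except ValueError:
        return False


def uid_match_hint(uid_a, uid_b):
    """Grade how much an equal UID tells us about two PIM objects.

    Hypothetical labels: even "good-hint" is only a hint, because a
    UUID-unaware client may have produced or copied the same string.
    """
    if uid_a != uid_b:
        return "no-match"
    return "good-hint" if looks_like_uuid(uid_a) else "weak-hint"
```

Anything below certainty still has to go through a content comparison of the
two objects.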

> > > > PIM objects already residing on the Kolab server do carry a UID, created by the
> > > > client which created the object (evolution-kolab, Kontact, Horde, Outlook with
> > > > a Kolab connector, Thunderbird via Lightning/SyncKolab, ...).
> > > 
> > > Do you attempt to make the UID globally unique, for example by using a
> > > UUID?
> > 
> > In our current implementation, the UID will be unique to the Kolab server at hand.
> > Since the Kolab server does not impose specific restrictions on the format of the
> > UID, evolution-kolab could change the UID generation code (we currently use E-D-S
> > infrastructure for this) to generate UUIDs. However, other Kolab clients are free
> > to follow their very own scheme of UID formatting (they may well decide that UIDs
> > unique for a given folder only are unique enough). I'm not clear whether the new
> > Kolab format specification, which is in the making right now, would enforce a
> > UID to be globally unique. Older clients would not follow that scheme, so if you
> > cannot *rely* on the UID being globally unique, you do not gain anything. The
> > Kolab philosophy is to offload almost _everything_ to the clients for maximum
> > scalability and minimum server load, accepting the fact that the server cannot
> > really enforce anything. The Kolab server itself is, as I said, fully unaware of
> > the PIM data. It stores all PIM objects as emails in IMAP folders. Hence, it will
> > happily accept a client writing multiple PIM emails onto the server and into one
> > PIM folder, all carrying the same UID. It is really all in the hands of the clients.
> > If a new Kolab server will not enforce UIDs to be UUIDs (and it very certainly won't),
> > then your gain is zero if you implement UUIDs in one client only.
> 
> True. Because vCard UIDs are not guaranteed to be globally unique in the
> general case, they cannot be relied upon completely. But even in today's
> world they provide a good hint already. I've seen pretty long UIDs in
> the wild. Evolution is a bit on the short side here.

Letting E-D-S generate UUIDs would be a good start, for sure. From my Kolab experience
I can tell that the UIDs evolution-kolab generates via E-D-S infrastructure are about
double the length of the UIDs generated by Kontact or Outlook (with one of its various
Kolab connector plugins). Of course, double UID string length does not mean the uniqueness
is also doubled. ;-) I'd second your idea of leaping ahead and letting E-D-S infrastructure
generate good UUIDs. I have not looked too deeply into E-D-S UID generation or measured
how unique the presently generated ones are.
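
For reference, generating a random (version 4) UUID is a one-liner in most
environments. The Python sketch below only shows what such a UID looks like;
the helper name new_uid is hypothetical and this is not how E-D-S generates
UIDs today. A version 4 UUID carries 122 random bits, which is what matters
for uniqueness, not the 36 characters of its string form.

```python
import uuid


def new_uid():
    """Return a random version 4 UUID as a string for use as a vCard/iCal UID."""
    return str(uuid.uuid4())
```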

> > > > When it comes to importing a PIM object, it is not possible to retain its
> > > > UID in the cases where the same UID exists on the server already for another PIM
> > > > object (unlikely, but possible, since Kolab object UIDs are not required to be
> > > > globally unique). As long as we're in offline mode, we may at first succeed in retaining
> > > > the object's UID, but when going online and syncing with the server, we may find that
> > > > a new UID must be set on the object.
> > > 
> > > What happens during syncing? Do you resolve the add<->add conflict by
> > > duplicating the item, merging them or discarding one copy?
> > 
> > This is a configuration option the user has. Kontact, as a reference client for Kolab,
> > will ask you in all events of synchronization conflicts. In Evo/EDS 2.30, we did not
> > have the infrastructure needed to query Kolab-specific user input from Evo, so the whole
> > thing is non-interactive. For each PIM folder, you can configure the backends to use
> > one of the following strategies:
> > * use the "newer" PIM object (relies on timestamps - since these are set by the clients,
> >   not the Kolab server, it only works if the client's clocks are synced)
> > * use the client's object (overwrites the one on the server)
> > * use the server's object (discards the client's changes)
> > * create a duplicate
> > These strategies apply if an object is already known to at least 2 clients and
> > is changed by both at the same time. If with "add<->add conflict" you mean that two clients
> > are adding *new* PIM objects, and both mean the same (say, two people adding an object for
> > the same event in some shared calendar folder), then there is no automatism to resolve this.
> > The result is, that two PIM objects have been added. That's it. Any automatism to guess whether
> > the two clients *really* *really* mean the same thing (and possibly merge the contents of the
> > two objects into one) would be horribly complex and therefore would most probably fail to do
> > The Right Thing exactly because of that.
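
The four per-folder strategies listed above can be sketched roughly like this
in Python. This is an illustration only, not the evolution-kolab
implementation; the Strategy enum, the resolve function and the "uid" and
"modified" keys are assumptions, and so is the choice to let the server copy
win when the timestamps are equal.

```python
import uuid
from enum import Enum


class Strategy(Enum):
    NEWEST = "newest"        # trust client-set timestamps
    CLIENT = "client"        # client copy overwrites the server copy
    SERVER = "server"        # client changes are discarded
    DUPLICATE = "duplicate"  # keep both copies


def resolve(strategy, client_obj, server_obj):
    """Return the list of objects that should exist after the conflict."""
    if strategy is Strategy.NEWEST:
        # Only meaningful if the clients' clocks are in sync.
        newer = client_obj["modified"] > server_obj["modified"]
        return [client_obj if newer else server_obj]
    if strategy is Strategy.CLIENT:
        return [client_obj]
    if strategy is Strategy.SERVER:
        return [server_obj]
    if strategy is Strategy.DUPLICATE:
        # The client copy needs its own UID to coexist with the server copy.
        return [server_obj, dict(client_obj, uid=str(uuid.uuid4()))]
    raise ValueError(f"unknown strategy: {strategy!r}")
```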
> 
> The Synthesis engine has some pretty good support for exactly this
> problem. For the most common scenario (exact same iCalendar invitation
> or vCard imported twice) it can detect the content is the same and avoid
> duplicating the items. If less sure, it could fall back to duplicating
> the items.

I have yet to see a reliably working implementation for that problem, at least when it comes to the
FLOSS world. Please forgive me for having become cynical about this ages ago. ;-) But I'm always
happy to see someone finally get it right and solve a problem the world has been suffering from
since day one. I'd be really happy. Maybe Synthesis is worth a closer look. :)
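
For the easy case you mention (the exact same vCard imported twice), a content
comparison could look roughly like the Python sketch below. To be clear, this
is not what the Synthesis engine does (I have not looked at it); it is a naive
fingerprint I made up for illustration, and the set of ignored properties is an
assumption. It unfolds continuation lines, drops properties that may differ
between otherwise identical copies, sorts the rest and hashes the result.

```python
import hashlib

# Properties ignored for comparison (an assumption for this sketch).
IGNORED = ("UID", "REV", "PRODID")


def content_fingerprint(vcard_text):
    """Return a hash that is equal for vCards with identical content lines."""
    unfolded = []
    for raw in vcard_text.replace("\r\n", "\n").split("\n"):
        if raw[:1] in (" ", "\t") and unfolded:
            # Folded line: continuation of the previous content line.
            unfolded[-1] += raw[1:]
        elif raw.strip():
            unfolded.append(raw.strip())
    kept = []
    for line in unfolded:
        # Property name is everything before the first ';' or ':'.
        name = line.split(":", 1)[0].split(";", 1)[0].upper()
        if name not in IGNORED:
            kept.append(line)
    kept.sort()
    return hashlib.sha256("\n".join(kept).encode("utf-8")).hexdigest()
```

Two objects with the same fingerprint are candidates for being the same
contact; anything that differs in even one field falls through to the hard
problem again.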

> > If people find they have both added a PIM object for
> > the same thing into a shared folder, let them get into conversation, clarify things and remove
> > one of the objects after achieving consensus.
> 
> If such conflicts are rare, that's a good and pragmatic solution. But in
> the ad-hoc system mentioned above it could be very common (device A has
> all data, sends to B and C, then B and C sync => all items exist twice).

If reliable mass sync is what you're aiming for, then I wish you good luck. :-))
As far as evolution-kolab goes, I'm happy to use a good UUID-generating infrastructure
for that, preferably an E-D-S one. :)

Kind regards,

	Christian

-- 
kernel concepts GmbH       Tel: +49-271-771091-14
Sieghuetter Hauptweg 48
D-57072 Siegen
http://www.kernelconcepts.de/
