[Evolution-hackers] API considerations for avoiding data loss in race conditions



Hello all,
   I'd like to raise this issue on the list for feedback regarding this
bug: https://bugzilla.gnome.org/show_bug.cgi?id=686684

First let's start with the basic problem statement:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It can happen that two clients modify the same contact simultaneously,
or... that one client loads a number of contacts into memory and then
commits that contact to the addressbook without refreshing the contact
first, this will result in lost data for the given contact.

Example:

  o Client 'A' loads Contact 'Foo' with any of the apis
    for fetching contacts from the addressbook.

  o Client 'B' also loads Contact 'Foo'

  o Client 'A' calls e_book_client_modify_contact() after
    modifying the contact's Full Name

  o Client 'B' modifies contact 'Foo' after modifying one
    of contact 'Foo's email addresses

In this case contact 'Foo' has an updated email address but
an old and invalid Full Name.

In order to prevent this, we propose that the addressbook should
refuse to modify any contact if the passed contact's revision
does not match the revision of the contact currently stored by
the backend.

In the above example; Client 'B' would receive a specific OUT_OF_SYNC
error code in reply to calling e_book_client_modify_contact(), in which
case it would have to fetch the fresh contact from the address book,
modify the email address on the refreshed contact data and retry.

There are a couple of API concerns to address, however:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  a.) Unsuspecting Applications

  If we just modify this behaviour then any unsuspecting application may
  start to mal-function, if for example, that application where to make
  a series of calls to e_book_client_modify_contact() in a row without
  refreshing the contact in between calls, that sort of code would start
  to fail.

  There are a couple of avenues we can take to address this:

    1.) An ESourceExtension decides the desired behaviour of the
        backend, i.e. should the backend refuse modifications on
        out-of-sync contacts, or not.

        This could be disabled for a while giving time to applications
        to adapt to the new behaviour before enabling it, potentially
        forwarding the decision to those who configure EDS on the system
        where they intend to use the addressbook.

    2.) We could make some invasive API changes, adding a parameter
        to e_book_client_modify_contact() in frontend & backend apis
        deciding what the behaviour should be for each invocation,
        in this way one could say "Force my data to replace a given
        contact in the addressbook, even if it's out-of-sync".

  b.) Synchronization of addressbooks

  I don't think this is a big problem but worth mentioning in this
  context: If contacts in EDS have changed since the last
  synchronization, and the same contacts on another device have
  also changed, SyncEvolution will detect that the revisions don't
  match with the last synchronization and will then fetch the updated
  contact data from EDS before attempting a merge and performing a new
  commit.

  c.) Remote backends, and collection backends

  In the case that a backend does not have the data readily on hand,
  then enforcing this policy becomes more difficult.

  For example, what if you currently don't have an internet connection
  and you want to modify a contact that is stored in some off-site
  database, in which case you are using some offline cache, and you
  can't be notified right away if your contact was actually out-of-sync
  or not.

  I think as a first step it would be alright to only really enforce
  the policy in the local 'file' backend, but we should have a plan
  at least as to how to handle this case in the future.

  Milan suggested that one of the options for contact modifications
  can be "Create a Backup Copy if Out-of-Sync Occurs", in this case
  a remote backend could have 3 options when synchronizing with offsite
  data:

      1.) Drop local modifications if the remote contact has also
          changed.

      2.) Force overwrite the remote contact with the new contact
          data regardless of revision mismatches.

      3.) Create a copy of the contact with a newly invented UID,
          maintaining both local and remote changes to the same
          contact and leaving the final decision up to clients.

  These three options would potentially be an enumeration which would
  be the newly added argument to e_book_client_modify_contact()


  d.) Handling revision mismatches on contact removal

  Without changing current EDS APIs, we cannot enforce this really since
  the inputs for removing contacts is the UID only, however we could
  potentially enforce the same rules if we were to require
  e_book_client_remove_contacts() to provide contact revisions
  (E_CONTACT_REV) as well as the contact UIDs.

I'm particularly interested in Remote/Collection backends and their
needs, what are the requirements for backends such as the google
backend which persist their contacts remotely ?

What is the least intrusive way to accomplish this ?

Any comments, ideas ?

Regards,
         -Tristan


Including brief IRC conversation I had with Milan this morning, in case
it's of any interest:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<tristan> mcrha, I have to try to re-open the can-of-worms which is the code which checks against revision mismatches <tristan> mcrha, I have to admit at first I left out one portion of the patch in the bug report... which is an extension that configures whether the backend uses the new behavior or not
<mcrha> hi tristan
<tristan> good morning :)
<tristan> I have to admit it's a tricky thing, I still think whatever should be done, we should not compare revisions by their timestamps (i.e. one revision is not 'greater than' another revision) <tristan> for cases like syncevo; either a contact changed on the EDS, or a contact changed on the remote device, or both contacts changed 'since the last sync' <tristan> pohly also raises the point in the bug report that we do not handle modified/removed case
<tristan> this can't be handled in the same way unfortunately
<tristan> well, at least not with e_book_client_remove_contact_by_uid()
<tristan> perhaps we could do it with e_book_client_remove_contact()
<tristan> I think it's safer to put that aside at least for now though, the modified/removed case <mcrha> ok, what do you suggest now, either the revision will match, then do the operation, or it'll not match, thus claim an error? <tristan> yes, however, with the added configuration on the addressbook, potentially allowing some time for applications to prepare for the change <tristan> so that applications which dont expect this behavior yet dont get bitten too quickly <tristan> i.e. the case being any code which does e_book_client_modify_contact() a bunch of times in a row without refreshing the client side contact... would need to be changed <mcrha> how will the application recover from this? say a user will use "alwaysoverwrite", will the application do: write->failed->read->change->write trip? <mcrha> If we can inspire a bit on something similar: http://msdn.microsoft.com/en-us/library/aa580254%28v=exchg.140%29.aspx <mcrha> they use explicit ConflictResolution (not that their values are usable for us) <mcrha> with which you can avoid that round-trip just by telling the backend that "my is always correct"
<tristan> well, the granularity of the revision is only 'per contact'
<tristan> not per contact field
<mcrha> right, but I do not think it's a problem
<tristan> so the version of contact that you send over the bus is a complete contact, I dont see how you can detect 'what portion' is up to date, which portion should be merged, etc
<mcrha> yes, I thought of complete contact conflict resoluton, not per field
<tristan> so if for example, client A removes an address from the contact, and client B updates the given name <tristan> what you want in the end is the contact to have the address removed _and_ the given name updated both <mcrha> the URL was just an example, an explicit conflict resolution parameter for write operation <tristan> I dont see how you do that without retrying on error, short of adding more granularity on revisions <mcrha> yes, yes, if the client can handle conflicts, then it passes "NeverOverwrite" and it'll take care of the merging on its own (with the round-trip). <mcrha> I wanted clients to be able to explicitly say what to do, because not all clients will provide merging functions
<tristan> I see
<mcrha> either we can do it per write (change API), or as one value in your extension <tristan> I was thinking of that this morning actually, the extension was a bad approach for that reason exactly <tristan> because you would want a scenario where not all clients can handle it... <mcrha> there might be done one API change anyway, for the "remove", to provide also revisions the client expects as "to be removed" <tristan> right, that could be done implicitly via e_book_client_remove_contact() (where the whole EContact is provided to the client) <tristan> but it still represents added/changed D-Bus api and added/changed EBookBackend api <mcrha> tristan, mbarnes|away is working on backend API changes currently, I guess mainly the DBus code generation, but I heard about some changes on views too, thus I suppose he can change the add/remove/update API accordingly <tristan> so I got to thinking perhaps the EBookClient (and GDBus counter-part) could have a settable boolean property for this <mcrha> tristan, giving full contact to "remove" is suboptimal, maybe clients do not have them always
<mcrha> boolean?
<tristan> like your 'NeverOverwrite' example
<tristan> to avoid adding a parameter to e_book_client_modify_contact() and other user facing apis <tristan> but the problem still is that EBookBackend apis still would need to transport that parameter
<mcrha> I think of three states: AlwaysOverwrite, NeverOverwrite, CreateCopy
<tristan> what does CreateCopy do ? one contact becomes 2 contacts ?
<mcrha> tristan, if Matthew if touching API anyway, then the parameter addition for the conflict resolution and rev for "remove" will be no issue
<tristan> nod
<mcrha> yes, if you update, and you face a conflict, and you cannot solve it now (your counter part is unreachable on the phone to confirm his/her address), then you can postpone by creating copy in the book, rather than lost the data completely
<tristan> hmmm
<tristan> I'm not sure about that, IMO this should essentially all be about race conditions <tristan> i.e. ideally... a client maintains an EBookClientView which receives notifications <tristan> at the time e_book_client_modify_contact() is called, it should _always_ be done with a fresh/complete contact
<tristan> not with one that was loaded at application startup time
<mcrha> tristan, let's think about it and take mbarnes|away into the loop too. I suggest you start a thread on evolution-hackers on the API change for backends (please do both book and cal), with a summary of our chat here <tristan> so actually, your counter part just modified her full name at the very same time that you removed her address
<tristan> so it merits a retry
<mcrha> you forgot of remote/collaborative backends, like exchange, with many users accessing shared books/calendars <mcrha> ask xtian, he has quite good knowledge on the conflict resolution issues (he's not here right now, will be later today)
<tristan> alright, I'll bring it up on the ML
<mcrha> my other point with the three state was that I would rather use an enum, in case we'll find more values usable



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]