Re: [Evolution-hackers] RFClue: Atomic folder updates



On Mon, 2011-12-05 at 09:49 +0100, Milan Crha wrote:
>  I'm not sure if it's understood from your description, but the
> SyncKey on the exchange server changes as soon as the Sync call is
> finished (the server returns the new key), and asking with the old key
> results in this "bug". 

Not quite. Asking again with the old key is *fine*, and it *MUST* be
fine. It happens a lot, if a mobile client is disconnected from the
network before it even *receives* the reply. There's a *huge* window
where the reply can get lost, so the server absolutely *has* to cope
with clients coming back to it with an "old" key.

In ActiveSync the server stores SyncKey information per client, and will
keep one previous key per client per folder. That's enough to deal with
client crashes. In EWS the SyncKeys seem to last much longer — as if
they contain all the information needed to find the right point in the
database transaction log, and nothing *extra* needs to be stored on the
server side, perhaps?

So the mere fact that you are "asking with the old key" is *not* a
problem.

The bug happens because we crash and restart in an *inconsistent*
state, having applied only *some* of the changes. We then ask the
server "what happened since XXX", but our cache doesn't *match* what the
server had at time "XXX"!

In fact, even that would be OK if we got the *same* answer. Every change
in a single update is idempotent, so applying it twice is harmless. The
real problem happens if the folder has changed again in the meantime,
and some of the changes we were originally given in the XXX->YYY delta
have now been *reverted* (like a message being marked read and then
unread, or created and then deleted).

It goes like this...

 - We ask server "what happened since XXX?".
 - Server tells us "message 123 was created, and your next key is YYY".
 - We add message 123 to our cache, and crash before storing the new
    key.
 - Someone *deletes* message 123.
 - We restart, and ask the server "what happened since XXX?".
 - The server tells us *nothing* about message 123 — as far as it's
    concerned, that message was added and deleted without us ever
    knowing anything about it. But it is in our cache, and the server
    is *never* going to tell us that it got deleted because the server
    doesn't think we know it's there.
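
To make the sequence above concrete, here is a toy simulation of it. This
is only a sketch under loud assumptions: ToyServer, its integer keys and
its "net changes" logic are all invented for illustration and are not how
Exchange, EWS or ActiveSync actually implement SyncKeys. It only models
the one property that matters here: a message created and deleted inside
the requested window is never reported.

```python
class ToyServer:
    """Hypothetical server: an append-only log of (op, msg_id) events."""

    def __init__(self):
        self.log = []

    def create(self, msg_id):
        self.log.append(("create", msg_id))

    def delete(self, msg_id):
        self.log.append(("delete", msg_id))

    def state_at(self, key):
        """Set of message ids that existed after the first `key` events."""
        state = set()
        for op, msg_id in self.log[:key]:
            if op == "create":
                state.add(msg_id)
            else:
                state.discard(msg_id)
        return state

    def sync(self, key):
        """Return (net changes since `key`, new key)."""
        before = self.state_at(key)
        now = self.state_at(len(self.log))
        changes = [("create", m) for m in sorted(now - before)]
        changes += [("delete", m) for m in sorted(before - now)]
        return changes, len(self.log)


def apply_changes(cache, changes):
    """Apply a delta to the local cache; safe to apply more than once."""
    for op, msg_id in changes:
        if op == "create":
            cache.add(msg_id)
        else:
            cache.discard(msg_id)


def run_scenario():
    server = ToyServer()
    cache, key = set(), 0            # key 0 plays the role of "XXX"

    server.create(123)
    changes, new_key = server.sync(key)
    apply_changes(cache, changes)    # message 123 lands in the cache...
    # ...and we "crash" here: new_key ("YYY") is never stored.

    server.delete(123)               # someone deletes message 123

    # Restart: we still hold the old key, so we ask "since XXX?" again.
    second_delta, _ = server.sync(key)
    apply_changes(cache, second_delta)
    return cache, server.state_at(len(server.log)), second_delta
```

After run_scenario() the second delta is empty and the server's folder is
empty, but the cache still holds message 123, and no later sync will ever
remove it.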


> So, you could do:
>  a) Call Sync With the XXX key
>  b) process result with returned YYY key in a way:
>     - mark CamelMessageInfo-s which changed as changed (to be updated)
>     - create fake info-s with the same flag as above for new items
>     - delete those deleted
>  c) save changes to disk with the new YYY Sync key
>  d) update what is supposed to be updated
> 
> There still is a chance that you crash during b), though that all is
> supposed to be done on a local machine only, thus should not take that
> long (depends on folder size and such).

Hm, I'm not quite sure what that process would achieve. The race
condition you highlight (crash during (b)) is exactly the one I'm trying
to eliminate. However small the window is, if you can crash at a certain
time and get data corruption, that is a *bug*.

There is no ordering for your (a), (b), and (c) which works; they have
to be *atomic* and hit the disk all at precisely the same time, or we
need an alternative solution (roll-back or roll-forward).

> You may check on folder open whether there are any info-s marked for
> update and process d) if yes, and only then continue with the stored
> sync key.
> 
> I guess something like that should work. I do not see how any atomicity
> would help you here, because when any trouble happens between
> SyncFolderItems call and saving of all received changes with the new
> Sync key will always result in a desync between either key stored
> locally and on the server or the folder content not being updated fully.

Atomicity means that when we restart, we have *either* the "before" or
the "after" state. Not anything in between.

So either we restart with Sync key 'XXX' and none of the changes applied
to our cache, or we restart with Sync key 'YYY' and *all* of the
changes. Either situation is fine — the server is quite happy for us to
start up and ask it *either* "what changed since XXX?" *or* "what
changed since YYY?".

The *only* issue happens when our cache is inconsistent, and we end up
applying the "changes since XXX" to a copy of our local cache which
*doesn't* actually match the real state of the folder at that point in
time, because we'd already applied *some* changes to it before crashing.

That's why the atomicity is necessary.
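
For illustration, one common way to get that either/or property on a
POSIX filesystem is to write the new state to a temporary file and rename
it over the old one. The sketch below assumes the sync key and the cache
can live together in one small JSON file, which is *not* how Evolution
stores its folder summaries; save_state and load_state are hypothetical
names, and a careful implementation would also fsync the containing
directory.

```python
import json
import os
import tempfile


def save_state(path, sync_key, cache):
    """Store sync key and cache together; readers see old or new, never a mix."""
    state = {"sync_key": sync_key, "cache": sorted(cache)}
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())     # data is on disk before the rename
        os.replace(tmp, path)        # atomic swap of old state for new
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_state(path):
    """Return (sync_key, cache); (None, empty set) if nothing was saved yet."""
    try:
        with open(path) as f:
            state = json.load(f)
    except FileNotFoundError:
        return None, set()
    return state["sync_key"], set(state["cache"])
```

A crash before os.replace() leaves the "XXX" file untouched; a crash after
it leaves the complete "YYY" file. There is no point at which the key and
the cache contents can disagree.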

If Evolution's cache storage can't give us that atomicity, then we
should be able to fake it. I suspect the best answer is to write the
server's response to disk before processing it. On startup, we can check
if any such changes are outstanding and need to be "replayed".

We'll *still* be replaying "changes since XXX" to a folder state which
doesn't actually match XXX, but at least it'll be the *same* set of
changes. That actually makes it OK.
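
A minimal sketch of that roll-forward idea, assuming a hypothetical JSON
journal file and an in-memory set standing in for the folder cache (none
of these names come from Evolution or Camel):

```python
import json
import os


def apply_changes(cache, changes):
    """Apply a delta to the cache; applying the same delta twice is harmless."""
    for op, msg_id in changes:
        if op == "create":
            cache.add(msg_id)
        elif op == "delete":
            cache.discard(msg_id)


def write_journal(path, changes, new_key):
    """Persist the server's response *before* touching the cache."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"changes": changes, "new_key": new_key}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)            # the journal appears whole or not at all


def replay_journal(path, cache, stored_key):
    """On startup: re-apply any outstanding delta and return the key to use."""
    if not os.path.exists(path):
        return stored_key            # nothing outstanding
    with open(path) as f:
        entry = json.load(f)
    apply_changes(cache, entry["changes"])
    return entry["new_key"]
```

The caller would write the journal, apply the changes, store the new key,
and only then delete the journal. If we crash anywhere in between, the
journal is still there on restart and replay_journal() applies the *same*
XXX->YYY delta again, so we end up at "YYY" with all of the changes.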


> Nonetheless, there are people whom are willing to connect to their
> exchange account from multiple machines. Did you try whether this sync
> can work in such environment, please?

Yes, I've been using EWS from both my laptop and my desktop for much of
the last year (until I switched focus to ActiveSync; now I run it on
only one, since I got a new laptop and haven't yet configured EWS on it).

-- 
David Woodhouse                            Open Source Technology Centre
David Woodhouse intel com                              Intel Corporation




