Re: Rescanning IMAP folders



On Tue Feb 13 11:39:54 2007, Philip Van Hoof wrote:
Rescanning an IMAP folder has two purposes:

-> Updating the flags
-> Removing the expunged messages from both the cache and the summary

The second part is implemented badly. That's because IMAP has two ways
to identify a message:

Its sequence number, and its UID.

The sequence number MUST be the index into the summary array. The
UID is just another per-item field.


Almost... The mailbox is always ordered by UID. Like every other item except FLAGS (discounting ANNOTATE for the minute) it's also immutable.


An expunge will happen like this: "* 2 EXPUNGE". This means that the
message with sequence 2 in the current folder has been expunged.

With the Push-Email implementation (IMAP IDLE), message #2 will simply be
removed from the summary.

But if there was no IMAP IDLE for that folder, the only way to
synchronize the two is to check, one by one, whether the sequence number
(the array index + 1, since IMAP sequence numbers start at 1 and C arrays
start at 0) and the UID still match.


No, you do exactly the same thing. IDLE or not IDLE should always be treated identically. (For a client - for a server, it's a little different).

There's one case where you need to be careful, and that's when you're in IDLE and want to send a command that contains sequence numbers: then you have to send DONE and wait for the IDLE to complete before you send the command.

Imagine message #1 is expunged. This is what will happen:

SEQUENCE: 1 2 3 4 5
INDEX:    0 1 2 3 4
UID:      1 2 4 5 6

Server expunged #1

SEQUENCE: 1 2 3 4
INDEX:    0 1 2 3 4
UID:      1 2 4 5 6


No, that's wrong - you know that the first message has been removed, so that has to be UID 1 that's removed.

But was this really necessary? Not really: we can recalculate the local summary and avoid refetching everything in imap_update_summary, right?


If you treat the mailbox as a 1-indexed array of messages ordered by UID, then you just remove the entry the EXPUNGE tells you to.
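A minimal Python sketch of that model, assuming the local summary is nothing more than a list of UIDs in ascending order (`apply_expunge` is a hypothetical name, not taken from Polymer or from the imap_rescan code discussed here). It replays the example from the table above: "* 1 EXPUNGE" removes UID 1, after which sequence numbers 1 to 4 map to UIDs 2, 4, 5 and 6.

```python
def apply_expunge(uids: list[int], seq: int) -> int:
    """Remove the message at 1-based sequence number `seq` and return its UID.

    `uids` is the local summary: UIDs in strictly ascending order, so that
    uids[seq - 1] is the message with sequence number `seq`.
    """
    if not 1 <= seq <= len(uids):
        raise ValueError(f"EXPUNGE for unknown sequence number {seq}")
    # IMAP sequence numbers start at 1; Python list indices start at 0.
    return uids.pop(seq - 1)


summary = [1, 2, 4, 5, 6]            # SEQUENCE 1..5 -> UID
removed = apply_expunge(summary, 1)  # "* 1 EXPUNGE"
print(removed, summary)              # 1 [2, 4, 5, 6]
```

No rescanning is involved: the EXPUNGE names the position, and the UID ordering does the rest.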

If any expunge was detected, the condstore implementation isn't used,
because it doesn't cope with expunges. Which is why imap_rescan is ALWAYS
important enough to be *very* optimized.


Gah. Sort of.

If you're connected, then there's no need to do any scanning at all - you get the EXPUNGE, you remove that message, you're done.

(That's not to say rescanning on SELECT shouldn't be carefully optimized, though).


To detect whether expunges happened, I compare the last local UID
with "UIDNEXT - 1", and the EXISTS count with the local size. Only if
CONDSTORE is enabled, "LAST LOCAL UID == REMOTE UIDNEXT - 1" and
"LOCAL SIZE == EXISTS" do I assume that condstore can be used. Otherwise
imap_rescan is used. Is that a correct assumption?

No... Not quite. Look at how Polymer (or rather, the IPL) does it. At the moment, it's synchronous code, so it's very easy to follow.

http://svn.dave.cridland.net/svn/projects/infotrope/python/infotrope/imap.py

It's mailbox_reselected(), at line 3148. The code below it is all rather heavily used too; everything down to about line 3392 is about keeping the basic UID mapping in sync.

Loosely, on SELECT, I cache:

A) EXISTS
B) UIDNEXT

And I set C to 0. Every time I witness an EXPUNGE, I increment C.

On the next SELECT, I look at the new value of EXISTS (D) and UIDNEXT (E).

If (E == B), then you know no new messages have arrived. We'll call this condition F.

If (A - D - C) == (B - E), that is, D == (A - C) + (E - B), then we know that no messages have been expunged without us seeing it. We'll make this condition G.

So, if both F and G are true, then we'll move on to looking at HIGHESTMODSEQ, and if we've seen those changes, we can avoid doing a FETCH at all. Neat, eh?
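The SELECT-time bookkeeping can be sketched in Python as below. `SelectState` and `reselect_check` are hypothetical names, not the Polymer code, and the letters A to E follow the text. Here G is checked as D == (A - C) + (E - B): the count cached at SELECT, minus the expunges we witnessed, plus one message per UID allocated since. If UIDNEXT moved past a message that was later expunged while we weren't looking, G comes out false, which errs on the safe side.

```python
from dataclasses import dataclass


@dataclass
class SelectState:
    """What gets cached on SELECT (letters follow the text)."""
    exists: int             # A: EXISTS at SELECT time
    uidnext: int            # B: UIDNEXT at SELECT time
    expunges_seen: int = 0  # C: EXPUNGE responses witnessed since then

    def on_expunge(self) -> None:
        self.expunges_seen += 1


def reselect_check(old: SelectState, new_exists: int, new_uidnext: int) -> tuple[bool, bool]:
    """Return (F, G) given the new EXISTS (D) and UIDNEXT (E)."""
    a, b, c = old.exists, old.uidnext, old.expunges_seen
    d, e = new_exists, new_uidnext
    f = e == b  # F: no new UIDs allocated, so no new messages
    # G: the count is exactly what the witnessed expunges and the newly
    # allocated UIDs predict, so no expunge was missed.
    g = d == (a - c) + (e - b)
    return f, g


state = SelectState(exists=10, uidnext=100)
state.on_expunge()
state.on_expunge()                    # saw two EXPUNGEs while connected
print(reselect_check(state, 8, 100))  # (True, True): go on to HIGHESTMODSEQ
print(reselect_check(state, 9, 102))  # (False, False): 10 - 2 + 2 = 10, not 9
```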

If G is not true, then we have expunges somewhere. The fastest way to update these is to use "UID SEARCH ALL", or, with ESEARCH, "UID SEARCH RETURN () ALL". This can still be quite big with huge mailboxes, though. Luckily for us, users are rather predictable, and almost all EXPUNGE events happen within the last few messages, so I find them by looking at the UIDs corresponding to the sequence numbers for the last messages I knew about, which generally finds them. (If not, the code gets scary).
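The brute-force resync can be sketched in Python as follows. It assumes the client has already extracted the ALL value from the ESEARCH response to "UID SEARCH RETURN () ALL" (a sequence-set such as `2,4:7`); a plain "UID SEARCH ALL" returns space-separated UIDs instead, which `set(map(int, text.split()))` handles. The function names are hypothetical, and this shows the full-list approach, not the tail-probing shortcut described above.

```python
def parse_uid_set(text: str) -> set[int]:
    """Expand an IMAP sequence-set of UIDs such as "2,4:7" into a set.

    Assumes concrete numbers only; "*" does not appear in SEARCH results.
    """
    uids: set[int] = set()
    if not text:
        return uids
    for part in text.split(","):
        if ":" in part:
            # Ranges may be written either way round, e.g. "7:4".
            low, high = sorted(int(x) for x in part.split(":"))
            uids.update(range(low, high + 1))
        else:
            uids.add(int(part))
    return uids


def prune_expunged(local: list[int], server: set[int]) -> tuple[list[int], list[int]]:
    """Split the local UID list into (kept, expunged) using the server's UIDs."""
    kept = [u for u in local if u in server]
    expunged = [u for u in local if u not in server]
    return kept, expunged


local = [1, 2, 4, 5, 6]
# The server reports UIDs 2 and 4..7: UID 1 is gone, UID 7 is new.
kept, gone = prune_expunged(local, parse_uid_set("2,4:7"))
print(kept, gone)  # [2, 4, 5, 6] [1]
```

UID 7 is a new message; pruning deliberately ignores it, and a separate UID FETCH for everything above the last known UID picks it up.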

You could easily enough skip this, if you wanted, since when you do a FETCH, you'll get back both the UID and the sequence number. If those match what you thought they were going to be, you know you haven't missed any expunges up to that point.
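That check can be sketched in Python as below. The (sequence number, UID) pairs are assumed to have been parsed out of FETCH responses already, e.g. "* 3 FETCH (UID 5)" gives (3, 5); `first_mismatch` is a hypothetical helper. Sequence numbers beyond the end of the local summary are treated as new messages and skipped.

```python
from typing import Optional


def first_mismatch(local: list[int], fetched: list[tuple[int, int]]) -> Optional[int]:
    """Return the first sequence number whose UID differs from the local summary.

    If every pair agrees with `local`, no expunge was missed up to the
    highest sequence number checked, and None is returned.
    """
    for seq, uid in sorted(fetched):
        if seq > len(local):
            continue  # beyond what we knew about: a new message
        if local[seq - 1] != uid:
            return seq
    return None


local = [1, 2, 4, 5, 6]
# UID 2 was expunged while we were offline, and UID 7 has arrived.
fetched = [(1, 1), (2, 4), (3, 5), (4, 6), (5, 7)]
print(first_mismatch(local, fetched))  # 2
```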

Note that I do process "* seq EXPUNGE" in IDLE (those messages simply get
removed from the local summary, using sequence - 1 as the array index,
and this seems to work correctly).


It'll work correctly whenever you get an EXPUNGE.

IDLE really does only one thing: because it's a command in progress, the server can send you EXPUNGEs. Otherwise it's the same as a long-running NOOP.
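Since the handling is the same under IDLE, NOOP or any other command, one loop over untagged responses covers it. A Python sketch, assuming the response lines have already been read off the connection (the regex and `apply_expunges` are illustrative, not from a particular library). The detail worth getting right is order: each EXPUNGE renumbers every later message immediately, so two consecutive "* 3 EXPUNGE" lines remove two different messages.

```python
import re

EXPUNGE_RE = re.compile(r"^\* (\d+) EXPUNGE\s*$", re.IGNORECASE)


def apply_expunges(uids: list[int], lines: list[str]) -> list[int]:
    """Apply "* n EXPUNGE" lines in arrival order; return the removed UIDs.

    Lines other than EXPUNGE are ignored in this sketch.
    """
    removed = []
    for line in lines:
        m = EXPUNGE_RE.match(line)
        if m:
            seq = int(m.group(1))
            removed.append(uids.pop(seq - 1))  # sequence n lives at index n - 1
    return removed


summary = [1, 2, 4, 5, 6]
stream = ["* 3 EXPUNGE", "* 3 EXPUNGE", "* 4 EXISTS", "* 1 FETCH (UID 1)"]
print(apply_expunges(summary, stream), summary)  # [4, 5] [1, 2, 6]
```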

I added the author of Polymer/Telomer in CC (Dave). That's because I
will most likely take a look at his source code for this too.

Jolly good. See if you can spot the bug. ;-)

Dave.
--
Dave Cridland - mailto:dave cridland net - xmpp:dwd jabber org
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


