Re: Rescanning IMAP folders

From: Philip Van Hoof <spam pvanhoof be>
To: Dave Cridland <dave cridland net>
Cc: tinymail-devel-list gnome org
Subject: Re: Rescanning IMAP folders
Date: Tue, 13 Feb 2007 15:11:20 +0100

On Tue, 2007-02-13 at 13:34 +0000, Dave Cridland wrote:
> On Tue Feb 13 11:39:54 2007, Philip Van Hoof wrote:
> 
> 
> > An expunge will happen like this: "* 2 EXPUNGE". This means that the
> > message with sequence 2 in the current folder has been expunged.
> > 
> > What will happen with the Push-Email implementation (the IMAP IDLE) is
> > that simply message #2 will be removed from the summary.
> > 
> > But if there was no IMAP IDLE for that folder, the only way to
> > synchronize both is to check one by one whether the sequence (the index
> > of the array + 1, as the IMAP sequence starts at 1 and C arrays start at
> > 0) and the UID match.
> > 
> > 
> No, you do exactly the same thing. IDLE or not IDLE should always be 
> treated identically. (For a client - for a server, it's a little 
> different).

Oh, yes. It's the same code that does this, so the same thing happens.
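
A minimal sketch of that shared code path, assuming the local summary is a plain Python list of UIDs ordered by sequence number. The function name is hypothetical and is not tinymail's or camel's API; it only illustrates the "sequence - 1 as array index" rule.

```python
def apply_expunge(summary, seq):
    """Remove the entry for IMAP sequence number `seq` (1-based) and return it.

    `summary` is assumed to be a list of UIDs ordered by sequence number,
    so sequence number N lives at list index N - 1.
    """
    if not 1 <= seq <= len(summary):
        raise ValueError(f"sequence {seq} out of range 1..{len(summary)}")
    return summary.pop(seq - 1)


summary = [1, 2, 4, 5, 6]             # local UIDs for sequence 1..5
removed = apply_expunge(summary, 2)   # server sent "* 2 EXPUNGE"
# summary is now [1, 4, 5, 6]; every later message moved down one sequence number
```

The same call would serve whether the EXPUNGE arrived during IDLE or as an untagged response to any other command.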

> There's one case where you need to be careful, and that's when you're 
> in IDLE, and want to send a command with sequence numbers in - then 
> you have to say DONE and wait for the IDLE to complete before you 
> send the command.
> 
> > Imagine an expunge of message #1. What will happen:
> > 
> > SEQUENCE: 1 2 3 4 5
> > INDEX:    0 1 2 3 4
> > UID:      1 2 4 5 6
> > 
> > Server expunged #1
> > 
> > SEQUENCE: 1 2 3 4
> > INDEX:    0 1 2 3 4
> > UID:      1 2 4 5 6
> > 
> > 
> No, that's wrong - you know that the first message has been removed, 
> so that has to be UID 1 that's removed.

Oh, but I wanted to illustrate the local state when an expunge has
happened remotely and things are not yet synchronized locally. It's the
last series of numbers that is the end result after synchronization.
Between the first two series nothing has happened at the client yet.
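
To make the "not yet synchronized" state concrete, here is a small sketch of the one-by-one comparison, assuming the client has somehow obtained the server's current UID list (for instance from a UID SEARCH ALL). Both function names are made up for illustration.

```python
def first_mismatch(local_uids, remote_uids):
    """Return the 0-based index where local and remote first disagree, or None."""
    for index, (local, remote) in enumerate(zip(local_uids, remote_uids)):
        if local != remote:
            return index
    if len(local_uids) != len(remote_uids):
        # One list is a strict prefix of the other.
        return min(len(local_uids), len(remote_uids))
    return None


def drop_expunged(local_uids, remote_uids):
    """Keep only the local entries whose UID still exists on the server."""
    remote = set(remote_uids)
    return [uid for uid in local_uids if uid in remote]


local = [1, 2, 4, 5, 6]    # stale local summary (index 0..4)
remote = [2, 4, 5, 6]      # server after sequence #1 (UID 1) was expunged

mismatch_at = first_mismatch(local, remote)   # index 0, i.e. sequence 1
synced = drop_expunged(local, remote)         # [2, 4, 5, 6]
```

Everything before the first mismatch is known to be in sync, so only the entries from that index onward need attention.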

> > But was this really necessary? Not really, we can re-calculate the 
> > local summary to avoid the refetching of all in imap_update_summary, 
> > right?
> > 
> > 
> If you treat the mailbox as a 1-indexed array of messages ordered by 
> UID, then you just remove the entry the EXPUNGE tells you to.

But when rescanning I haven't yet had expunges :). Those expunges I only
seem to get when in IDLE. And what about synchronizing a folder that had
been offline ;-)? Both my own and remote expunges need to get in sync
with each other.

> > If any expunge was detected, the condstore implementation isn't used.
> > The condstore implementation doesn't cope with expunges. So it ain't
> > used for those situations. Which is why imap_rescan is ALWAYS important
> > enough to be *very* optimized.
> > 
> > 
> Gah. Sort of.
> 
> If you're connected, then there's no need to do any scanning at all - 
> you get the EXPUNGE, you remove that message, you're done.

Right, that's how it currently works. Yet upon selecting a new folder it
will still check for changes (because only the current folder is
selected, and I noticed that events only happen for the current folder
while in IDLE).

> (That's not to say rescanning on SELECT shouldn't be carefully 
> optimized, though).


> > The detection of whether expunges happened checks the last local UID
> > with "UIDNEXT - 1" and compares the EXISTS with the local size. Only if
> > CONDSTORE is activated, "LAST LOCAL UID == REMOTE UIDNEXT - 1" and
> > "LOCAL SIZE == EXISTS" do I assume that condstore can be used. Else
> > imap_rescan is used. Is that a correct assumption?
> 
> No... Not quite. Look at how Polymer (or rather, the IPL) does it. At 
> the moment, it's synchronous code, so it's very easy to follow.
> 
> http://svn.dave.cridland.net/svn/projects/infotrope/python/infotrope/imap.py
> 
> It's line 3148, mailbox_reselected() - the code below it is all 
> rather heavily used too - down to about line 3392 is all about 
> keeping the basic UID mapping in sync.
> 
> Loosely, on SELECT, I cache:
> 
> A) EXISTS
> B) UIDNEXT
> 
> And I set C to 0. Every time I witness an EXPUNGE, I increment C.
> 
> On the next SELECT, I look at the new value of EXISTS (D) and UIDNEXT 
> (E).
> 
> If (E == B), then you know no new messages have arrived. We'll call 
> this condition F.

Errm, but this only works when IDLE is available? And what about
changes to a folder that wasn't selected? Or changes that have happened
while the client was offline?

> If (A - D + C) == (B - E), then we know that no messages have been 
> expunged. We'll make this condition G.
> 
> So, if F is true, then we'll move onto looking at HIGHESTMODSEQ, and 
> if we've seen those changes, we can avoid doing a FETCH at all. Neat, 
> eh?

Yes, neat. But I won't get EXPUNGEs for just any folder... so what about
C for those? :-)
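
A rough sketch of the bookkeeping behind conditions F and G, using the letters from the quoted text. The class and function names are hypothetical. The sketch takes the reading that the expected new EXISTS is the cached EXISTS, minus the expunges already witnessed (C), plus the growth in UIDNEXT; that sign convention is an assumption on my side, and it also assumes every UID handed out since the last SELECT belongs to a message that is still there.

```python
from dataclasses import dataclass


@dataclass
class SelectSnapshot:
    """Values cached at SELECT time (hypothetical container)."""
    exists: int             # A
    uidnext: int            # B
    expunges_seen: int = 0  # C, incremented for every witnessed EXPUNGE


def no_new_messages(cached, new_uidnext):
    """Condition F: UIDNEXT (E) did not move, so nothing new arrived."""
    return new_uidnext == cached.uidnext


def no_unseen_expunges(cached, new_exists, new_uidnext):
    """Condition G under the assumed reading described above."""
    arrivals = new_uidnext - cached.uidnext                        # E - B
    expected = cached.exists - cached.expunges_seen + arrivals     # A - C + (E - B)
    return new_exists == expected                                  # compare with D


# Inbox.100 was never selected while the other client expunged, so C stays 0.
cached = SelectSnapshot(exists=5, uidnext=7)
f = no_new_messages(cached, new_uidnext=7)                    # True
g = no_unseen_expunges(cached, new_exists=4, new_uidnext=7)   # False
```

With C stuck at 0 for a folder that was never selected, G simply comes out false when something was expunged behind the client's back, which at least signals that the UID mapping needs repair before HIGHESTMODSEQ can be trusted.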

For example my user has INBOX selected and he's online. So he gets
EXPUNGE, FETCH, EXISTS and RECENT for INBOX because I've set his session
with the IMAP server in IDLE. I act on that, so that folder will be kept
in sync (assuming the IMAP server's IDLE implementation is cool).

Now my user selects Inbox.100, a subfolder that wasn't selected. In
another E-mail client another person expunged stuff from Inbox.100 while
my user had INBOX selected.

I never received any 'C' for Inbox.100, only for INBOX. I already have
quite a lot of offline data for Inbox.100; how can I avoid that my user
has to download everything in Inbox.100 again?

So with CONDSTORE I check the HIGHESTMODSEQ, of course. But when that is
enabled I know that my own code can't cope with expunges in Inbox.100
yet. The old imap_rescan can. So my idea was, or is, to compare the local
size with EXISTS, and UIDNEXT - 1 with my last local UID.

Which might be incorrect if that other user did both an expunge and an
add at the end of the mailbox (that I understand).

Maybe I should also check the sequence numbers of the last 5 or 6
messages against their UIDs? By the way, the sequence number and the UID
of the last message are already always compared with what I have
locally. This is old camel code that I didn't remove. If they don't
match, a full rescan happens as well.

So.. I check for

local size == exists
last local uid == uidnext - 1
last remote uid and last remote seq == last local uid and last local seq
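
Written out as code, the three checks could look roughly like this. It is only a sketch under the assumption that the local summary is a list of UIDs ordered by sequence number; the function name and parameters are invented here and do not come from the camel code.

```python
def can_skip_rescan(local_uids, exists, uidnext, last_remote_seq, last_remote_uid):
    """Return True when all three checks pass, so no full rescan is needed.

    local_uids      -- local summary as UIDs ordered by sequence number
    exists, uidnext -- values reported by the server on SELECT
    last_remote_seq, last_remote_uid -- sequence number and UID of the
                       server's last message (e.g. from a FETCH of it)
    """
    if not local_uids:
        # Nothing cached locally: only an empty remote folder is "in sync".
        return exists == 0
    return (
        len(local_uids) == exists                 # local size == exists
        and local_uids[-1] == uidnext - 1         # last local uid == uidnext - 1
        and last_remote_seq == len(local_uids)    # last remote seq == last local seq
        and last_remote_uid == local_uids[-1]     # last remote uid == last local uid
    )


local = [1, 2, 4, 5, 6]
unchanged = can_skip_rescan(local, exists=5, uidnext=7,
                            last_remote_seq=5, last_remote_uid=6)   # True
# Another client expunged UID 5 and appended UID 7 in the meantime:
changed = can_skip_rescan(local, exists=5, uidnext=8,
                          last_remote_seq=5, last_remote_uid=7)     # False
```

When the function returns False the caller would fall back to the old rescan path.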


> If G is not true, then we have expunges somewhere. The fastest way to 
> update these is to use "UID SEARCH ALL", or, with ESEARCH, "UID 
> SEARCH RETURN () ALL". This can still be quite big with huge 
> mailboxes, though. Luckily for us, users are rather predictable, and 
> almost all EXPUNGE events happen within the last few messages, so I 
> find them by looking at the UIDs corresponding to the sequence 
> numbers for the last messages I knew about, which generally finds 
> them. (If not, the code gets scary).

:-)
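
A small sketch of that "look at the last few messages" probe, assuming the client asks for the UIDs of the last few sequence numbers it knew about (with something like FETCH first:last (UID)) and gets back a sequence-to-UID mapping. The helper names and the window size are made up; the scary fallback for when the probe fails is not shown.

```python
def tail_sequence_set(local_count, exists, window=5):
    """Sequence set for the last `window` known messages, or None if empty.

    The upper bound is clamped to the server's EXISTS so that the request
    never names a sequence number the server no longer has.
    """
    last = min(local_count, exists)
    if last < 1:
        return None
    first = max(1, last - window + 1)
    return f"{first}:{last}"


def tail_matches(local_uids, fetched):
    """True if every fetched (sequence, UID) pair agrees with the local list."""
    return all(
        seq <= len(local_uids) and local_uids[seq - 1] == uid
        for seq, uid in fetched.items()
    )


local = [1, 2, 4, 5, 6]

# Nothing changed: server still has 5 messages.
probe = tail_sequence_set(len(local), exists=5, window=3)   # "3:5"
ok = tail_matches(local, {3: 4, 4: 5, 5: 6})                # True

# UID 5 was expunged elsewhere: server now has [1, 2, 4, 6].
probe_after = tail_sequence_set(len(local), exists=4, window=3)   # "2:4"
ok_after = tail_matches(local, {2: 2, 3: 4, 4: 6})                # False
```

A mismatch at sequence 4 narrows the expunge down to the tail, so only those entries need to be repaired instead of the whole summary.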

> You could easily enough skip this, if you wanted, since when you do a 
> FETCH, you'll get back both the UID and the sequence number. If those 
> match what you thought they were going to be, you know you haven't 
> missed any expunges up to that point.
> 
> > Note that I do process "* seq EXPUNGE" in IDLE (they'll simply get
> > removed from the local summary by using the sequence-1 as array 
> > index, and this seems to work correctly).
> > 
> > 
> It'll work correctly whenever you get an EXPUNGE.
> 
> IDLE only actually does one thing - because it's a command, you can 
> get EXPUNGEs. Otherwise it's the same as a long-running NOOP.

I see

> > I added the author of Polymer/Telomer in CC (Dave). That's because I
> > will most likely take a look at his source code for this too.
> 
> Jolly good. See if you can spot the bug. ;-)

:-)


Thanks for your advice. Again, all sorts of new stuff ;)