Re: [Tracker] more issues with indexer-split



On mié, 2008-08-13 at 17:12 +0200, Carlos Garnacho wrote:
Hi!,

On mar, 2008-08-12 at 14:18 -0400, Jamie McCracken wrote:

<snip>

that sounds inefficient - trunk only ever checked for existing deleted
or junk emails at startup because iterating through all emails in the
summary files is expensive. 

From what I've read in trunk code, you still iterate through all the
mails in the summary in check_summary_file(), and you will have to
iterate over them again later to index new messages, etc...

As far as I know, parsing the summaries again is pretty much unavoidable,
since under some circumstances Message IDs could be reused, which would
leave you with inconsistent data in the DBs. Even if they aren't reused,
expunging a folder would render any stored offset for the summary file
useless (even dangerous).

Besides, when testing summary parsing, I remember it was pretty fast
(like 2-3 seconds for a summary of ~6500 emails), of course without
inserting into the DBs or doing message body or attachment sniffing, which
is more or less what should happen if the junk/deleted flag is set.

To back this up, I've played with modifying tracker-indexer so that it
doesn't store anything in the DBs and just gets data from the mail
summaries (no body, attachments, etc...). According to time(1), it takes:

real    0m2.281s
user    0m1.184s
sys     0m0.968s

for my IMAP account, roughly 30000 emails, which is even better than what
I remembered. I haven't played with flushing disk caches, etc... but I
wouldn't call it inefficient.
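
For illustration only, the kind of flag-only pass described above could be
sketched roughly like this. It is a Python sketch, not tracker-indexer's
actual C code: the SummaryEntry shape, the split_entries() helper and the
flag bit values are all assumptions made up for the example, and it starts
from already-parsed entries rather than reading a real summary file.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical flag bits for this sketch; placeholders, not taken from
# tracker or from the mail client's headers.
FLAG_DELETED = 1 << 1
FLAG_JUNK = 1 << 7


@dataclass(frozen=True)
class SummaryEntry:
    """One already-parsed summary record (hypothetical shape)."""
    uid: str
    flags: int


def split_entries(entries: Iterable[SummaryEntry]) -> Tuple[List[str], List[str]]:
    """Return (uids_to_index, uids_to_skip) after one pass over the entries.

    Only the flags are inspected; no body or attachment is touched, which
    is why a pass like this can stay cheap.
    """
    to_index: List[str] = []
    to_skip: List[str] = []
    for entry in entries:
        if entry.flags & (FLAG_DELETED | FLAG_JUNK):
            to_skip.append(entry.uid)
        else:
            to_index.append(entry.uid)
    return to_index, to_skip
```

Entries that end up in the second list would be the ones to drop from the
DBs (or never insert), while only the first list needs the expensive body
and attachment work.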

Regards,
   Carlos




