Re: [Tracker] more issues with indexer-split



Jamie McCracken wrote:
On Wed, 2008-08-13 at 19:30 +0200, Carlos Garnacho wrote:
As far as I see, for mbox you're storing the offset in the stream:

        msg_offset = g_mime_parser_tell (mf->parser);
        ....
        mail_msg->offset = msg_offset;
        
For IMAP, I just get "0" in the Services table, also didn't get to see
any code to do this.
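
For illustration, here is a minimal sketch of why a stored byte offset is
useful for mbox: the next scan can resume where the previous one stopped
instead of re-reading the whole file. scan_mbox_from() is a hypothetical
helper, not Tracker or GMime code. It works on an in-memory buffer and
assumes the offset always falls at the start of a line and that "From "
at the start of a line only ever begins a message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Tracker or GMime API: count the messages that
 * start at or after `offset` in an in-memory mbox buffer, and report the
 * offset at which the next scan should resume. */
static size_t
scan_mbox_from (const char *buf, size_t len, size_t offset, size_t *next_offset)
{
        size_t count = 0;
        size_t pos = offset;

        assert (offset <= len);

        while (pos < len) {
                /* pos is always at the start of a line here. */
                if (len - pos >= 5 && memcmp (buf + pos, "From ", 5) == 0)
                        count++;

                /* Advance to the start of the next line. */
                const char *nl = memchr (buf + pos, '\n', len - pos);
                if (nl == NULL)
                        break;
                pos = (size_t) (nl - buf) + 1;
        }

        /* Everything up to the end of the buffer has now been seen. */
        *next_offset = len;
        return count;
}
```

Calling it once with offset 0 and then again with the returned offset
after the file has grown only visits the newly appended messages. A
message count alone does not give a position to seek to in the file.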


IMAP stores a message count too - it's a count rather than a byte offset

As Carlos says, this code is NOT working in TRUNK for IMAP. So this
whole argument is moot.

no - when junk/deleted email is encountered during the startup scan, its
UID is checked against that table (JunkMails) to see if we already know
about it. If it's not in that table then we add it and then delete it
from our index. Ergo it's more efficient than what you have
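
The behaviour described above (as described, not necessarily what trunk
does) could be sketched roughly like this. JunkTable and
handle_junk_message() are hypothetical stand-ins for the JunkMails table
and the startup-scan logic, and the index deletion is only counted here,
not performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_JUNK 64

/* Hypothetical in-memory stand-in for the JunkMails table. */
typedef struct {
        unsigned int uids[MAX_JUNK];
        size_t       n_uids;
        size_t       n_deletes;   /* how many index deletions were issued */
} JunkTable;

static bool
junk_table_contains (const JunkTable *t, unsigned int uid)
{
        for (size_t i = 0; i < t->n_uids; i++)
                if (t->uids[i] == uid)
                        return true;
        return false;
}

/* Called for every junk/deleted message seen in the startup scan.
 * Returns true if the message had to be removed from the index. */
static bool
handle_junk_message (JunkTable *t, unsigned int uid)
{
        assert (t != NULL);

        if (junk_table_contains (t, uid))
                return false;            /* already known: nothing to do */

        if (t->n_uids < MAX_JUNK)
                t->uids[t->n_uids++] = uid;

        t->n_deletes++;                  /* stands in for the index delete */
        return true;
}
```

With this shape, a junk message costs one lookup on every scan and a
delete only the first time it is seen.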

The whole idea of keeping a separate table for deleted/junk email sounds
really inefficient to me. I have quite a bit and I get quite a bit every
day, that's a lot of extra processing. Surely it is MORE processing than
the current inefficiencies you are outlining with our current design?

Could you tell me where's that code? The only users for
InsertJunk/LookupJunk (the stored procedures) are
tracker_db_email_insert_junk() and tracker_db_email_lookup_junk(), the
former is also the only user of the latter, and it doesn't do what you
mention.

The only place I see where it could delete emails from the DB for
Evolution is check_summary_file(), and tracker_db_email_delete_email()
seems to be called unconditionally for any junk/deleted message found.

the way it should work is as described above

I had tested it and it works (deleted and junk emails are pruned on next
restart of trackerd)

What you're saying here doesn't make a lot of sense to me. It sounds
like you're saying that if mail is marked as junk or deleted you don't
want to update the index until we restart the daemon? So people will
still be searching and finding junk until trackerd is restarted? That
doesn't sound right to me. Or did you mean something else?

How do you currently tell which emails are new in the summary file?
Without storing the count you cannot know without verifying each email
exists in the services table (which would obviously be unacceptable
performance-wise)

You haven't answered the question. Where is the code?

the trunk way is faster so I would prefer that restored

TRUNK doesn't work as you think it does.

If you bear with me, I'd prefer to try a few optimizations before having
to add special cases.
well, not doing the junk/deletion check every time the summary file
changes must obviously be faster?

Plus Carlos is right, this code can probably be optimised much more than
it is now. It has just been written to get working so far.

Sure, but it's also more beneficial for users if tracker DB contents are
up to date with the actual data. Also, IMHO adding special cases like
this would break a design that makes tracker really extensible and easy
to develop for.

Carlos has spent a lot of time designing this.

I spoke further with him about it too, we could change the way we do
things now to use GTypeModule and GInterface to make it extensible, but
that will take a few days at least to do.

This issue in general is not a show stopper, it is a performance issue,
the performance issue we have with index.db (which you say you will fix
next week by using SQLite with FTS) is much more of an issue than this
by far. I would suggest we merge and resolve these on trunk so you can
get on, Jamie.

that can be done easily - for a quick sync test just check that the last
known UID in the summary file (found using the stored message count)
exists in services - if it does not then you have a count mismatch and a
resync is required

I don't claim to know much about the UID, but what if you receive a mail
and delete a mail - won't the count then be the same? Resulting in your
count check for a resync breaking?

this can be done whenever a new email arrives as it's not expensive

suggest having a resync method to do the above and a check_synch one to
test it's ok
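
As a rough sketch of what such a check could look like: check_sync()
and services_has_uid() are hypothetical names, the Services table is
replaced by a plain array, and the summary is assumed to list UIDs in
arrival order. None of this is existing Tracker code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lookup in the Services table. */
static bool
services_has_uid (const unsigned int *indexed, size_t n_indexed, unsigned int uid)
{
        for (size_t i = 0; i < n_indexed; i++)
                if (indexed[i] == uid)
                        return true;
        return false;
}

/* Quick test: take the summary entry at position `stored_count` (the
 * last one seen on the previous pass) and check that it is in the index.
 * Returns false when a resync is needed. */
static bool
check_sync (const unsigned int *summary, size_t n_summary,
            size_t stored_count,
            const unsigned int *indexed, size_t n_indexed)
{
        assert (summary != NULL || n_summary == 0);

        if (stored_count == 0)
                return true;             /* nothing indexed yet to verify */
        if (stored_count > n_summary)
                return false;            /* summary shrank */

        return services_has_uid (indexed, n_indexed, summary[stored_count - 1]);
}
```

In the receive-one-delete-one case, with UIDs 10, 11, 12 indexed and the
summary now holding 10, 12, 13, a bare count comparison sees 3 == 3,
whereas this sketch looks up UID 13, does not find it and asks for a
resync. It would not notice a message that is only flagged as junk in
place, since neither the count nor the UID at that position changes.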

We could have this soon, but it won't be today unfortunately. Carlos is
on vacation.

-- 
Regards,
Martyn


