Re: [Tracker] Initial email indexing support



On Wednesday 20 September 2006 at 14:11 +0100, Jamie McCracken wrote:
Laurent Aguerreche wrote:
Hello,

this is a patch for initial email indexing support with Evolution. It
extracts the content of sent and received emails: text + attachments.

superb - my hero :)


Currently, tracker_db_save_email() tries to save text and header info
but it is missing some DB stuff... (Jamie...)

yes I will get on to this.

I have almost finished sqlite support, so I expect both this and
the email/db support to be ready this weekend.


For email bodies in HTML, there is nothing done and I don't know exactly
how it will be done.  :-)

Can't you get GMime to convert it? (iirc they have an html filter in there
- google or look at the docs for it)

if not, we already have an html->text converter using the htmless app in
filters/text/html which we could call as a last resort (it won't be nice
though, because we would need to store the body as a tmp file for htmless
to convert it to a text file)
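
To make the HTML-to-text step concrete, here is a minimal sketch in Python. It is neither GMime's filter nor htmless: it uses the standard library's html.parser as a stand-in, and the names _TextExtractor and html_body_to_text are hypothetical. It only shows the idea of reducing an HTML body to plain text in memory, without going through a tmp file.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space so adjacent blocks do not run together,
        # then collapse whitespace so the result is easy to index.
        return " ".join(" ".join(self._chunks).split())


def html_body_to_text(html):
    """Hypothetical helper: return the visible text of an HTML email body."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

This drops all markup and layout, which is probably acceptable for indexing but not for display.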

I will look.


Attachments have their metadata/content extracted but that's all, and
they have their uri in something like /tmp/pid/attachment...
Currently, parent directories of processed files won't be deleted! So,
check your /tmp directory regularly.

that's okay so long as the actual attachment files are unlinked.
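
For illustration, a small Python sketch of how an attachment could be written to a private temporary directory, processed, and then removed together with its parent directory. This is not what the patch does: index_attachment and the extract callback are hypothetical names, and the directory prefix is made up.

```python
import os
import tempfile


def index_attachment(data, filename, extract):
    """Write data to a private temp dir, run extract(path), then clean up.

    `extract` is a hypothetical callback standing in for the
    metadata/content extraction step.
    """
    tmp_dir = tempfile.mkdtemp(prefix="tracker-attachment-")
    # Keep only the last path component so a hostile attachment name
    # cannot escape the temporary directory.
    name = os.path.basename(filename) or "attachment"
    path = os.path.join(tmp_dir, name)
    try:
        with open(path, "wb") as fh:
            fh.write(data)
        return extract(path)
    finally:
        # Unlink the file first, then remove its now-empty parent directory.
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmp_dir)
```

Doing the cleanup in a finally block means the file and its directory go away even when extraction fails.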




Now, I'm waiting for DB things in:
* tracker_db_get_last_mbox_offset
* tracker_db_update_mbox_offset
* tracker_db_save_email

(Since tracker_db_get_last_mbox_offset() isn't implemented, it always
returns 0, which makes indexing always start from the beginning of
the mbox being processed.)
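
As a rough sketch of why the saved offset matters, the Python below resumes reading an mbox from a byte offset and returns the new offset to store. It is not the patch's code: read_messages_from is a hypothetical name, the saved offset is assumed to fall on the start of a "From " line, and every line starting with "From " is treated as a message separator, which is a simplification of real mbox parsing.

```python
import io


def read_messages_from(mbox, offset):
    """Return (messages, new_offset) for an mbox file opened in binary mode."""
    mbox.seek(offset)
    messages = []
    current = []
    while True:
        line = mbox.readline()
        if not line:
            break
        # A new "From " line closes the message collected so far.
        if line.startswith(b"From ") and current:
            messages.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        messages.append(b"".join(current))
    return messages, mbox.tell()


# Usage: the first pass starts at 0; the second pass resumes where it stopped.
first = b"From a@example.org Wed Sep 20 14:11:00 2006\nSubject: one\n\nbody one\n"
second = b"From b@example.org Wed Sep 20 15:00:00 2006\nSubject: two\n\nbody two\n"

messages, offset = read_messages_from(io.BytesIO(first), 0)
new_messages, new_offset = read_messages_from(io.BytesIO(first + second), offset)
```

With an offset of 0 on every run, as is the case while the DB function is a stub, the second call would return both messages again instead of only the new one.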


I will take a look at KMail.
In Thunderbird, there is a stupid file to parse to find profiles, then
directories for mboxes...



Note: this patch isn't intended for real use...

that's okay. I expect we should have an -evolution parameter to trackerd
to indicate whether to index Evolution stuff (I will add this later)

I also noticed no checks are done in tracker-mbox-evolution to determine
if an email is marked as junk or deleted (the X-Evolution entry has bits at
the end to determine status: flagged, replied, seen, deleted, junk) -
you may have to experiment with deleting/marking as junk to work out
the exact flags used.
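
A possible shape for that check, sketched in Python. Everything specific here is an assumption to verify by experiment: the header value is assumed to look like UID-FLAGS with both parts in hexadecimal, and the bit positions are assumed to follow Camel's message flags. The function names are hypothetical.

```python
# ASSUMED bit values, modelled on Camel's message flags. Check them against
# the Evolution version in use before relying on them.
FLAG_ANSWERED = 1 << 0
FLAG_DELETED = 1 << 1
FLAG_FLAGGED = 1 << 3
FLAG_SEEN = 1 << 4
FLAG_JUNK = 1 << 7


def parse_x_evolution(value):
    """Split an assumed 'UID-FLAGS' value (both hexadecimal) into integers."""
    uid_hex, sep, flags_hex = value.strip().partition("-")
    if not sep:
        raise ValueError("unexpected X-Evolution value: %r" % value)
    return int(uid_hex, 16), int(flags_hex, 16)


def should_index(value):
    """Return False for messages flagged as deleted or junk."""
    _, flags = parse_x_evolution(value)
    return not (flags & (FLAG_DELETED | FLAG_JUNK))
```

Under these assumptions, should_index("00000001-0010") is true (seen only), while a value whose flags part is 0002 or 0080 would be skipped.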

Yes, and I tested it. With Evolution 2.6.3 in Debian sid it seems
completely buggy... Some emails are marked as junk or already answered
just after I received them... While reading the sources of
evolution-data-server I saw an error in the camel source code and reported
it, but I don't know how much it affects Evolution.

However, I will use it, but we will probably have some surprises with it!


For Thunderbird, I believe it uses MOZILLA-STATUS but with different
flags from EVO.

Ok.

this stuff can wait I suppose - I will need a day or two to go through 
your patch although it looks good at first glance.

Thanks for your time on this.



