Re: [Tracker] Tracker to do list



On Thursday, 14 September 2006 at 22:51 +0100, Jamie McCracken wrote:
Laurent Aguerreche wrote:

2) As (1) but parse only new mails (given a file offset of the last 
known email). All new mails are always appended to an mbox file.
I added a tracker_add_watch_file() to be called on each mbox file.
mbox files can be added dynamically (e.g. in Thunderbird or Evolution you can 
create new vfolders with their own mbox file), so the directory must be 
watched

Ok...

I plan to add mbox files as watched files (or watched directories for
vfolders, I think)

all the email clients allow you to create new mbox files so directory 
watching is probably essential to pick these up

but I wonder whether tracker_create_file_info() should be modified to
let the programmer set info->file_type to FILE_EMAILS directly, or
whether it should be set right after the call.
To find out whether a file is an mbox, I will keep a list of mboxes (or
a hash table?) and check it in process_event() for inotify.

Then, extract_metadata_thread() will identify the file as an e-mail and
will treat it accordingly.

Any comments?


I recommend the following:

1) In the global Tracker struct, add a GSList for email sources. Each 
source should be a struct holding the directory of mbox files and the 
client type (evo, kmail, etc.)

2) when inotify/fam receives any file change event, we check it against 
those sources in the process files thread (check the path prefix against 
the email source directories) and, if it is an mbox file, we call a new 
function index_mails (instead of the index_file in process_files_thread).

3) index_mails will (if the mbox size has increased) need to get the last 
known offset for the mbox file from the DB (I need to create separate 
tables for emails as well as modify the stored procedures) and parse all new

Are these tables somewhere? :-D

messages since that point. Your mbox functions should include 
parse_from_offset and parse_next calls. parse_next will return NULL 
when there are no further emails to process.
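
Steps 1 and 2 above could be sketched roughly as follows. This is only an illustration under assumptions: MailSource, MailSourceType and find_mail_source() are hypothetical names, not existing Tracker code, and a plain C array stands in for the GSList so that the example compiles without GLib.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types: the thread only says that a source has a
   directory of mbox files and a client type. */
typedef enum {
    MAIL_SOURCE_EVOLUTION,
    MAIL_SOURCE_KMAIL,
    MAIL_SOURCE_THUNDERBIRD
} MailSourceType;

typedef struct {
    const char     *dir;   /* directory holding mbox files, no trailing '/' */
    MailSourceType  type;
} MailSource;

/* Return the source whose directory contains `path`, or NULL if the
   changed file does not live under any known email source. */
static const MailSource *
find_mail_source (const MailSource *sources, size_t n, const char *path)
{
    assert (path != NULL);
    assert (sources != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen (sources[i].dir);

        /* Require a '/' right after the prefix so that a sibling
           directory such as ".../local2" does not match ".../local". */
        if (strncmp (path, sources[i].dir, len) == 0 && path[len] == '/')
            return &sources[i];
    }

    return NULL;
}
```

The file-change handler would then call index_mails when find_mail_source() returns a source, and fall back to index_file otherwise. Whether every file under such a directory really is an mbox is a separate check that this sketch does not make.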


so the index_mails code should look something like:

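MailBox *mb;
MailMessage *msg;

/* start parsing at the last known offset for this mbox */
mb = tracker_mbox_parse_from_offset (uri, offset);

/* parse_next returns NULL when there are no more emails */
while ((msg = tracker_mbox_parse_next (mb)) != NULL) {

      tracker_db_save_email (msg);
}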

And we need something to handle the case where an email is removed from
the mbox, so that the offset gets updated!
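
One possible way to deal with both points (resuming from a stored offset, and noticing that the mbox was rewritten) is sketched below. The names mbox_resume_offset() and mbox_next_message() are hypothetical, and the sketch uses plain stdio rather than GMime. It assumes messages start at lines beginning with "From ", and it treats a file that is now smaller than the stored offset as a sign that mail was removed, which is only a heuristic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decide where to resume parsing. If the mbox is now smaller than the
   stored offset, it was rewritten (e.g. a mail was removed), so the
   stored offset is meaningless and we start again from the beginning. */
static long
mbox_resume_offset (long stored_offset, long current_size)
{
    assert (stored_offset >= 0 && current_size >= 0);
    return (current_size < stored_offset) ? 0 : stored_offset;
}

/* Return the offset of the next line starting with "From " at or after
   `offset`, or -1 if there is none. `offset` must be a line start. */
static long
mbox_next_message (FILE *fp, long offset)
{
    char line[4096];
    int  at_line_start = 1;

    if (fseek (fp, offset, SEEK_SET) != 0)
        return -1;

    for (;;) {
        long   pos = ftell (fp);
        size_t len;

        if (pos < 0 || fgets (line, sizeof line, fp) == NULL)
            return -1;

        if (at_line_start && strncmp (line, "From ", 5) == 0)
            return pos;

        /* A chunk without a trailing newline means the line was longer
           than the buffer; the next chunk is not a line start. */
        len = strlen (line);
        at_line_start = (len > 0 && line[len - 1] == '\n');
    }
}
```

A removal followed by new mail can leave the file larger than before, so a more careful version would also verify that the stored offset still points at a "From " line and rescan from 0 when it does not.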

The MailBox struct would need to encapsulate the GMime stuff and also 
keep track of the offset of the next email to be read

MailMessage struct should contain all the metadata for one email

typedef struct {
      char    *mbox_uri;
      guint64 offset;         /* start address of the email */
      char    *message_id;
      char    **references;   /* array of message_ids */
      char    *reply_to_id;   /* message_id of email that it replies to */
      long    date;
      char    *mail_from;
      char    *mail_to;
      char    *mail_cc;
      char    *subject;
      char    *content_type;  /* e.g. text/plain or text/html etc */
      char    *body;
      GSList  *attachments;   /* names of all attachments */
} MailMessage;

I'm able to extract all of this information from my Evolution mboxes  :-)
But I don't understand what "references" are. Can you explain, please?

I also changed mail_to and mail_cc to GSList, and I added mail_bcc.


to index attachments we will need a function:

char *tracker_mbox_index_attachment (MailMessage *msg, const char *attachment_name);

this should check the MIME type of the attachment and, if it is text or a 
document, extract it to a tmp directory and copy the code from index_files 
(but ignore the tmp path!) to index it.
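
The MIME check described above might look something like the sketch below. attachment_is_indexable() is a hypothetical helper, and the list of "document" types is purely an assumption for illustration; the thread does not say which types should count.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive prefix test; MIME types are case-insensitive. */
static int
has_prefix_nocase (const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++) {
        if (tolower ((unsigned char) *s) != tolower ((unsigned char) *prefix))
            return 0;
    }
    return 1;
}

/* Return 1 if an attachment with this content type is worth extracting
   to the tmp directory for indexing, 0 otherwise. */
static int
attachment_is_indexable (const char *mime)
{
    /* Assumed examples of "document" types, not a list from the thread. */
    static const char *const document_types[] = {
        "application/pdf",
        "application/msword",
        "application/vnd.oasis.opendocument.text",
    };

    assert (mime != NULL);

    if (has_prefix_nocase (mime, "text/"))
        return 1;

    for (size_t i = 0; i < sizeof document_types / sizeof document_types[0]; i++) {
        size_t len = strlen (document_types[i]);

        /* Match the whole type, allowing trailing parameters
           such as "; name=report.pdf". */
        if (has_prefix_nocase (mime, document_types[i]) &&
            (mime[len] == '\0' || mime[len] == ';' || mime[len] == ' '))
            return 1;
    }

    return 0;
}
```

Only attachments that pass this test would be written to the tmp directory, so images and other binary parts are skipped without touching the disk.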

So I call tracker_mbox_index_attachment() in index_emails(), and then I
do something to make index_file() index the tmp_file from the email
attachments? (And besides that, the metadata thread will extract info
from tmp_file next.)
Is that right?

If I am right, I could copy attachments to /tmp at email indexing time...
(I already do that, but of course I can modify it). This way, the parts
of each email would only need to be iterated once; otherwise I will need
a first pass to find the attachment names.


Laurent.


