Re: [Evolution-hackers] A Camel API to get the filename of the cache, also a proposal to have one format to rule them all

From: Jeffrey Stedfast <fejj novell com>
To: Philip Van Hoof <spam pvanhoof be>
Cc: Evolution Hackers <evolution-hackers gnome org>
Subject: Re: [Evolution-hackers] A Camel API to get the filename of the cache, also a proposal to have one format to rule them all
Date: Mon, 05 Jan 2009 09:41:13 -0500

Philip Van Hoof wrote:
> On Mon, 2009-01-05 at 08:25 -0500, Jeffrey Stedfast wrote:
>
>   
>> migrating away from the IMAP specific data cache would be good.
>>     
>
> Yes. I think IMAP and the local providers are the only ones that are
> still using a specialized datacache.
>
> The IMAP4 one, for example, ain't using a specialized one.
>
>   
>>>> b) migrate away the mbox data cache (the all-in-one file crap)
>>>>     
>>>>         
>>> I'm all for it. Once I thought of doing this, but the options were like
>>> Maildir or a format of one mbox file per mail in a distributed folder
>>> [CamelDataCache sort of format, like imap4/GW/Exchange]. But IIRC Fejj,
>>> had some concern like, Local still might be good to be held in a
>>> 'standards' way. I know it hurts us on expunge/mailbox rewrite etc.
>>>   
>>>       
>> what mbox data cache? CamelDataCache would probably be the best cache to
>> use for IMAP.
>>     
>
> Although I would change CamelDataCache to store individual MIME parts as
> separate files instead of files that look like a single-mail MBox file.
>   
it's really just the raw message/rfc822 format, not really mbox -
there's no "From " line for example.
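
To illustrate the distinction, here is a small Python sketch (not Camel code; the helper name has_mbox_envelope and the sample data are made up for this example). An mbox entry starts with a "From " separator line, while a raw message/rfc822 file starts directly with its headers:

```python
def has_mbox_envelope(data: bytes) -> bool:
    """Return True if the data begins with an mbox "From " separator line."""
    # Note the trailing space: the header "From:" does not match.
    return data.startswith(b"From ")


# Hypothetical sample data, only for illustration.
raw_rfc822 = b"From: alice@example.com\r\nSubject: hi\r\n\r\nbody\r\n"
mbox_entry = b"From alice@example.com Mon Jan  5 09:41:13 2009\n" + raw_rfc822

print(has_mbox_envelope(raw_rfc822))
print(has_mbox_envelope(mbox_entry))
```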

that doesn't need to be part of the cache logic. that can be part of the
key.

> I would also decode the separate MIME parts before storing if the
> original E-mail had them encoded (which is usually the case, and always
> for binary attachments). This is to make it easier for metadata engines
> to index the MIME parts, and to allow them to do this efficiently.
>
> Perhaps also to reduce disk-space, as encoded consumes more disk-space,
> but that is for me just a nice side-effect.
>
> So my format would create a directory for each E-mail, or prefix each
> MIME part with the uid. Perhaps:
>
> INBOX/subfolders/temp/1.              // headers+multipart container
> INBOX/subfolders/temp/1.1             // multipart container
> INBOX/subfolders/temp/1.1.1           // text/plain
> INBOX/subfolders/temp/1.1.2           // text/html
> INBOX/subfolders/temp/1.2.1           // inline JPeg attachment
> INBOX/subfolders/temp/1.BODYSTRUCTURE // Bodystructure of the E-mail
> INBOX/subfolders/temp/1.ENVELOPE      // Top envelope of the E-mail
>   

sure, this can be done with the key tho. instead of using the uid as the
key, use uid.1 or uid.1.2 etc
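
A minimal sketch of the key idea, written in Python rather than Camel's C purely for illustration. The function name part_keys, the uid "42", and the dotted numbering (child N of a multipart appends ".N" to its parent's key) are assumptions for this example, not CamelDataCache API; the numbering also does not reproduce the exact layout quoted above:

```python
from email import message_from_bytes
from email.message import Message
from typing import Dict


def part_keys(uid: str, msg: Message) -> Dict[str, bytes]:
    """Map hypothetical cache keys ("<uid>.<part path>") to decoded leaf parts."""
    entries: Dict[str, bytes] = {}

    def walk(part: Message, key: str) -> None:
        if part.is_multipart():
            # Containers get no entry of their own; number the children from 1.
            for index, child in enumerate(part.get_payload(), start=1):
                walk(child, f"{key}.{index}")
        else:
            # decode=True undoes base64 / quoted-printable transfer encoding.
            entries[key] = part.get_payload(decode=True) or b""

    walk(msg, uid)
    return entries


# Hypothetical message: multipart/mixed holding an alternative pair and a blob.
RAW = (
    b"From: alice@example.com\r\n"
    b"Subject: birthday\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="outer"\r\n'
    b"\r\n"
    b"--outer\r\n"
    b'Content-Type: multipart/alternative; boundary="inner"\r\n'
    b"\r\n"
    b"--inner\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"\r\n"
    b"cake\r\n"
    b"--inner\r\n"
    b"Content-Type: text/html; charset=us-ascii\r\n"
    b"\r\n"
    b"<p>cake</p>\r\n"
    b"--inner--\r\n"
    b"--outer\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"AAEC/w==\r\n"
    b"--outer--\r\n"
)

entries = part_keys("42", message_from_bytes(RAW))
for key in sorted(entries):
    print(key, len(entries[key]))
```

With keys like these the cache itself stays a plain key-to-file store; only the code that builds the key needs to know about MIME structure.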

> ps. Perhaps I would store 1.BODYSTRUCTURE in the database instead. I
> would probably store 1.ENVELOPE in the database (like how it is now).
>   
yea, I think it makes sense to store BODYSTRUCTURE in the folder summary.

> I would probably on top of storing BODYSTRUCTURE and ENVELOPE in the
> database also store them in separate files. Even if most filesystems
> will consume 4k or more (sector or block size) for those mini files.
>
> To get the JPeg attachment:
>
> $ cp INBOX/subfolders/temp/1.2.1 ~/mommy.jpeg
>
> $ exif INBOX/subfolders/temp/1.2.1
> EXIF tags in 'INBOX/subfolders/temp/1.2.1' ('Intel' byte order):
> --------------------+----------------------------------
> Tag                 |Value                                                     
> --------------------+----------------------------------
> Image Description   |Mommy with cake at birthday 
> Manufacturer        |SONY                                                      
> Model               |DSC-T33                                                   
> ...
>
> $ tracker-search -s EMails birthday
> Results:
>   email://user@server/INBOX/temp/1
>   email://user@server/INBOX/temp/1#2.1
>   ~/mommy.jpeg
>
>
> [CUT]
>
>   
>> this can cause problems if you need to verify signed parts because
>> re-encoding them might not result in the same output.
>>     
>
> Ok, for signatures I guess we can make an exception and keep them
> encoded in their original format then.
>
>   
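
The re-encoding concern can be shown with a short Python sketch. It does no real signature verification; a SHA-256 digest over the encoded bytes stands in for the signed data, and the line widths 60 and 76 as well as the helper name encode_wrapped are assumptions chosen for the example. The same payload, decoded and then re-encoded with a different base64 line length, no longer matches byte for byte:

```python
import base64
import hashlib


def encode_wrapped(data: bytes, width: int) -> bytes:
    """Base64-encode data and wrap the result at `width` characters per line."""
    encoded = base64.b64encode(data).decode("ascii")
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return ("\r\n".join(lines) + "\r\n").encode("ascii")


payload = bytes(range(256))

# What the sender put on the wire (and what a signature would cover).
original = encode_wrapped(payload, 60)

# What we would regenerate after storing only the decoded payload.
decoded = base64.b64decode(original)
recoded = encode_wrapped(decoded, 76)

print(decoded == payload)    # the content itself survives the round trip
print(original == recoded)   # the encoded bytes do not
print(hashlib.sha256(original).hexdigest() == hashlib.sha256(recoded).hexdigest())
```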
>>>> For Maildir I recommend wasting diskspace by storing both the original
>>>> Maildir format and in parallel store the attachments separately.
>>>>
>>>> Maildir ain't accessible by current Evolution's UI, by the way.
>>>>
>>>> For MBox I recommend TO STOP USING THIS BROKEN FORMAT. It's insane with
>>>> today's mailboxes that easily grow to 3 gigabytes in size per user.
>>>>     
>>>>         
>>> I second your thoughts for MBox stuff. 
>>>   
>>>       
>> Eh, I think mbox works fine but I can understand wanting to move to
>> Maildir which is also fine :-)
>>     
>
> Maildir doesn't store individual MIME parts separately. So Maildir is
> equally hard to handle for metadata engines as MBox is. Only difference
> with MBox is that we need to seek() to some location.
>
> So Maildir doesn't make it possible for us to let app developers
> implement indexing plugins easily, like a typical exif extractor.
>   

I guess, but they could just link with gmime or camel :p
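
As a rough illustration of what "link with a MIME library" means for an indexer, here is a sketch that uses Python's standard email package as a stand-in for GMime or Camel (it is neither library's API). The function name extract_attachments and the sample message are hypothetical. The parser handles the MIME structure and transfer decoding, so a plugin only sees the decoded attachment bytes:

```python
from email import message_from_bytes
from typing import List, Tuple


def extract_attachments(raw: bytes) -> List[Tuple[str, bytes]]:
    """Return (filename, decoded bytes) for every named leaf part in a message."""
    msg = message_from_bytes(raw)
    found: List[Tuple[str, bytes]] = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename and not part.is_multipart():
            found.append((filename, part.get_payload(decode=True) or b""))
    return found


# Hypothetical single message as it might sit in a Maildir file.
MAILDIR_MESSAGE = (
    b"From: alice@example.com\r\n"
    b"Subject: birthday\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="b"\r\n'
    b"\r\n"
    b"--b\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"\r\n"
    b"see attached\r\n"
    b"--b\r\n"
    b"Content-Type: image/jpeg\r\n"
    b'Content-Disposition: inline; filename="mommy.jpeg"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"/9j/4A==\r\n"
    b"--b--\r\n"
)

for name, data in extract_attachments(MAILDIR_MESSAGE):
    print(name, len(data))
```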

Jeff
