Re: [gmime-devel] Suggestions on is_mbox_marker() function



Hi Kevin,

My comments are inline below:

On 10/22/2018 6:32 AM, chenkevin via gmime-devel-list wrote:
Hi,
 
Good day!
We recently encountered some problems parsing eml files in mbox format,
and we would like to offer some suggestions for the is_mbox_marker() function.
 
The GMime 2.6.x library correctly handles the leading "From sender date ..." line of an mbox-formatted eml file.
However, it fails to parse lines like ">>From sender date ...".

Yeah, because that's not an mbox marker. An mbox marker may only start with "From " (i.e. From-space) at the very beginning of the line.

 
We noticed that the GMime 3.x library has a function called is_mbox_marker().
It seems that it can now handle two kinds of mbox leading line: "From " and ">From ".
But for the other variants, parsing still fails.

You are misunderstanding what that function is doing. It allows for checking ">From " because that is a common blob of text that might preface the headers of a message/rfc822 attachment. In other words:

[top-level message headers]
Content-Type: multipart/mixed; boundary="blahblah"

--blahblah
Content-Type: text/plain

This is the message body
--blahblah
Content-Type: message/rfc822

>From joe someplace org blah blah blah
Received: blah blah blah
[more message headers]


The is_mbox_marker() function is just trying to determine if the garbage that it just encountered might be something like the above snippet. It is not meant to decide if that is the start of a new message in the mbox file.
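
To make the distinction concrete, here is a minimal Python sketch of the two checks described above. It is not GMime's actual C code; both function names are made up for illustration, and the only behavior assumed is what this thread states: a real separator starts with "From ", and the lenient check skips at most one leading '>'.

```python
def is_mbox_separator(line: bytes) -> bool:
    """Strict check: a new message in an mbox file starts with 'From '."""
    return line.startswith(b"From ")


def looks_like_from_garbage(line: bytes) -> bool:
    """Lenient check: accept 'From ' or '>From ', the kind of junk line that
    may precede the headers of a message/rfc822 part. It answers "is this
    stray line ignorable?", not "does a new message start here?"."""
    if line.startswith(b">"):
        line = line[1:]  # skip a single '>' only; deliberately not a loop
    return line.startswith(b"From ")
```

With these definitions, ">From joe ..." passes the lenient check but is never treated as a message separator, and ">>From joe ..." passes neither.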


 
Here we find out some information about these variants:
(MBOX, MBOXO, MBOXRD, ...)

This documentation confirms what I've just been telling you and does not, afaict, say that ">>From" is a valid mbox marker.
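
For context, under the mboxrd convention as it is commonly described, a line such as ">>From " is an escaped body line rather than a separator: the writer prepends one '>' to any body line matching "^>*From ", and the reader strips one '>' from any line matching "^>+From ". The following Python sketch illustrates that rule; the helper names are hypothetical and this is not code from GMime.

```python
import re

# Body lines that must be escaped when writing mboxrd: zero or more '>' then "From ".
_NEEDS_QUOTE = re.compile(rb"^>*From ")
# Body lines that were escaped and must be unescaped when reading.
_IS_QUOTED = re.compile(rb"^>+From ")


def mboxrd_quote(line: bytes) -> bytes:
    """Escape a message body line before appending it to an mboxrd file."""
    return b">" + line if _NEEDS_QUOTE.match(line) else line


def mboxrd_unquote(line: bytes) -> bytes:
    """Reverse mboxrd_quote() on a body line read back from an mboxrd file."""
    return line[1:] if _IS_QUOTED.match(line) else line
```

Here a body line ">From x" is stored as ">>From x" and restored to ">From x" on reading, so in no case does a line starting with '>' begin a new message.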

 
Since there can be many '>' characters before the "From " string, we suggest checking for these cases in the is_mbox_marker() function.
For example, it could use a while loop to skip all of the leading '>' characters instead of advancing only once.

No. This doesn't make any sense to do.

I would recommend reading this: https://www.jwz.org/doc/content-length.html

 
Because we are not that familiar with the implementation details of the GMime library,
we wrote this mail to share these ideas, and we would like to know whether this would work for those mbox variants.

GMime's current mbox parser handles the formats above just fine. You just don't have a valid mbox file of any kind.


Hope that helps to clear things up,

Jeff


