Re: [gmime-devel] Parsing of invalid files



Hi Vitaliy,

On 06/10/2011 04:46 AM, Vitaliy Didik wrote:
Hello,

I have an issue with GMime. I'm not sure that this is a bug because what I'm trying to accomplish is probably out side if the scope of the library. Currently GMime considers any file that is supplied to parser as MIME message. Even jpeg images, zip archives, etc. All those files treated as MIME messages with text/plain content type.

Upon further investigation I've found that in function 'parser_step_headers' in 'gmime-parser.c' there is a check for invalid input characters but reaction on them is rather limited.
I'm talking about following code:

if (! valid) {
    if (priv->scan_from && (inptr - start) == 4
&& !strncmp (start, "From ", 5))
      goto next_message;

    if (priv->headers != NULL || *inptr == ':') {
       /* probably the start of the content,
        * a broken mailer didn't terminate the
        * headers with an empty line. *sigh* */
        goto content_start;
    }
}

If I supply binary file to parser then first 2 conditions are not met and despite valid flag still FALSE parser continues to process input and subsequently creates new message object.
Is it an intended behavior or indeed a bug?

I've added 2 lines in code above (before last bracket):
priv->state = GMIME_PARSER_STATE_ERROR;
return 0;
and it seems to fix my problems. But may be this is not a right solution and you can point me to more appropriate fix.

I'm not 100% sure this is "safe", but the more I look at that code, the more I think it should be okay to do something like this. I'll keep thinking about this for the weekend and probably commit something along these lines.

I originally didn't do this because I didn't want to error out ever under the assumption that it was best to do anything we could to parse the input stream as a message/mbox file.

If I do add a ERROR fix for this case, I may need to relax it a bit in the future (if I get any bug reports asking me to make the parser more liberal in what it accepts), but I think it should be ok for now.

One possible change to make it a bit more relaxed is maybe only return error if *inptr is a ctrl character? Could you check that for me? Just check if is_type (*inptr, IS_CTRL) returns non-zero.

Jeff



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]