Re: [gmime-devel] Parsing of invalid files



Hello Jeffrey,
 
> I'm not 100% sure this is "safe", but the more I look at that code, the more I think it should be okay to do something like this.  I'll keep thinking about this over the weekend and will probably commit something along these lines.

> I originally didn't do this because I never wanted to error out, on the assumption that it was best to do anything we could to parse the input stream as a message/mbox file.

I agree. I suspect the majority of the library's users use it for mail applications (or something similar), and such apps are better off being liberal in the input they accept.
My purpose is a bit different and requires quite a strict MIME/non-MIME filter, so I opted to return an error in all cases.
 

> If I do add an ERROR fix for this case, I may need to relax it a bit in the future (if I get bug reports asking me to make the parser more liberal in what it accepts), but I think it should be OK for now.

> One possible way to make it a bit more relaxed might be to return an error only if *inptr is a control character. Could you check that for me? Just check whether is_type (*inptr, IS_CTRL) returns non-zero.

I've tried adding this condition and got the following results:
- fully binary files (graphic images, archives, disk images, etc.) are filtered out,
- files with plain-text content are parsed as MIME messages in most cases,
- structured text files (shell scripts, XML, etc.) are filtered out on LF or CR characters,
- some binary files that start with text followed by several consecutive LF characters are recognized as MIME files
(e.g. MSVC *.ilk files, which start with a "Microsoft Linker Database<LF><LF>" line that becomes a header).
 
Best regards,
Vitaliy

