Re: Balsa 1.1.7 having trouble w/ mbox parsing?



On Tue, 17 July 01:36 Julian M Catchen wrote:
> > > and might I add, it's rather prone to *remote exploits* by malicious
> > > malformed MIME messages
> > 
> > Yup!
> > 
> > I've had a few goes at writing a streaming MIME decoder in the past and
> > gave up on each occasion when I contemplated how much easier the "decode
> > to a file and then decode that" approach is.
> > 
> > Brian
> 
> I wrote a similar set of functions to read MIME parts.  It uses a recursive
> algorithm to make its way through the MIME message, building a linked list
> structure.  Each time it reaches a new MIME boundary, it simply calls
> itself again with the new boundary, passing its current location as a
> call-by-reference parameter.  I think it works pretty well, although I
> haven't fed it too many malformed messages.

Pretty much what I did.  That much is easy.  In fact if you aren't too
bothered about behaviour in the face of invalid input, a streaming MIME
parser is a doddle.  The real difficulty starts when trying to handle
mis-nesting or missing boundaries in accordance with the rule that a
boundary terminates all parts contained within the part "owning" the
boundary.  Any mis-nested boundaries would then become part of the enclosing
part's data, presumably causing parsing errors.
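For illustration, here is a minimal Python sketch of that nesting rule. It works on lines purely for brevity, although a real scanner should not, for the reasons given in the next paragraph. The header check is a hypothetical, deliberately crude regex rather than a proper Content-Type parser, and the names `scan` and `match_delimiter` are made up. Open boundaries are kept on a stack. A delimiter line is matched against the innermost boundary first, and a hit on an outer boundary implicitly closes everything nested inside it.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical, deliberately crude header check: a real parser must handle
# header folding, quoting, comments and parameter order properly.
BOUNDARY_RE = re.compile(
    r'^Content-Type:\s*multipart/[^;]+;\s*boundary="?([^";]+)"?', re.IGNORECASE
)


def match_delimiter(line: str, stack: List[str]) -> Optional[Tuple[int, bool]]:
    """Return (stack index, is_close) for the innermost open boundary this
    line delimits, or None if it is not a delimiter line at all."""
    text = line.rstrip(" \t")  # RFC 2046 allows trailing whitespace
    for depth in range(len(stack) - 1, -1, -1):
        dash = "--" + stack[depth]
        if text == dash + "--":
            return depth, True
        if text == dash:
            return depth, False
    return None


def scan(lines: List[str], top_boundary: str) -> List[Tuple[str, str]]:
    """Turn body lines (line endings already removed) into a flat event list:
    ("part", boundary), ("close", boundary), ("implicit-close", boundary)
    or ("data", line)."""
    stack = [top_boundary]
    in_headers = False
    events: List[Tuple[str, str]] = []
    for line in lines:
        hit = match_delimiter(line, stack)
        if hit is not None:
            depth, is_close = hit
            # A boundary terminates every part nested inside the part that
            # owns it, so pop any inner multiparts that were never closed.
            while len(stack) > depth + 1:
                events.append(("implicit-close", stack.pop()))
            if is_close:
                events.append(("close", stack.pop()))
                in_headers = False
            else:
                events.append(("part", stack[-1]))
                in_headers = True
            continue
        if in_headers:
            if line == "":
                in_headers = False  # blank line ends the part's headers
            else:
                m = BOUNDARY_RE.match(line)
                if m:
                    stack.append(m.group(1))  # this part is itself multipart
            continue
        # Anything else, including a stray delimiter for a boundary that was
        # already closed, is treated as data (part body, preamble or epilogue).
        events.append(("data", line))
    return events
```

Fed a body in which an inner multipart is never closed, the scanner reports `implicit-close` for the inner boundary as soon as the next outer delimiter arrives, and any later `--inner` line comes out as ordinary data, which is the behaviour described above.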

The other difficulty in scanning for boundaries is that the CRLF immediately
preceding the -- is part of the boundary, as is the CRLF immediately after
it.  This means a MIME scanner should work by reading a character at a time
and not a line at a time.  The difficulties are compounded by the fact that
programs on Unix systems frequently convert the CRLF to \n for storage in
files.  This is stupid IMO: it *really* screws up MD5 hashes of MIME parts,
and it breaks multipart/signed data because it invalidates signatures.
Although RFC 2822 does not permit bare CR or LF in a line, RFC 822 does, so
the Unix newline <-> CRLF translation cannot be applied reversibly.  A moment's
thought will probably reveal other reasons not to process mail with Unix line
endings.
It also means the Berkeley mbox format is even more problematic than you
thought it was.
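A rough sketch of byte-at-a-time boundary scanning, assuming the body is available as Python `bytes` and the boundary string is already known. The names `DelimiterScanner` and `split_at_delimiter` are hypothetical. The pattern searched for is CRLF, "--", boundary, so the CRLF before the dashes is consumed with the delimiter instead of being left at the end of the part's data. Bytes that might be the start of a delimiter are held back, and a Knuth-Morris-Pratt failure table decides how many of them to release on a mismatch (needed for input such as CR CR LF "--" boundary). The rest of the delimiter line (a closing "--", transport padding, the trailing CRLF) is left to the caller.

```python
from typing import List, Optional, Tuple


def kmp_table(pat: bytes) -> List[int]:
    """Knuth-Morris-Pratt failure table: fail[i] is the length of the longest
    proper prefix of pat[:i+1] that is also a suffix of it."""
    fail = [0] * len(pat)
    k = 0
    for i in range(1, len(pat)):
        while k and pat[i] != pat[k]:
            k = fail[k - 1]
        if pat[i] == pat[k]:
            k += 1
        fail[i] = k
    return fail


class DelimiterScanner:
    """Consumes one byte at a time. Body bytes are released as soon as they
    cannot be part of CRLF "--" boundary; the delimiter itself, including
    its leading CRLF, is never released."""

    def __init__(self, boundary: bytes) -> None:
        self.pat = b"\r\n--" + boundary
        self.fail = kmp_table(self.pat)
        self.k = 0  # length of the pattern prefix currently held back

    def feed(self, byte: int) -> Tuple[bytes, bool]:
        """Return (body bytes released, whether a delimiter just ended)."""
        out = bytearray()
        k = self.k
        while k and byte != self.pat[k]:
            # Mismatch: keep only the longest held-back suffix that is still
            # a pattern prefix and release the bytes in front of it.
            nk = self.fail[k - 1]
            out += self.pat[:k - nk]
            k = nk
        if byte == self.pat[k]:
            k += 1
        else:
            out.append(byte)
        if k == len(self.pat):
            self.k = 0
            return bytes(out), True
        self.k = k
        return bytes(out), False


def split_at_delimiter(data: bytes, boundary: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (body, rest), where rest starts just after "--boundary", or
    (data, None) if no delimiter is found. RFC 2046 lets the very first
    delimiter start the body with no preceding CRLF; that case needs
    separate handling and is not covered here."""
    scanner = DelimiterScanner(boundary)
    body = bytearray()
    for i, byte in enumerate(data):
        released, found = scanner.feed(byte)
        body += released
        if found:
            return bytes(body), data[i + 1:]
    body += scanner.pat[:scanner.k]  # end of input: held-back bytes are body
    return bytes(body), None
```

For example, `split_at_delimiter(b"line one\r\nline two\r\n--xyz--\r\n", b"xyz")` gives `(b"line one\r\nline two", b"--\r\n")`: the part's data ends without the CRLF, and the remainder starts with the "--" that marks a close delimiter. As for reversibility, converting CRLF to LF maps both `b"a\nb\r\n"` and `b"a\r\nb\r\n"` to `b"a\nb\n"`, so converting back cannot know which of the two it started from.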

Anyway, you know all that, so I'll shut up now.

> What kind of remote exploits do you see possible with this "streaming"
> scheme?

Mostly that tricky implementations are harder to get right, so with a
streaming parser it will be easier for a malicious message to force things
like buffer overruns.

Brian



