Re: [Off-Topic] 1st proposal for GMime error reporting



Hi Jeff:

Thanks a lot for your feedback!

Am 22.10.17 16:54 schrieb(en) Jeffrey Stedfast via balsa-list:
I think, though, that proxying stuff back to the GMimeParser would get ugly quickly, so it probably does make 
more sense to have a callback on GMimeParserOptions like you've done.

Actually, I don't see a real use case for feeding much information back into the parser.  IMO, the GMime use 
cases in this context are
• parser for a MUA, as it is used now in Balsa, which works just perfectly
• “message validator”, for which the patch can be used, either stand-alone or as extension for the MUA
• /maybe/ a “message sanitiser” for re-writing the message to a sane format (see below)

I think your patch seems reasonable for a start. I would prefer "Callback" over "CB" to be consistent with glib/gtk naming 
conventions and maybe use the terms "Warn/Warning" instead of "Issue/Notify/Error", but those are minor nitpicks ;-)

Yes, will do…  I also have to change the Eclipse formatter as to meet your coding style.  Do you have 
guidelines, or even a Eclipse/uncrustify/indent/… rule set?

As far as the actual logic of the patch goes, I noticed that when a header exists already, you don't still 
add the header anyway. I couldn't tell at a glance whether you did the same with parameters, but I don't 
think that this patch should change behavior - it should only add a mechanism for alerting the application of 
potential issues that may arise when interpreting the MIME.

I basically removed the headers because GMime didn't scan the embedded multipart in my “evil” example (maybe 
it does after switching the order of the duplicated Content-Type headers).  Of course, this should *not* be 
the standard behaviour!

As I noted in my (broken) 1st message, it might be an idea to make it configurable through 
GMimeParserOptions, though:
• By default, the extra headers are kept, and the only callback is invoked.
• In “drop extra headers” mode, GMime could be used to re-write a suspicious message into a well-defined 
format (see use case #3 above; this is what some security appliances do).  We know that we may lose some 
information in this case, but we also know *exactly* how it will be interpreted by the final MUA, so we can 
scan for other threats (like viruses).

Now, *perhaps*, the callback could return some sort of value to the inner workings of the parser logic that 
would determine if the header/parameter/etc should be dropped, but it shouldn't happen without application 
feedback because that would alter the message, breaking the work I did to make sure that parsing a message 
and then writing it back to a stream should produce an exact byte-for-byte copy of the original message 
stream by default.

The return value from the callback (a gboolean would be sufficient) is an interesting idea.  However, as 
outlined above, the sanitiser is a /very/ special (and controversial, as it may actually change the contents 
which might even be illegal).  Thus, thinking again about it, I guess it's better to omit this feature for 
the first approach.  And it could be added later easily if someone *really* needs it.

The only gotchas that I can think of with regards to a return value from the callback is that the callback 
may need more context than what we can easily provide right now and that different issues/warnings may 
require different sets of return values (which may make some return values have undefined behavior depending 
on the context). We might need to try to figure out what sort of actions make sense for each warning 
condition and see if they are consistent.

I think it is possible to distinguish between stuff which is contrary to the intentions of MIME (like 
multiple contradictory Content-Types) and can be used for an attack, and minor standard violations which are 
otherwise harmless (like unencoded 8-bit chars or whitespace in RFC 2047 encoded-words).  I.e. keep the 
latter and if requested sanitize the former.  An error code would probably be sufficient to distinguish these 
cases.

(Note: what if you get multiple headers or parameters but want to only keep the last one found rather than 
the first?)

Good point.  Hmmm…

Another thing that I just thought of is that I've noticed that some clients with include the same parameters 2 times, once 
encoded using rfc2231 and once using the (incorrect-but-widely-used-anyway) rfc2047 encoding. In the "valid" cases, 
these values will be identical, it's just that the sending client is trying to be "helpful" if the receiving client 
doesn't support one form of encoding or the other. To be honest, I'm not sure this is really all that helpful, but I digress ;-)

Also a good point.  It's easy, though, to limit this feature to the “boundary” (and perhaps other?) parameter 
which shall not be present multiple times and may indicate an attack.

Anyway, I look forward to seeing your next patch. If you have a github account, using a PR would be really 
helpful too (if you don't have one, it's not a huge deal).

I don't have a Github account, but will get one if you prefer it.  As I am far from understanding your code 
in detail, a review of any changes would be great (i.e. necessary…), though.  Please don't expect a 
full-featured patch too early, as I mentioned, this is not the work I'm paid for (unfortunately)…

Cheers,
Albrecht.

Attachment: pgpMbTXXYnVEb.pgp
Description: PGP signature



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]