Re: [gmime-devel] g_mime_object_get_header_list on first part in g_mime_message_foreach()?



On 7/14/2016 4:55 AM, Daniel Kahn Gillmor wrote:
Hi Jeffrey--

On Thu 2016-07-14 04:29:59 +0200, Jeffrey Stedfast wrote:
On 7/13/2016 12:14 PM, Daniel Kahn Gillmor wrote:
When i pass a GMimeMessage object to g_mime_message_foreach(), it
invokes the callback on a series of GMimePart objects, the first of
which is the top-level message itself.  but this object is actually
pretty strange:

   a) when i call g_mime_object_get_headers() on it, i get the full list of
      headers in a text blob.
The top-level part does this because it is kind of a hack in the sense
that the top-level part "temporarily" has a cache of all of the original
raw headers that were parsed because it has to keep them in the proper
order.

Once you remove the part from the message, it loses this cache and goes
back to just having its own headers.

The get_headers() method returns the cache (if it exists) and does not
re-serialize the headers. The object itself does not contain the header
items.

   b) when i call g_mime_object_get_header_list(), though, and loop through
      the returned GMimeHeaderList, i only get the Content-Type
      header.
Correct.
so, to be clear, the message itself is seen as distinct from the
"top-level part" somehow?

In GMime's object tree, they are distinctly separate, but conceptually they are the same (if that makes sense).

   I've been conceptualizing them as the same
thing -- that is, that the top-level part *is* the message, which is why
it was surprising to me that the GMimeHeaderList of the top-level part
didn't have the same sequence as the message itself.

What happens during parsing is that the headers are split between the GMimeMessage object and the top-level mime-part object (whether it be a GMimePart, GMimeMultipart, etc).

All of the Content-* headers are filtered off to the top-level part and everything else is added to the message.

This makes it possible to replace the top-level mime part in the message and yet still retain the message headers.

If messages could only be parsed and not have child parts swapped out/removed/etc, then everything would probably just exist on the top-level mime part like you are conceptualizing.
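
To make the split concrete, here is a minimal sketch in Python. It is a toy model and not GMime code: the function name `split_headers` and the sample headers are hypothetical, and the only rule taken from the explanation above is that Content-* headers go to the top-level part while everything else stays on the message.

```python
def split_headers(headers):
    """Split parsed (name, value) pairs into message and part headers.

    Assumption for illustration: a header belongs to the MIME part
    if and only if its name starts with "Content-" (case-insensitive).
    """
    message_headers = []
    part_headers = []
    for name, value in headers:
        if name.lower().startswith("content-"):
            part_headers.append((name, value))
        else:
            message_headers.append((name, value))
    return message_headers, part_headers


# Hypothetical header block, in the order it appeared on the wire.
parsed = [
    ("From", "alice@example.org"),
    ("Content-Type", "text/plain; charset=utf-8"),
    ("Subject", "hello"),
    ("Content-Transfer-Encoding", "7bit"),
]
message_headers, part_headers = split_headers(parsed)
```

Note that once the headers live in two lists, the original interleaving (From, Content-Type, Subject, ...) is no longer recoverable from the lists alone, which is why a separate raw cache is needed to reproduce the original order.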


For every object after the first in g_mime_message_foreach walk, these
queries return the same set.  So i'm pretty confused as to why it would
be different for the first part.

I note that if i invoke g_mime_object_get_header_list() directly on
the GMimeMessage object, i get a GMimeHeaderList that contains all the
headers, not just the Content-Type.  Should i be understanding the
object differently somehow, or is this a bug?
It's a consequence of the way things work in order to maintain original
header orderings.

It's fixable with a redesign of the header APIs (e.g. the way I handle
it in MimeKit), but that requires API breakage that I'm not sure is
worth the price to be paid at this point.
Can you describe what you think the fix would look like?

So basically the way MimeKit works is that, again, the headers are split like in GMime, but the difference is in the way the raw message headers are cached.

GMime's parser unfolds the headers as it parses them, MimeKit's parser does not.

GMime's parser calls g_mime_object_add_header(), passing a field and an unfolded value for each of the parsed headers. Eventually these get passed to g_mime_header_add(), which creates a GMimeHeader object and inserts it into a list.

MimeKit retains the original raw header as well as an unfolded version of the header and the stream offset.
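
The difference can be illustrated with a small Python sketch of a parser that keeps all three pieces of information per header. This is a toy model under stated assumptions, not MimeKit's or GMime's implementation: the names `ParsedHeader` and `parse_header_block` are hypothetical, the input is assumed to be ASCII, and unfolding is done the RFC 5322 way (drop a line break that is immediately followed by a space or tab).

```python
import re
from dataclasses import dataclass


@dataclass
class ParsedHeader:
    name: str
    raw: bytes      # exact bytes from the stream, folding included
    unfolded: str   # value with folding line breaks removed
    offset: int     # byte offset of the header within the block


def _finish(start, lines):
    """Build a ParsedHeader from the raw lines of one (possibly folded) header."""
    raw = b"".join(lines)
    name, _, value = raw.decode("ascii").partition(":")
    # Unfold: remove a line break that is followed by a space or tab.
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", value).strip()
    return ParsedHeader(name, raw, unfolded, start)


def parse_header_block(block: bytes):
    headers = []
    offset = 0
    start, lines = 0, []
    for line in block.splitlines(keepends=True):
        if lines and line[:1] in (b" ", b"\t"):
            lines.append(line)          # continuation of a folded header
        else:
            if lines:
                headers.append(_finish(start, lines))
            start, lines = offset, [line]
        offset += len(line)
    if lines:
        headers.append(_finish(start, lines))
    return headers


block = b"Subject: a long\r\n subject line\r\nFrom: alice@example.org\r\n"
headers = parse_header_block(block)
```

A parser that only keeps `unfolded` cannot reproduce the original folding on output; keeping `raw` and `offset` alongside it is what makes a faithful re-serialization possible.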

GMime's cache is a GMimeStreamMem that is constructed as the parser parses the headers and contains the entire header block in its raw form.

When GMime serializes a message or mime object, it uses said cache instead of re-serializing each header in the object so that it can be assured that none of the headers are folded or encoded in a different way than the original object.

Obviously if you add, remove, or modify any of the headers on said objects, the cache has to be destroyed and so after that point, all bets are off as far as the re-serialized headers being identical to the original (in formatting).
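
A minimal Python sketch of this caching behavior follows. It is a toy model, not GMime's code: the class `HeaderBlock` and its methods are hypothetical, and the fallback serializer simply writes `Name: value` lines without any folding.

```python
class HeaderBlock:
    """Toy model: parsed headers plus a cache of the original raw text."""

    def __init__(self, raw, headers):
        self._raw_cache = raw           # original header text, untouched
        self._headers = list(headers)   # (name, unfolded value) pairs

    def set_header(self, name, value):
        # Replace any existing header with this name, then append the new one.
        self._headers = [
            (n, v) for n, v in self._headers if n.lower() != name.lower()
        ]
        self._headers.append((name, value))
        # Any modification invalidates the raw cache.
        self._raw_cache = None

    def serialize(self):
        # Prefer the cache so the output matches the original formatting.
        if self._raw_cache is not None:
            return self._raw_cache
        return "".join(f"{n}: {v}\r\n" for n, v in self._headers)


raw = "Subject: a long\r\n subject line\r\nX-Test:   spaced\r\n"
block = HeaderBlock(
    raw, [("Subject", "a long subject line"), ("X-Test", "spaced")]
)
before = block.serialize()
block.set_header("X-Test", "changed")
after = block.serialize()
```

Before the modification the output is identical to the input, odd spacing and folding included. Afterwards the Subject header comes out unfolded even though it was never touched, which is the "all bets are off" effect described above.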

What MimeKit does when you re-serialize is that it merges the headers back again by semi-sorting based on the header offsets.

You can see the logic here:

MimeMessage.WriteTo(): https://github.com/jstedfast/MimeKit/blob/master/MimeKit/MimeMessage.cs#L1079
MimeMessage.MergeHeaders(): https://github.com/jstedfast/MimeKit/blob/master/MimeKit/MimeMessage.cs#L2135

It's a little gross, but it works and allows much better retention of the original content when re-serializing.
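
The general idea of merging by offset can be sketched in Python as below. This is not a port of MergeHeaders(): the function `merge_headers` is hypothetical, and the policy for headers without an offset (ones added after parsing are emitted as soon as they are reached) is an assumption made for the example.

```python
def merge_headers(message_headers, part_headers):
    """Merge two header lists into one, ordered by original stream offset.

    Each header is a (name, value, offset) tuple. An offset of None marks
    a header that was added after parsing; as an assumption, such a header
    is emitted as soon as it reaches the front of its list.
    """
    merged = []
    i = j = 0
    while i < len(message_headers) and j < len(part_headers):
        m = message_headers[i]
        p = part_headers[j]
        take_message = m[2] is None or (p[2] is not None and m[2] < p[2])
        if take_message:
            merged.append(m)
            i += 1
        else:
            merged.append(p)
            j += 1
    # One list is exhausted; the remainder keeps its own order.
    merged.extend(message_headers[i:])
    merged.extend(part_headers[j:])
    return merged


message_headers = [
    ("From", "alice@example.org", 0),
    ("Subject", "hello", 40),
    ("X-Added", "new", None),   # added after parsing, so no offset
]
part_headers = [
    ("Content-Type", "text/plain", 20),
    ("Content-Transfer-Encoding", "7bit", 60),
]
merged_names = [h[0] for h in merge_headers(message_headers, part_headers)]
```

Each list keeps its own relative order and the offsets only decide how the two are interleaved, which is one way to read "semi-sorting".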


PS i also see the same oddness when looping into an encrypted part using
     g_mime_multipart_encrypted_decrypt() with a GMimeGpgContext -- the
     returned GMimeObject has the same behavior as the first visited part
     in the g_mime_message_foreach() walk.
I don't quite understand what you mean by this...

What additional headers does it have that it "shouldn't"?
i mean the other way around -- the GMimeHeaderList extracted from the
object returned by g_mime_multipart_encrypted_decrypt() only has
content-type in it.  I think it *doesn't* have the additional headers
that i'd expect it to have.

Normally Content-* headers are the only headers used for MIME parts.

I thought GMime's parser only split the headers in the case where it was the top-level mime part of a GMimeMessage, though... so I thought this should work as expected.

If not, it's a bug.

Jeff

