Re: [gmime-devel] g_mime_object_get_header_list on first part in g_mime_message_foreach()?

From: Jeffrey Stedfast <fejj gnome org>
To: Daniel Kahn Gillmor <dkg fifthhorseman net>, gmime-devel-list gnome org
Subject: Re: [gmime-devel] g_mime_object_get_header_list on first part in g_mime_message_foreach()?
Date: Thu, 14 Jul 2016 06:58:29 -0400

On 7/14/2016 4:55 AM, Daniel Kahn Gillmor wrote:

Hi Jeffrey--

On Thu 2016-07-14 04:29:59 +0200, Jeffrey Stedfast wrote:

On 7/13/2016 12:14 PM, Daniel Kahn Gillmor wrote:

When i pass a GMimeMessage object to g_mime_message_foreach(), it
invokes the callback on a series of GMimePart objects, the first of
which is the top-level message itself.  but this object is actually
pretty strange:

   a) when i call g_mime_object_get_headers() on it, i get the full list of
      headers in a text blob.

The top-level part does this because it is kind of a hack in the sense
that the top-level part "temporarily" has a cache of all of the original
raw headers that were parsed because it has to keep them in the proper
order.

Once you remove the part from the message, it loses this cache and goes
back to just having its own headers.

The get_headers() method returns the cache (if it exists) and does not
re-serialize the headers. The object itself does not contain the header
items.

   b) when i call g_mime_object_get_header_list(), though, and loop through
      it with g_mime_object_get_header_list(), i only get the Content-Type
      header.

Correct.

so, to be clear, the message itself is seen as distinct from the
"top-level part" somehow?

In GMime's object tree, they are distinctly separate, but conceptually
they are the same (if that makes sense).

   I've been conceptualizing them as the same
thing -- that is, that the top-level part *is* the message, which is why
it was surprising to me that the GMimeHeaderList of the top-level part
didn't have the same sequence as the message itself.

What happens during parsing is that the headers are split between the
GMimeMessage object and the top-level mime-part object (whether it be a
GMimePart, GMimeMultipart, etc).

All of the Content-* headers are filtered off to the top-level part and
everything else is added to the message.

This makes it possible to replace the top-level mime part in the message
and yet still retain the message headers.

If messages could only be parsed and not have child parts swapped
out/removed/etc, then everything would probably just exist on the
top-level mime part like you are conceptualizing.
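
To make the split concrete, here is a rough sketch in plain C. It is a
toy model, not GMime's parser code: SplitHeader, split_is_content_header()
and split_headers() are hypothetical names invented for illustration, and
the only rule modelled is "Content-* goes to the part, everything else
goes to the message".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed header; not a GMime type. */
typedef struct {
    const char *name;
    const char *value;
} SplitHeader;

/* Case-insensitive test for a "Content-" prefix. */
static int split_is_content_header(const char *name)
{
    static const char prefix[] = "content-";
    size_t i;

    for (i = 0; prefix[i] != '\0'; i++) {
        /* A name shorter than the prefix fails here on its NUL byte. */
        if (tolower((unsigned char) name[i]) != prefix[i])
            return 0;
    }
    return 1;
}

/* Split `all` (in original order) into message-level and part-level
 * lists. Both output arrays must have room for `n` entries. */
static void split_headers(const SplitHeader *all, size_t n,
                          SplitHeader *msg, size_t *n_msg,
                          SplitHeader *part, size_t *n_part)
{
    size_t i;

    assert(all != NULL || n == 0);
    *n_msg = 0;
    *n_part = 0;

    for (i = 0; i < n; i++) {
        if (split_is_content_header(all[i].name))
            part[(*n_part)++] = all[i];
        else
            msg[(*n_msg)++] = all[i];
    }
}
```

Note that each output list keeps its own relative order, but the
interleaving between the two lists is lost unless something else (a raw
cache, or per-header offsets) remembers it.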

For every object after the first in g_mime_message_foreach walk, these
queries return the same set.  So i'm pretty confused as to why it would
be different for the first part.

I note that if i invoke g_mime_object_get_header_list() directly on the
GMimeMessage object, i get a GMimeHeaderList that contains all the
headers, not just the Content-Type.  Should i be understanding the
object differently somehow, or is this a bug?

It's a consequence of the way things work in order to maintain original
header orderings.

It's fixable with a redesign of the header APIs (e.g. the way I handle
it in MimeKit), but that requires API breakage that I'm not sure is
worth the price to be paid at this point.

Can you describe what you think the fix would look like?

So basically the way MimeKit works is that, again, the headers are split
like in GMime, but the difference is in the way the raw message headers
are cached.

GMime's parser unfolds the headers as it parses them; MimeKit's parser
does not.

GMime's parser calls g_mime_object_add_header(), passing a field and an
unfolded value for each of the parsed headers. These eventually get
passed to g_mime_header_add(), which creates a GMimeHeader object and
inserts it into a list.

MimeKit retains the original raw header as well as an unfolded version
of the header and the stream offset.
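
For anyone unfamiliar with "unfolding", here is a minimal sketch in plain
C of what it means in the RFC 5322 sense: a line break that is followed
by a space or tab is dropped, so a value folded across lines becomes one
line. The function name toy_unfold() is hypothetical, and this is not
GMime's actual implementation, which may treat whitespace differently.

```c
#include <assert.h>
#include <stddef.h>

/* Unfold a header value in place: drop any CRLF (or bare LF) that is
 * immediately followed by a space or tab. Returns the new length. */
static size_t toy_unfold(char *s)
{
    const char *src = s;
    char *dst = s;

    assert(s != NULL);

    while (*src != '\0') {
        if (src[0] == '\r' && src[1] == '\n' &&
            (src[2] == ' ' || src[2] == '\t')) {
            src += 2;   /* skip the CRLF, keep the whitespace */
            continue;
        }
        if (src[0] == '\n' && (src[1] == ' ' || src[1] == '\t')) {
            src += 1;   /* skip the bare LF, keep the whitespace */
            continue;
        }
        *dst++ = *src++;
    }
    *dst = '\0';

    return (size_t) (dst - s);
}
```

Once a value has been unfolded like this, the original fold positions
are gone, which is why the raw form has to be kept somewhere if the
message is to be written back out unchanged.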

GMime's cache is a GMimeStreamMem that is constructed as the parser
parses the headers and contains the entire header block in its raw form.

When GMime serializes a message or mime object, it uses said cache
instead of re-serializing each header in the object so that it can be
assured that none of the headers are folded or encoded in a different
way than the original object.

Obviously if you add, remove, or modify any of the headers on said
objects, the cache has to be destroyed and so after that point, all bets
are off as far as the re-serialized headers being identical to the
original (in formatting).
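
The cache behaviour described above can be sketched as a toy model in
plain C. Everything here is hypothetical (CachedHeader, CachedHeaderList,
cached_list_set(), cached_list_write()); a plain malloc'd string stands
in for the GMimeStreamMem, and header names are compared case-sensitively
only to keep the sketch short.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_HEADERS 8

typedef struct {
    char name[32];
    char value[128];    /* unfolded value */
} CachedHeader;

typedef struct {
    CachedHeader items[TOY_MAX_HEADERS];
    size_t count;
    char *raw;          /* verbatim header block from parsing, or NULL */
} CachedHeaderList;

/* Set (replace or append) a header. Any modification destroys the raw
 * cache. Returns 0 on success, -1 if the list is full. */
static int cached_list_set(CachedHeaderList *list,
                           const char *name, const char *value)
{
    size_t i;

    assert(list != NULL);
    for (i = 0; i < list->count; i++) {
        if (strcmp(list->items[i].name, name) == 0)
            break;
    }
    if (i == list->count) {
        if (list->count == TOY_MAX_HEADERS)
            return -1;
        list->count++;
    }
    snprintf(list->items[i].name, sizeof list->items[i].name, "%s", name);
    snprintf(list->items[i].value, sizeof list->items[i].value, "%s", value);

    free(list->raw);
    list->raw = NULL;
    return 0;
}

/* Write the header block into buf: the verbatim cache if it still
 * exists, otherwise a fresh "Name: value" serialization. */
static void cached_list_write(const CachedHeaderList *list,
                              char *buf, size_t size)
{
    size_t i, used = 0;

    assert(list != NULL && buf != NULL && size > 0);
    buf[0] = '\0';
    if (list->raw != NULL) {
        snprintf(buf, size, "%s", list->raw);
        return;
    }
    for (i = 0; i < list->count && used < size; i++) {
        int n = snprintf(buf + used, size - used, "%s: %s\r\n",
                         list->items[i].name, list->items[i].value);
        if (n < 0)
            break;
        used += (size_t) n;
    }
}
```

In this model a folded Subject header is written back byte-for-byte
while the cache exists; after one cached_list_set() call the output is
rebuilt from the unfolded values and the original folding is gone.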

What MimeKit does when you re-serialize is that it merges the headers
back again by semi-sorting based on the header offsets.


You can see the logic here:

MimeMessage.WriteTo():
https://github.com/jstedfast/MimeKit/blob/master/MimeKit/MimeMessage.cs#L1079

MimeMessage.MergeHeaders():
https://github.com/jstedfast/MimeKit/blob/master/MimeKit/MimeMessage.cs#L2135

It's a little gross, but it works and allows much better retention of
the original content when re-serializing.
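
As a rough illustration of the offset-based merge idea (in plain C
rather than C#, and much simpler than the linked MimeKit code), the
sketch below interleaves a message header list and a part header list by
their recorded stream offsets. OffsetHeader and merge_by_offset() are
hypothetical names, and the sketch assumes every header has an offset
and each list is already in ascending order; handling headers added
after parsing, which have no offset, is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header record: a name plus the stream offset at which
 * the parser found it. */
typedef struct {
    const char *name;
    long offset;
} OffsetHeader;

/* Merge two lists, each already in ascending offset order, into `out`
 * (which must have room for n_msg + n_part entries). Ties go to the
 * message list. Returns the number of entries written. */
static size_t merge_by_offset(const OffsetHeader *msg, size_t n_msg,
                              const OffsetHeader *part, size_t n_part,
                              OffsetHeader *out)
{
    size_t i = 0, j = 0, k = 0;

    assert(out != NULL);

    while (i < n_msg && j < n_part) {
        if (msg[i].offset <= part[j].offset)
            out[k++] = msg[i++];
        else
            out[k++] = part[j++];
    }
    while (i < n_msg)
        out[k++] = msg[i++];
    while (j < n_part)
        out[k++] = part[j++];

    return k;
}
```

With message headers at offsets 0, 40 and 90 and part headers at 20 and
60, the merged list comes out in the order the headers originally
appeared in the stream.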

PS i also see the same oddness when looping into an encrypted part using
     g_mime_multipart_encrypted_decrypt() with a GMimeGpgContext -- the
     returned GMimeObject has the same behavior as the first visited part
     in the g_mime_message_foreach() walk.

I don't quite understand what you mean by this...

What additional headers does it have that it "shouldn't"?

i mean the other way around -- the GMimeHeaderList extracted from the
object returned by g_mime_multipart_encrypted_decrypt() only has
content-type in it.  I think it *doesn't* have the additional headers
that i'd expect it to have.


Normally Content-* headers are the only headers used for MIME parts.

I thought GMime's parser only split the headers in the case where it was
the top-level mime part of a GMimeMessage, though... so I thought this
should work as expected.


If not, it's a bug.

Jeff

