Re: [gmime-devel] g_mime_object_get_header_list on first part in g_mime_message_foreach()?
- From: Jeffrey Stedfast <fejj gnome org>
- To: Daniel Kahn Gillmor <dkg fifthhorseman net>, gmime-devel-list gnome org
- Date: Thu, 14 Jul 2016 06:58:29 -0400
On 7/14/2016 4:55 AM, Daniel Kahn Gillmor wrote:
Hi Jeffrey--
On Thu 2016-07-14 04:29:59 +0200, Jeffrey Stedfast wrote:
On 7/13/2016 12:14 PM, Daniel Kahn Gillmor wrote:
When i pass a GMimeMessage object to g_mime_message_foreach(), it
invokes the callback on a series of GMimePart objects, the first of
which is the top-level message itself. but this object is actually
pretty strange:
a) when i call g_mime_object_get_headers() on it, i get the full list of
headers in a text blob.
The top-level part does this because it is kind of a hack in the sense
that the top-level part "temporarily" has a cache of all of the original
raw headers that were parsed because it has to keep them in the proper
order.
Once you remove the part from the message, it loses this cache and goes
back to just having its own headers.
The get_headers() method returns the cache (if it exists) and does not
re-serialize the headers. The object itself does not contain the header
items.
b) when i call g_mime_object_get_header_list(), though, and loop through
it with g_mime_object_get_header_list(), i only get the Content-Type
header.
Correct.
so, to be clear, the message itself is seen as distinct from the
"top-level part" somehow?
In GMime's object tree, they are distinctly separate, but conceptually
they are the same (if that makes sense).
I've been conceptualizing them as the same
thing -- that is, that the top-level part *is* the message, which is why
it was surprising to me that the GMimeHeaderList of the top-level part
didn't have the same sequence as the message itself.
What happens during parsing is that the headers are split between the
GMimeMessage object and the top-level mime-part object (whether it be a
GMimePart, GMimeMultipart, etc).
All of the Content-* headers are filtered off to the top-level part and
everything else is added to the message.
This makes it possible to replace the top-level mime part in the message
and yet still retain the message headers.
If messages could only be parsed and not have child parts swapped
out/removed/etc, then everything would probably just exist on the
top-level mime part like you are conceptualizing.
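To make the split described above concrete, here is a minimal standalone sketch in C. It is not GMime's parser code: the `Header` struct and the `has_content_prefix()` and `split_headers()` helpers are hypothetical names invented for illustration. It only models the routing rule stated above, where Content-* headers go to the top-level part and everything else goes to the message.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, simplified header record; not GMime's own header type. */
typedef struct {
    const char *name;
    const char *value;
} Header;

/* Returns 1 if the field name starts with "Content-" (case-insensitive). */
int has_content_prefix(const char *name)
{
    static const char prefix[] = "content-";
    size_t i;

    for (i = 0; prefix[i] != '\0'; i++) {
        /* A shorter name hits its terminating NUL here and fails the match. */
        if (tolower((unsigned char) name[i]) != prefix[i])
            return 0;
    }
    return 1;
}

/* Route each parsed header either to the part (Content-*) or to the
 * message (everything else), preserving relative order within each list.
 * Both output arrays must have room for n entries. */
void split_headers(const Header *all, size_t n,
                   Header *message, size_t *n_message,
                   Header *part, size_t *n_part)
{
    size_t i;

    *n_message = 0;
    *n_part = 0;
    for (i = 0; i < n; i++) {
        if (has_content_prefix(all[i].name))
            part[(*n_part)++] = all[i];
        else
            message[(*n_message)++] = all[i];
    }
}
```

In this model, a header list of From, Content-Type, Subject would leave only Content-Type on the part, which matches what the header list of the first object in the foreach walk shows.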
For every object after the first in g_mime_message_foreach walk, these
queries return the same set. So i'm pretty confused as to why it would
be different for the first part.
I note that if i invoke g_mime_object_get_header_list() directly on
the GMimeMessage object, i get a GMimeHeaderList that contains all the
headers, not just the Content-Type. Should i be understanding the
object differently somehow, or is this a bug?
It's a consequence of the way things work in order to maintain original
header orderings.
It's fixable with a redesign of the header APIs (e.g. the way I handle
it in MimeKit), but that requires API breakage that I'm not sure is
worth the price to be paid at this point.
Can you describe what you think the fix would look like?
So basically the way MimeKit works is that, again, the headers are split
like in GMime, but the difference is in the way the raw message headers
are cached.
GMime's parser unfolds the headers as it parses them; MimeKit's parser
does not.
GMime's parser calls g_mime_object_add_header(), passing a field name
and an unfolded value for each parsed header. Those eventually get
passed to g_mime_header_add(), which creates a GMimeHeader object and
inserts it into a list.
MimeKit retains the original raw header as well as an unfolded version
of the header and the stream offset.
GMime's cache is a GMimeStreamMem that is constructed as the parser
parses the headers and contains the entire header block in its raw form.
When GMime serializes a message or mime object, it uses said cache
instead of re-serializing each header in the object so that it can be
assured that none of the headers are folded or encoded in a different
way than the original object.
Obviously if you add, remove, or modify any of the headers on said
objects, the cache has to be destroyed and so after that point, all bets
are off as far as the re-serialized headers being identical to the
original (in formatting).
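The cache behaviour described above can be modelled with a small standalone C sketch. This is not GMime's implementation (GMime keeps the raw block in a GMimeStreamMem); the `HeaderSet` struct and the `header_set_add()` and `header_set_to_string()` functions are hypothetical names used only to illustrate the idea: serialization prefers the raw cache, and any modification throws the cache away.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HEADERS 16

typedef struct {
    const char *name;
    const char *value;   /* unfolded value */
} Header;

/* Hypothetical model: parsed headers plus the raw header block they
 * were parsed from. raw_cache is NULL once the set has been modified. */
typedef struct {
    Header headers[MAX_HEADERS];
    size_t count;
    char *raw_cache;
} HeaderSet;

/* Portable strdup replacement; returns NULL on allocation failure. */
char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Appending a header invalidates the raw cache. Returns 0 on success,
 * -1 if the fixed-size list is full. */
int header_set_add(HeaderSet *set, const char *name, const char *value)
{
    if (set->count == MAX_HEADERS)
        return -1;
    set->headers[set->count].name = name;
    set->headers[set->count].value = value;
    set->count++;
    free(set->raw_cache);
    set->raw_cache = NULL;
    return 0;
}

/* Returns a malloc'd string: the raw cache verbatim if it still exists,
 * otherwise freshly generated "Name: value\n" lines (original folding
 * is lost in that case). The caller frees the result. */
char *header_set_to_string(const HeaderSet *set)
{
    size_t i, len = 1;
    char *out, *p;

    if (set->raw_cache != NULL)
        return dup_string(set->raw_cache);

    for (i = 0; i < set->count; i++)
        len += strlen(set->headers[i].name) + 2
             + strlen(set->headers[i].value) + 1;

    out = malloc(len);
    if (out == NULL)
        return NULL;
    p = out;
    for (i = 0; i < set->count; i++)
        p += sprintf(p, "%s: %s\n",
                     set->headers[i].name, set->headers[i].value);
    *p = '\0';
    return out;
}
```

In this model a folded Subject header round-trips byte for byte as long as nothing is touched, and comes back unfolded on a single line after the first header_set_add() call.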
What MimeKit does when you re-serialize is that it merges the headers
back again by semi-sorting based on the header offsets.
You can see the logic here:
MimeMessage.WriteTo():
https://github.com/jstedfast/MimeKit/blob/master/MimeKit/MimeMessage.cs#L1079
MimeMessage.MergeHeaders():
https://github.com/jstedfast/MimeKit/blob/master/MimeKit/MimeMessage.cs#L2135
It's a little gross, but it works and allows much better retention of
the original content when re-serializing.
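As a rough illustration of merging by offset, here is a standalone C sketch. It is not a port of MimeKit's MergeHeaders() (see the links above for the real logic, which handles more cases); the `RawHeader` struct and `merge_by_offset()` function are hypothetical. It assumes every header carries a known stream offset and that each input list is already in offset order.

```c
#include <stddef.h>

/* Hypothetical record: the raw header bytes as they appeared in the
 * stream (folding intact) plus the offset where the header started. */
typedef struct {
    const char *raw;
    long offset;
} RawHeader;

/* Merge the message headers (a) and the top-level part headers (b)
 * back into original stream order. Both inputs must already be sorted
 * by offset; out must have room for na + nb entries.
 * Returns the number of entries written. */
size_t merge_by_offset(const RawHeader *a, size_t na,
                       const RawHeader *b, size_t nb,
                       RawHeader *out)
{
    size_t i = 0, j = 0, k = 0;

    while (i < na && j < nb) {
        if (a[i].offset <= b[j].offset)
            out[k++] = a[i++];
        else
            out[k++] = b[j++];
    }
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
    return k;
}
```

Because each header keeps its own raw bytes, writing the merged list out reproduces the original interleaving of message and Content-* headers without a single shared header-block cache.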
PS i also see the same oddness when looping into an encrypted part using
g_mime_multipart_encrypted_decrypt() with a GMimeGpgContext -- the
returned GMimeObject has the same behavior as the first visited part
in the g_mime_message_foreach() walk.
I don't quite understand what you mean by this...
What additional headers does it have that it "shouldn't"?
i mean the other way around -- the GMimeHeaderList extracted from the
object returned by g_mime_multipart_encrypted_decrypt() only has
content-type in it. I think it *doesn't* have the additional headers
that i'd expect it to have.
Normally Content-* headers are the only headers used for MIME parts.
I thought GMime's parser only split the headers in the case where it was
the top-level mime part of a GMimeMessage, though... so I thought this
should work as expected.
If not, it's a bug.
Jeff