Re: [gmime-devel] Mapping MIME parts to byte offsets



Alex Hudson wrote:
> Hi everyone,
>
> Long time no see ;)
>
> I've been trying to use GMime to get various byte offsets into a mail
> - for example, where the content body of an attachment starts, how big
> it is, that kind of thing.
>
> To begin with, I had a look at how DBMail does it which (unless I'm
> following the code incorrectly) is something along the lines of using
> g_mime_parser_construct_message(), iterating over the various parts
> and doing some string building / seeking to work out the numbers. As
> well as not being terribly efficient, it also strikes me that this
> might potentially get the wrong answers because of the amount of
> regenerating data it seems to be doing.

Hmmm, yeah, that definitely doesn't sound like a nice way to do it. As
long as they don't modify the headers, though, they can probably get away
with such a hack, since GMime keeps a raw header buffer after parsing
each part.

>
> I think really what I want is basically what the GMime parser is
> holding internally, in the BoundaryStack, and maybe some pointers into
> a part looking at where headers start / end. I imagine this could
> probably be done with a bit of judicious _tell() here and there in the
> code, but thinking about the patch I'm not really sure how it would
> fit in.
>
> I was thinking that perhaps GMimeObject could be extended to have this
> information, but I'm also not sure if that's necessary - perhaps it
> could be done programmatically. g_mime_object_to_string for example
> appears to create the string from the data stream, but again seems to
> be re-creating data I think?

I think I like the idea of having GMimeObject (or maybe GMimeHeaderList)
remember header offsets. That seems like it might be the simplest approach.

Another approach would be to fix the parser API to be more incremental
so that you could ask it for that information whenever it gets to a
HEADERS_START/END state, but that seems like a lot more work than just
adding it to the objects ;-)

As far as getting content offsets, that is relatively easy because those
offsets are already stored on the GMimeStreams that make up the content
objects. Although I suppose this doesn't really work for multiparts
since they don't have a content object...


    /* part is a GMimePart (a leaf part, not a multipart) */
    GMimeDataWrapper *content = g_mime_part_get_content_object (part);
    GMimeStream *stream = g_mime_data_wrapper_get_stream (content);

The offsets are then stream->bound_start and stream->bound_end.

Of course, that also assumes you haven't disabled persistent streams in
the parser (which will cause it to dup everything to memory-backed streams).

I assume you need this info for being able to fulfill requests for FETCH
BODY[1.2.MIME] and FETCH BODY[1.2.TEXT] (or whatever the syntax is, it's
been a while since I hacked on IMAP stuff).

Let me know if the stream->bound_start/end offsets aren't enough for
what you need for the .TEXT (I can't remember what happens if you do
that for a multipart). I can probably hack up the header start/end
offsets in GMimeHeaderList (or look to see if there's a better way) this
weekend. I gotta figure out how to branch in git (GNOME's rcs just
changed from svn to git the other day and now I gotta learn a whole new
system!).

Jeff


