Re: [xml] problem with gzip decoding in nanohttp



Liron wrote:
You're absolutely right about the "don't fix it if it ain't broken"
approach
but after reviewing the code I think that it's badly broken and I'll
explain:
It's true that those functions do different things but the common thing for
all of them is reading from a socket. As I explained, the bug I found will
occur in every function that reads content (as opposed to headers) from the
socket and not using "inflate" when the content is gzip encoded. The only
function that decompresses the content is xmlNanoHTTPRead BUT it's not the
only function that delivers content.

This was an oversight on my part.

xmlNanoHTTPFetch is a function that opens a URL and saves the content to
file, if you're telling me that this file should contain encoded data in
case that gzip is used then we don't have a problem (I think it's wrong but
if it's by design...). But if you do expect the output file to contain html
for example then it won't work.

It probably should, I think.

Like I said, I didn't review the code enough to make the changes (or get
permission to do so yet ;)) but it looks like xmlNanoHTTPRecv and ONLY
xmlNanoHTTPRecv should be changed to support the decompression since every
function that reads from the socket calls it (but not every function that
calls it decompresses the content)

I don't like the idea of pushing this functionality into
xmlNanoHTTPRecv(), because it would increase memory usage for all
internal users of nanohttp. At the moment, the xmlNanoHTTPCtxt structure
holds the compressed content of the response, and for most purposes
xmlNanoHTTPRead() expands this content, chunk by chunk.
With this change, the expanded content needs to be stored. In addition,
a buffer of 4Kb would need to be kept available to pass to recv() -
probably stored in the context.

With this in mind, I think it would be preferable to keep the current
behaviour in xmlNanoHTTPRead(), and add similar functionality into
xmlNanoHTTPFetch() or xmlNanoHTTPFetchContent().

Another issue I wanted to ask you about is the content-length when the
content is compressed. If content-length header exists it'll return the
length of the compressed data so I applied a new function that simply
returns whether the content is gziped or not. This is good for applications
that depends on the content-length header for buffer allocations so they
should be aware if this field contains the right number or not (Just a
helper function).

External users should loop around xmlNanoHTTPRead() until it returns a
negative value - and this will do the right thing both for gzipped and
non-gzipped resources.

I'm not sure that users would need the helper function - they still need
to write this loop for gzipped content anyway, since we can't predict
the eventual size until inflation is complete. I'm not sure how to flag
incorrect usage to applications - but I don't think this function will
help...

Gary.

Attachment: signature.asc
Description: OpenPGP digital signature



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]