Liron wrote:
You're absolutely right about the "don't fix it if it ain't broken" approach but after reviewing the code I think that it's badly broken and I'll explain: It's true that those functions do different things but the common thing for all of them is reading from a socket. As I explained, the bug I found will occur in every function that reads content (as opposed to headers) from the socket and not using "inflate" when the content is gzip encoded. The only function that decompresses the content is xmlNanoHTTPRead BUT it's not the only function that delivers content.
This was an oversight on my part.
xmlNanoHTTPFetch is a function that opens a URL and saves the content to file, if you're telling me that this file should contain encoded data in case that gzip is used then we don't have a problem (I think it's wrong but if it's by design...). But if you do expect the output file to contain html for example then it won't work.
It probably should, I think.
Like I said, I didn't review the code enough to make the changes (or get permission to do so yet ;)) but it looks like xmlNanoHTTPRecv and ONLY xmlNanoHTTPRecv should be changed to support the decompression since every function that reads from the socket calls it (but not every function that calls it decompresses the content)
I don't like the idea of pushing this functionality into xmlNanoHTTPRecv(), because it would increase memory usage for all internal users of nanohttp. At the moment, the xmlNanoHTTPCtxt structure holds the compressed content of the response, and for most purposes xmlNanoHTTPRead() expands this content, chunk by chunk. With this change, the expanded content needs to be stored. In addition, a buffer of 4Kb would need to be kept available to pass to recv() - probably stored in the context. With this in mind, I think it would be preferable to keep the current behaviour in xmlNanoHTTPRead(), and add similar functionality into xmlNanoHTTPFetch() or xmlNanoHTTPFetchContent().
Another issue I wanted to ask you about is the content-length when the content is compressed. If content-length header exists it'll return the length of the compressed data so I applied a new function that simply returns whether the content is gziped or not. This is good for applications that depends on the content-length header for buffer allocations so they should be aware if this field contains the right number or not (Just a helper function).
External users should loop around xmlNanoHTTPRead() until it returns a negative value - and this will do the right thing both for gzipped and non-gzipped resources. I'm not sure that users would need the helper function - they still need to write this loop for gzipped content anyway, since we can't predict the eventual size until inflation is complete. I'm not sure how to flag incorrect usage to applications - but I don't think this function will help... Gary.
Attachment:
signature.asc
Description: OpenPGP digital signature