[xml] libxml violates the zlib interface and crashes



libxml authors and/or maintainers,

I am preparing a new release of zlib which includes a rewrite of the gz* functions (to wit, that read and 
write gzip files).  In the process of testing, Mark Brown discovered that libxml encounters a segmentation 
fault with the current beta versions of zlib (1.2.3.5 and later).  Within libxml version 2.7.6 in xmlIO.c we 
find this offending code:

#ifdef HAVE_ZLIB_H
        if ((xmlInputCallbackTable[i].opencallback == xmlGzfileOpen) &&
                (strcmp(URI, "-") != 0)) {
            if (((z_stream *)context)->avail_in > 4) {
                char *cptr, buff4[4];
                cptr = (char *) ((z_stream *)context)->next_in;
                if (gzread(context, buff4, 4) == 4) {
                    if (strncmp(buff4, cptr, 4) == 0)
                        ret->compressed = 0;
                    else
                        ret->compressed = 1;
                    gzrewind(context);
                }
            }
        }
#endif

This appears to be an attempt to determine if the input is a gzip file or not.  It does so by finding and 
comparing bytes from an internal zlib buffer to the data returned by zlib, using "(z_stream *)context" to 
access the internal buffer, where "context" is zlib's otherwise opaque gzFile structure.

This code violates the zlib interface, penetrating the protected gzFile structure.  (Gee, this is starting to 
sound like a sexual assault.)  libxml assumes things about zlib that are not documented in the interface 
(zlib.h) and so cannot be relied upon and must not be used in code that professes to be dependable.  This 
includes the assumption that gzFile has the z_stream it uses in the structure as the first element, that it 
uses the z_stream next_in pointer for the input buffer, and that gzopen() or gzdopen() reads data from the 
input to determine whether it is a gzip file or not.  None of these assumptions are true in the new gz* code 
in the current beta of zlib, and so libxml crashes and burns.

It turns out that this offense is not necessary since the current production version of zlib, 1.2.3, 
available since July of 2005 and widely deployed, has the gzdirect() function for exactly this purpose.

Since both libxml and zlib are widely deployed, and since I expect that applications usually link to it and 
zlib dynamically, we have a problem.

The fix to libxml is to check the version of zlib, and to use the current code for ZLIB_VERNUM less than 
0x1230, and to instead use gzdirect() for ZLIB_VERNUM greater than or equal to 0x1230.  However, older 
versions of libxml without this fix will fail with newer versions of zlib.  E.g. if zlib is updated and 
libxml is not.

An untested patch to illustrate the approach is provided below.

Mark Adler


--- xmlIO-2.7.6.c       2009-09-24 08:32:00.000000000 -0700
+++ xmlIO.c     2010-01-18 01:30:27.000000000 -0800
@@ -2518,6 +2518,9 @@
 #ifdef HAVE_ZLIB_H
        if ((xmlInputCallbackTable[i].opencallback == xmlGzfileOpen) &&
                (strcmp(URI, "-") != 0)) {
+#if defined(ZLIB_VERNUM) && ZLIB_VERNUM >= 0x1230
+        ret->compressed = !gzdirect(context);
+#else
            if (((z_stream *)context)->avail_in > 4) {
                char *cptr, buff4[4];
                cptr = (char *) ((z_stream *)context)->next_in;
@@ -2529,6 +2532,7 @@
                    gzrewind(context);
                }
            }
+#endif
        }
 #endif
     }




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]