[xml] debugging outputbuffer code in libxml



(I have reposted this to libxml since this is the code that needs to be
fixed, original discussion started on libxslt mailing list, i.e "Re:
[xslt] xsltRunStylesheet() with IOBuf" thread)

Ok, I downloaded the source code for libxml 2.5.4 and built a test
program to test the outputbuffer code on windows. A couple of notes,
there seems to be a problem with the tar.gz file when using WinZip
8.1-SR1, it keeps complaining about some weird import_0_* patterns in
the tar file (could be a winzip problem). 

Also when I tried compiling using MS Visual Studio 6.0 and opened the
dsp/dsw files from the dsp directory I found that there is an include
file missing from the distribution "xmlwin32version.h" and there are a
couple of c files missing from the project (got unresolved symbols when
building). 

But this is not even the most interesting part, I started debugging the
libxml2 and noticed a couple of things that I found to be rather
dangerous. Allow me to explain, here is the output of my test program:
parsing nstate.xml file
parsed nstate.xml file initializing the output buffer
encoder found -> UTF-8
created output-buffer
wrote bucket (len 8002 bytes)
wrote bucket (len 4015 bytes)
wrote bucket (len 4015 bytes)
wrote bucket (len 4001 bytes)
wrote bucket (len 4014 bytes)
wrote bucket (len 4011 bytes)
wrote bucket (len 4004 bytes)
wrote bucket (len 4000 bytes)
wrote bucket (len 4001 bytes)
wrote bucket (len 4002 bytes)
wrote bucket (len 4006 bytes)
wrote bucket (len 4001 bytes)
wrote bucket (len 4004 bytes)
wrote bucket (len 4003 bytes)
wrote bucket (len 4007 bytes)
wrote bucket (len 1876 bytes)
wrote bucket (len 0 bytes)

the note of interest here is the len values. Originally "buffer" member
is allocated to 4000 size buffer and "conv" member is allocated 4002. We
use the "conv" buffer because we have an encoder. Now if you notice that
in most of the cases the function was called we were going outside of
the allocated memory. (should be <=4002) 

One interesting thing I noticed when debugging was in encoding.c
xmlCharEncOutFunc, or to be more specific the following code:

retry:
    
    written = out->size - out->use;

    /*
     * First specific handling of in = NULL, i.e. the initialization call
     */
    if (in == NULL) {
        toconv = 0;
        if (handler->output != NULL) {
            ret = handler->output(&out->content[out->use], &written,
                                  NULL, &toconv);
            out->use += written;
            out->content[out->use] = 0;
        }

First of all written is never checked, and the important case here is if
written = 0 ( actually it should be more of "towrite") then we should
not be doing any conversions since the buffer is already full.

On the next point from the above:
            out->use += written;
where written = 4002, out->use before the call is 0 and then later 4002
and doing
    out->content[out->use] = 0;
should generate a segmentation fault! since we have memory allocated for
4002 bytes which means that we can only address 0..4001

now there's more interesting stuff:
ret = handler->output(&out->content[out->use], &written,
                                  NULL, &toconv);
actually calls UTF8toUTF8 which returns -1 because of the following code:
 if ((out == NULL) || (inb == NULL) || (outlen == NULL) || (inlenb == NULL))
        return(-1);

The return value is never checked and we assume that it worked (we
increase the use by written). Actually I am not even sure why we have
that code in the first place since it really doesn't do anything (in
this case anyway). So here is my question, since I don't understand all
of the issues here. Do we really need to call handler->output with NULL
parameter, and if we do then we need to check ret and only then do the
following lines, we also need a check to make sure we don't run outside
of the allocated space.

G.F. aka Gena01
http://www.gena01.com



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]