[xml] interaction between xmlSetGenericErrorFunc and xmlSetStructuredErrorFunc can crash (draft patch available)




Hi!  I came across some (unfortunate) behavior in libxml2 (2.7.3
and head) error handling that I want to run by you for your comments.
I have a patch that may help one aspect of it, if you are interested.

For background:  In trying to capture errors from stderr, I encountered
a curious crash.  I reproduce the behavior in the following example:

// Example error-making-and-capturing code, with C99/C++-style comments
// added for the commentary that follows.
//
#include <libxml/xmlerror.h> #include <libxml/xmlsave.h> #include <stdio.h> #include <string.h>

#define BUFFER_SIZE (1024)
char error_buffer[BUFFER_SIZE] = { 0 };
void write_buffer(void *ctx, xmlErrorPtr p)
{
        snprintf(ctx, BUFFER_SIZE, "Structured error %s\n", p->message);
}

int main()
{
        LIBXML_TEST_VERSION
const xmlChar xml_version[] = "1.0", n[] = "name", bad_text[] = "\x01";

        xmlSetStructuredErrorFunc(error_buffer, write_buffer);  // 1

        xmlDocPtr doc = xmlNewDoc(xml_version);
        xmlNodePtr node1 = xmlNewNode(NULL, n);
        xmlNodeAddContent(node1, bad_text);                     // 2
        xmlDocSetRootElement(doc, node1);

        xmlSaveCtxtPtr save = xmlSaveToFd(1, NULL, XML_SAVE_FORMAT);
        xmlSaveTree(save, node1);                               // 3
        xmlSaveClose(save);
        xmlCleanupParser();

        if (strlen(error_buffer)) {
                fprintf(stderr, "Error caught: %s", error_buffer);
                return 1;
        } else {
                return 0;
        }
}


// 1
Based on the API description and mailing-list messages--

        [Re: [xml] libxml error redirection]
        http://mail.gnome.org/archives/xml/2006-April/msg00090.html

        [Re: [xml] Error handling]
        http://mail.gnome.org/archives/xml/2005-May/msg00134.html

--this function (xmlSetStructuredErrorFunc) seems to be the preferable
way to capture errors.

// 2
This call does not seem to trigger any error callback, but does
represent an actual error introduced into document doc:  bad_text
is not a character in XML Char, so a well-formed XML document
cannot add (contain) it.

// 3
Here I get a crash:

$ gcc -Wall -O0 -g `pkg-config xml2 --cflags --libs` test.c
$ ./a.out
Segmentation fault (core dumped)

$ gdb a.out core.28657
GNU gdb Fedora (6.8-29.fc10)
:
Reading symbols from /usr/lib64/libxml2.so.2...done.
Loaded symbols for /usr/lib64/libxml2.so.2
:
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
[New process 28657]
:
(gdb) bt
#0  0x00000037200fa8f9 in __vfprintf_chk () from /lib64/libc.so.6
#1  0x000000372c8328f8 in xmlGenericErrorDefaultFunc ()
   from /usr/lib64/libxml2.so.2
#2  0x000000372c913ba4 in ?? () from /usr/lib64/libxml2.so.2
#3  0x000000372c860415 in xmlOutputBufferWriteEscape ()
   from /usr/lib64/libxml2.so.2
#4  0x000000372c9162e3 in ?? () from /usr/lib64/libxml2.so.2
#5  0x000000372c915bcb in ?? () from /usr/lib64/libxml2.so.2
#6  0x000000372c9165a3 in xmlSaveTree () from /usr/lib64/libxml2.so.2
#7  0x0000000000400a6c in main () at test.c:26


Looking into this crash, I notice two details in particular:

(a) xmlGenericError still gets called, which seems surprising and
    unfortunate in light of the xmlSetStructuredErrorFunc call.

    Apparently, the example code happens to hit an explicit invocation
    of xmlGenericError in xmlsave.c:xmlEscapeEntities (about line 300):

        } else {
            xmlGenericError(xmlGenericErrorContext,
                "xmlEscapeEntities : char out of range\n");
            in++;
            goto error;
        }

    I don't know how many other similar invocations (logic that ends up
    calling xmlGenericError instead of xmlStructuredError when the
    latter is set) remain in the source tree.  For example, is--

        [Re: [xml] xpath and structured errors]
        http://mail.gnome.org/archives/xml/2004-May/msg00003.html

    --another (i.e., distinct) instance of this behavior?

    This particular item--consistently respecting a choice of "generic"
    or "structured" errors--I must leave to someone better able to
    write, review, and test the appropriate changes:  The necessary
    changes may be pervasive (to be final), and touch parts of the
    library I haven't used (and hence I could break without noticing):

$ fgrep -c 'xmlGenericError(' *.c | grep -v :0 | head
catalog.c:51 debugXML.c:63 dict.c:1 DOCBparser.c:10 encoding.c:23 error.c:12 globals.c:1 hash.c:1 HTMLparser.c:62 legacy.c:15

$ fgrep -c 'xmlGenericError(' *.c | grep -v :0 | wc -l
42

(b) Worse than being invoked, though, the default xmlGenericError
    crashed (instead of, say, printing to stderr).

    From what I see, xmlGenericErrorFunc and xmlStructuredErrorFunc
    share a single xmlGenericErrorContext; setting one function
    overwrites the other function's context.  In the above example
    code, the default xmlGenericErrorDefaultFunc expects its context
    to be a FILE * (or NULL, to default to stderr), and my custom
    xmlStructuredErrorFunc expects a pointer to a writable error-string
    buffer (for example).

    Locally, I work around the problem by writing both error-handling
    functions (and setting them both together) so that they can share
    one context.  It's also possible, though, to create a separate
    void * xmlStructuredErrorContext so that the two functions don't
    have to fight over the same state.

    With a patch to create xmlStructuredErrorContext, I get a different
    result from the above example code:

$ LD_LIBRARY_PATH=/home/wlam/libxml2-git/libxml2/.libs ./a.out
xmlEscapeEntities : char out of range
<name></name>

$ echo $?
0

    The xmlGenericErrorFunc is still invoked, but retains its state
    (unperturbed by xmlSetStructuredErrorFunc) for proper operation.


While I'd naturally prefer (a) to be resolved, I don't know whether
it is easy or not (whether anybody is able to go through the code
to assert or verify its resolution).  On the other hand, I do have
a draft patch lying around (relative to libxml head) for (b).

So, I wonder--

* Are (a) and (b) behaviors that libxml2's maintainers would also
  like changed, or do they represent desired behavior for the system
  (e.g., the system is "supposed to" become undefined in // 2)?

Further, if people want to change (a) or (b)--

* Should I file a bug entry for (a)?  (Does somebody expect to be able
  to do something about it, or would a bug entry just add clutter?)
  Solving (a) also solves (b):  If only one set function ever gets
  called, there's no conflation of multiple functions' context pointers.

* Should I file a bug entry and submit my draft patch for (b) (to create
  a new void *xmlStructuredErrorContext)?  (It's about 451 lines, so
  I don't have it attached to this already-long message, but as you'd
  quickly see, most of those lines are just side effects of libxml2's
  internal code generation.)

Your thoughts or suggestions would be welcome.  Thanks!

Wang Lam




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]