[xml] interaction between xmlSetGenericErrorFunc and xmlSetStructuredErrorFunc can crash (draft patch available)
- From: Wang Lam <wlam kosmix com>
- To: xml gnome org
- Subject: [xml] interaction between xmlSetGenericErrorFunc and xmlSetStructuredErrorFunc can crash (draft patch available)
- Date: Mon, 27 Jul 2009 20:05:37 -0700
Hi! I came across some (unfortunate) behavior in libxml2 (2.7.3
and head) error handling that I want to run by you for your comments.
I have a patch that may help one aspect of it, if you are interested.
For background: In trying to capture errors from stderr, I encountered
a curious crash. I reproduce the behavior in the following example:
// Example error-making-and-capturing code, with C99/C++-style comments
// added for the commentary that follows.
//
#include <libxml/xmlerror.h>
#include <libxml/xmlsave.h>
#include <stdio.h>
#include <string.h>
#define BUFFER_SIZE (1024)
char error_buffer[BUFFER_SIZE] = { 0 };
void write_buffer(void *ctx, xmlErrorPtr p)
{
snprintf(ctx, BUFFER_SIZE, "Structured error %s\n", p->message);
}
int main()
{
LIBXML_TEST_VERSION
const xmlChar xml_version[] = "1.0", n[] = "name", bad_text[] =
"\x01";
xmlSetStructuredErrorFunc(error_buffer, write_buffer); // 1
xmlDocPtr doc = xmlNewDoc(xml_version);
xmlNodePtr node1 = xmlNewNode(NULL, n);
xmlNodeAddContent(node1, bad_text); // 2
xmlDocSetRootElement(doc, node1);
xmlSaveCtxtPtr save = xmlSaveToFd(1, NULL, XML_SAVE_FORMAT);
xmlSaveTree(save, node1); // 3
xmlSaveClose(save);
xmlCleanupParser();
if (strlen(error_buffer)) {
fprintf(stderr, "Error caught: %s", error_buffer);
return 1;
} else {
return 0;
}
}
// 1
Based on the API description and mailing-list messages--
[Re: [xml] libxml error redirection]
http://mail.gnome.org/archives/xml/2006-April/msg00090.html
[Re: [xml] Error handling]
http://mail.gnome.org/archives/xml/2005-May/msg00134.html
--this function (xmlSetStructuredErrorFunc) seems to be the preferable
way to capture errors.
// 2
This call does not seem to trigger any error callback, but does
represent an actual error introduced into document doc: bad_text
is not a character in XML Char, so a well-formed XML document
cannot add (contain) it.
// 3
Here I get a crash:
$ gcc -Wall -O0 -g `pkg-config xml2 --cflags --libs` test.c
$ ./a.out
Segmentation fault (core dumped)
$ gdb a.out core.28657
GNU gdb Fedora (6.8-29.fc10)
:
Reading symbols from /usr/lib64/libxml2.so.2...done.
Loaded symbols for /usr/lib64/libxml2.so.2
:
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
[New process 28657]
:
(gdb) bt
#0 0x00000037200fa8f9 in __vfprintf_chk () from /lib64/libc.so.6
#1 0x000000372c8328f8 in xmlGenericErrorDefaultFunc ()
from /usr/lib64/libxml2.so.2
#2 0x000000372c913ba4 in ?? () from /usr/lib64/libxml2.so.2
#3 0x000000372c860415 in xmlOutputBufferWriteEscape ()
from /usr/lib64/libxml2.so.2
#4 0x000000372c9162e3 in ?? () from /usr/lib64/libxml2.so.2
#5 0x000000372c915bcb in ?? () from /usr/lib64/libxml2.so.2
#6 0x000000372c9165a3 in xmlSaveTree () from /usr/lib64/libxml2.so.2
#7 0x0000000000400a6c in main () at test.c:26
Looking into this crash, I notice two details in particular:
(a) xmlGenericError still gets called, which seems surprising and
unfortunate in light of the xmlSetStructuredErrorFunc call.
Apparently, the example code happens to hit an explicit invocation
of xmlGenericError in xmlsave.c:xmlEscapeEntities (about line 300):
} else {
xmlGenericError(xmlGenericErrorContext,
"xmlEscapeEntities : char out of range\n");
in++;
goto error;
}
I don't know how many other similar invocations (logic that ends up
calling xmlGenericError instead of xmlStructuredError when the
latter is set) remain in the source tree. For example, is--
[Re: [xml] xpath and structured errors]
http://mail.gnome.org/archives/xml/2004-May/msg00003.html
--another (i.e., distinct) instance of this behavior?
This particular item--consistently respecting a choice of "generic"
or "structured" errors--I must leave to someone better able to
write, review, and test the appropriate changes: The necessary
changes may be pervasive (to be final), and touch parts of the
library I haven't used (and hence I could break without noticing):
$ fgrep -c 'xmlGenericError(' *.c | grep -v :0 | head
catalog.c:51
debugXML.c:63
dict.c:1
DOCBparser.c:10
encoding.c:23
error.c:12
globals.c:1
hash.c:1
HTMLparser.c:62
legacy.c:15
$ fgrep -c 'xmlGenericError(' *.c | grep -v :0 | wc -l
42
(b) Worse than being invoked, though, the default xmlGenericError
crashed (instead of, say, printing to stderr).
From what I see, xmlGenericErrorFunc and xmlStructuredErrorFunc
share a single xmlGenericErrorContext; setting one function
overwrites the other function's context. In the above example
code, the default xmlGenericErrorDefaultFunc expects its context
to be a FILE * (or NULL, to default to stderr), and my custom
xmlStructuredErrorFunc expects a pointer to a writable error-string
buffer (for example).
Locally, I work around the problem by writing both error-handling
functions (and setting them both together) so that they can share
one context. It's also possible, though, to create a separate
void * xmlStructuredErrorContext so that the two functions don't
have to fight over the same state.
With a patch to create xmlStructuredErrorContext, I get a different
result from the above example code:
$ LD_LIBRARY_PATH=/home/wlam/libxml2-git/libxml2/.libs ./a.out
xmlEscapeEntities : char out of range
<name></name>
$ echo $?
0
The xmlGenericErrorFunc is still invoked, but retains its state
(unperturbed by xmlSetStructuredErrorFunc) for proper operation.
While I'd naturally prefer (a) to be resolved, I don't know whether
it is easy or not (whether anybody is able to go through the code
to assert or verify its resolution). On the other hand, I do have
a draft patch lying around (relative to libxml head) for (b).
So, I wonder--
* Are (a) and (b) behaviors that libxml2's maintainers would also
like changed, or do they represent desired behavior for the system
(e.g., the system is "supposed to" become undefined in // 2)?
Further, if people want to change (a) or (b)--
* Should I file a bug entry for (a)? (Does somebody expect to be able
to do something about it, or would a bug entry just add clutter?)
Solving (a) also solves (b): If only one set function ever gets
called, there's no conflation of multiple functions' context pointers.
* Should I file a bug entry and submit my draft patch for (b) (to create
a new void *xmlStructuredErrorContext)? (It's about 451 lines, so
I don't have it attached to this already-long message, but as you'd
quickly see, most of those lines are just side effects of libxml2's
internal code generation.)
Your thoughts or suggestions would be welcome. Thanks!
Wang Lam
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]