Re: [xml] xmlCreatePushParserCtxt and initial chunk size
- From: Gary Pennington <Gary Pennington uk sun com>
- To: Bill Moseley <moseley hank org>
- Cc: xml gnome org
- Subject: Re: [xml] xmlCreatePushParserCtxt and initial chunk size
- Date: Thu, 27 Sep 2001 10:07:13 +0100
Bill Moseley wrote:
The SAX examples show an initial small chunk size for determining the
encoding when calling xmlCreatePushParserCtxt(), and then reading in 1024
byte chunks when calling xmlParseChunk().
You must read at least 4 bytes. You mustn't read too many bytes, or you
will overflow certain internal buffers under certain conditions, mainly
associated with error processing of badly formatted XML. The size at
which this buffer overflow causes a core dump varies with the
version of libxml you are using: I found it was 79 bytes in 2.4.0, and I
believe it may have increased in later versions.
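The 4-byte minimum exists because the parser guesses the document encoding from the first four bytes: a byte-order mark, or the start of `<?xml` in some encoding (Appendix F of the XML 1.0 specification describes this autodetection). The sketch below is a simplified, hypothetical re-implementation of that idea, not libxml2's own code (libxml2 exposes its real logic as xmlDetectCharEncoding()). The names `enc_guess` and `guess_encoding` are made up for illustration, and UCS-4 detection is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: a simplified model of encoding sniffing. */
typedef enum {
    GUESS_NEED_MORE,   /* fewer than 4 bytes: cannot decide yet */
    GUESS_UTF8,        /* UTF-8 BOM, or "<?xm" in an ASCII-compatible encoding */
    GUESS_UTF16LE,
    GUESS_UTF16BE,
    GUESS_UNKNOWN
} enc_guess;

/* Look only at the first 4 bytes, as an XML parser must before decoding. */
static enc_guess guess_encoding(const unsigned char *in, size_t len)
{
    if (len < 4)
        return GUESS_NEED_MORE;
    /* UTF-8 byte-order mark: EF BB BF */
    if (in[0] == 0xEF && in[1] == 0xBB && in[2] == 0xBF)
        return GUESS_UTF8;
    /* UTF-16 byte-order marks (UCS-4 is ignored in this sketch) */
    if (in[0] == 0xFF && in[1] == 0xFE)
        return GUESS_UTF16LE;
    if (in[0] == 0xFE && in[1] == 0xFF)
        return GUESS_UTF16BE;
    /* "<?" encoded as UTF-16 without a BOM */
    if (in[0] == '<' && in[1] == 0x00 && in[2] == '?' && in[3] == 0x00)
        return GUESS_UTF16LE;
    if (in[0] == 0x00 && in[1] == '<' && in[2] == 0x00 && in[3] == '?')
        return GUESS_UTF16BE;
    /* "<?xm": ASCII-compatible; the XML declaration names the real encoding */
    if (in[0] == '<' && in[1] == '?' && in[2] == 'x' && in[3] == 'm')
        return GUESS_UTF8;
    return GUESS_UNKNOWN;
}
```

With only 2 or 3 bytes the sketch cannot tell UTF-16 from UTF-8, which is why a shorter initial chunk is not enough.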
Is there any reason not to call xmlCreatePushParserCtxt() with a larger
chunk size ( the same as I use with xmlParseChunk() )?
I don't think so. However, there does seem to be a subtlety in this area.
Each call to xmlParseChunk takes a terminate flag, and the flag must be
set on the last call. I found that I had to call xmlParseChunk at least
twice to ensure proper behaviour. I didn't really bottom out the cause of
this behaviour (I was in a hurry), but I did note that you didn't seem
able to call xmlParseChunk just once with the terminate flag set.
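The calling convention can be modelled without libxml2 at all. In this sketch, `push_parse()` is a hypothetical stand-in for the create-context-then-parse sequence: the first chunk would go to xmlCreatePushParserCtxt(), each later chunk to xmlParseChunk(..., 0), and a final zero-length call with terminate set to 1. The stub `record_chunk()` only records what it receives; all names here are illustrative, not libxml2 API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the calls a push parser would receive. */
typedef struct {
    size_t initial_len;     /* bytes handed to the context constructor */
    int    feed_calls;      /* chunk calls made with terminate == 0 */
    int    terminate_calls; /* chunk calls made with terminate == 1 */
    size_t total_fed;       /* bytes passed in all chunk calls */
} call_log;

/* Stub standing in for xmlParseChunk(ctxt, chunk, len, terminate). */
static int record_chunk(call_log *log, size_t len, int terminate)
{
    if (terminate)
        log->terminate_calls++;
    else
        log->feed_calls++;
    log->total_fed += len;
    return 0;   /* 0 means "no error", as with xmlParseChunk */
}

/* Feed `total` bytes in pieces of at most `chunk` bytes. */
static int push_parse(call_log *log, size_t total, size_t chunk)
{
    size_t off;

    if (chunk < 4 || total < 4)
        return -1;                  /* need 4 bytes to sniff the encoding */
    log->initial_len = total < chunk ? total : chunk;
    for (off = log->initial_len; off < total; off += chunk) {
        size_t n = total - off < chunk ? total - off : chunk;
        if (record_chunk(log, n, 0) != 0)
            return -1;
    }
    return record_chunk(log, 0, 1); /* final call: no data, terminate set */
}
```

When the whole document fits in the first chunk, `feed_calls` stays 0 and the only chunk call is the terminating one, which is the situation described above as not behaving properly.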
Given the above, I do something like this when parsing:
/*
 * The PAGE_READ_SIZE value is used to determine the size of the input buffer
 * used to parse XML files. As of libxml2, version 2.4.0, this must be less
 * than 80 bytes or libxml2 will break under certain error conditions related
 * to parsing invalid XML files.
 */
#define PAGE_READ_SIZE 79
....
size = f_stat.st_size / 2 < PAGE_READ_SIZE ? f_stat.st_size / 2
                                           : PAGE_READ_SIZE;
res = fread(chars, 1, size, prov->pxc_file);
if (res >= 4) {
    if ((ctxt = xmlCreatePushParserCtxt(NULL, NULL,
            chars, res, conf->pc_location)) == NULL) {
        return (FAIL);
    }
    while ((res = fread(chars, 1, size, prov->pxc_file)) > 0) {
        if (xmlParseChunk(ctxt, chars, res, 0) != 0) {
            /* free the partial tree and the context before failing */
            xmlFreeDoc(ctxt->myDoc);
            xmlFreeParserCtxt(ctxt);
            return (FAIL);
        }
    }
    /* zero-length final chunk with the terminate flag set */
    if (xmlParseChunk(ctxt, chars, 0, 1) != 0) {
        xmlFreeDoc(ctxt->myDoc);
        xmlFreeParserCtxt(ctxt);
        return (FAIL);
    }
    prov->pxc_doc = ctxt->myDoc;
    xmlFreeParserCtxt(ctxt);
}
This ensures that the first read won't consume the entire document when
creating the context, and that xmlParseChunk will be called at least once
without the terminate flag set.
When the loop terminates, I call xmlParseChunk again with the terminate
flag set.
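Why halving works: capping the first read at half the file size (and at PAGE_READ_SIZE) means it can never take the whole file, so the while loop runs at least once whenever the first read returned 4 or more bytes. The helpers below have hypothetical names and check that arithmetic, assuming fread() returns full reads on a regular file.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_READ_SIZE 79

/* Hypothetical helper mirroring the size computation in the code above. */
static size_t first_read_size(size_t file_size)
{
    return file_size / 2 < PAGE_READ_SIZE ? file_size / 2 : PAGE_READ_SIZE;
}

/*
 * Number of xmlParseChunk(..., 0) calls the loop would make, assuming
 * fread() returns full reads.  Returns -1 when the first read is shorter
 * than 4 bytes, so xmlCreatePushParserCtxt() would not be called at all.
 */
static long non_terminating_calls(size_t file_size)
{
    size_t size = first_read_size(file_size);
    size_t remaining;

    if (size < 4)
        return -1;
    remaining = file_size - size;                 /* always >= size here */
    return (long)((remaining + size - 1) / size); /* ceiling division */
}
```

Files shorter than 8 bytes give a first read under 4 bytes and are skipped entirely by the `res >= 4` test; every larger file gets at least one non-terminating call.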
Oh, is there a correct procedure for aborting SAX processing? For example,
say I find some content or attribute and I want to stop any further parsing
(i.e. any further calls to my callback functions) from that point.
I'm going to take a guess here, but I've never tried doing this and I'm
certainly not an expert.
Could you try using xmlSetFeature to disable SAX?
e.g.

    int off = 1;
    xmlSetFeature(ctxt, "disable SAX", &off);

I notice that libxml seems to check the value of ctxt->disableSAX at
several points and it also seems to set it to 1 when errors are detected
during parsing, so it could be the right way to go.
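The disableSAX idea can be modelled outside libxml2: the driver checks a flag before every callback, so once a callback sets the flag, no further callbacks run even though input is still scanned. Everything below (`mini_ctxt`, `dispatch_elements`, `on_start`) is a hypothetical model of that mechanism, not libxml2 code; whether setting ctxt->disableSAX through xmlSetFeature() is the supported approach remains untested, as noted above. Later libxml2 releases also provide xmlStopParser(), which may be worth checking in the documentation for your version.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a parser context with a "disable SAX" flag. */
typedef struct mini_ctxt {
    int disable_sax;    /* when non-zero, callbacks are suppressed */
    void (*start_element)(struct mini_ctxt *ctxt, const char *name);
    int seen;           /* user data: how many callbacks actually ran */
} mini_ctxt;

/* Scan every element, but only call back while the flag is clear. */
static void dispatch_elements(mini_ctxt *ctxt, const char **names, int count)
{
    int i;
    for (i = 0; i < count; i++) {
        if (!ctxt->disable_sax)
            ctxt->start_element(ctxt, names[i]);
    }
}

/* Example callback: stop receiving events once a <stop> element is seen. */
static void on_start(mini_ctxt *ctxt, const char *name)
{
    ctxt->seen++;
    if (strcmp(name, "stop") == 0)
        ctxt->disable_sax = 1;  /* analogous to "disable SAX" = 1 */
}
```

With the elements a, b, stop, c, d, the callback runs three times and never sees c or d.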
Let me know if you get an answer to that as I'm interested.
Thanks,
Bill Moseley
mailto:moseley hank org
_______________________________________________
xml mailing list
xml gnome org
http://mail.gnome.org/mailman/listinfo/xml
Gary
--
Gary Pennington
Solaris Kernel Development,
Sun Microsystems
Gary Pennington sun com