Re: [xml] Speed up patches for large attributes
- From: "Diego Santa Cruz" <Diego SantaCruz spinetix com>
- To: <veillard redhat com>
- Cc: xml gnome org
- Subject: Re: [xml] Speed up patches for large attributes
- Date: Mon, 20 Jul 2009 16:11:38 +0200
-----Original Message-----
From: Daniel Veillard [mailto:veillard redhat com]
Sent: 17 July 2009 18:03
To: Diego Santa Cruz
Cc: xml gnome org
Subject: Re: [xml] Speed up patches for large attributes
[snip]
I saw your patches, on bugzilla too, but I hadn't any time yet to
check them, I hope to do that in the near future. The pre-scanning
in xmlParseChunk() can be tricky, so this really needs some serious
attention. As a first test make sure that "make check" passes okay
on a git checkout with the patch applied, that will help building
trust :-)
Thanks for taking a look and pointing to 'make check', as there is indeed a
problem in my proposed patch. If xmlParserInputBufferRead() returns 0 on an
XML_BUFFER_ALLOC_IMMUTABLE buffer then we could enter an infinite loop. This
is solved in the revised patch below, where I break out of the loop over
xmlParserInputBufferRead() if it returns a non-positive value. With this
revised patch 'make check' passes without problems on both 2.7.3 and today's git.
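
To illustrate the loop guard described above, here is a minimal standalone sketch. It is not libxml2 code: `demo_source`, `read_more()` and `refill()` are hypothetical names standing in for the reader's input buffer and for xmlParserInputBufferRead(), and the 512-byte read size is an arbitrary assumption. The point is only that the refill loop stops as soon as a read makes no progress.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an input source; not part of libxml2. */
typedef struct {
    int reads_left; /* how many reads will still return data */
    int calls;      /* how many times read_more() has been called */
} demo_source;

/* Returns `want` while data remains, then 0 on every later call. */
static int read_more(demo_source *src, int want)
{
    assert(src != NULL);
    src->calls++;
    if (src->reads_left <= 0)
        return 0;
    src->reads_left--;
    return want;
}

/*
 * Refill until at least `needed` bytes are available. The loop is left
 * as soon as a read returns a non-positive value, so a source that keeps
 * returning 0 cannot make it spin forever.
 */
static int refill(demo_source *src, int available, int needed)
{
    while (available < needed) {
        int val = read_more(src, 512);
        if (val <= 0)
            break; /* no progress possible: stop instead of retrying */
        available += val;
    }
    return available;
}
```

Without the `val <= 0` check, a source that has run dry would leave `available` unchanged and the `while` condition would stay true indefinitely, which is the failure mode the revised patch avoids.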
Revised patch without infinite loop on xmlParserInputBufferRead()
===================================================================
--- xmlreader.c (revision 7981)
+++ xmlreader.c (working copy)
@@ -809,6 +809,8 @@
xmlBufferPtr inbuf;
int val, s;
xmlTextReaderState oldstate;
+ int csize = CHUNK_SIZE;
+ int search_tag_end;
if ((reader->input == NULL) || (reader->input->buffer == NULL))
return(-1);
@@ -816,9 +818,10 @@
oldstate = reader->state;
reader->state = XML_TEXTREADER_NONE;
inbuf = reader->input->buffer;
+ search_tag_end = 0;
while (reader->state == XML_TEXTREADER_NONE) {
- if (inbuf->use < reader->cur + CHUNK_SIZE) {
+ if (inbuf->use < reader->cur + csize) {
/*
* Refill the buffer unless we are at the end of the stream
*/
@@ -840,8 +843,16 @@
/* mark the end of the stream and process the remains */
reader->mode = XML_TEXTREADER_MODE_EOF;
break;
+ } if ((val > 0) && (search_tag_end)) {
+ const xmlChar *tmp, *end;
+ tmp = &inbuf->content[inbuf->use-val];
+ end = &inbuf->content[inbuf->use];
+ while (tmp < end && *tmp != '>') tmp++;
+ csize = tmp - &inbuf->content[reader->cur] + 1;
+ search_tag_end = (tmp == end);
}
-
+ if (val > 0)
+ continue; /* ensure we have enough data */
} else
break;
}
@@ -849,11 +860,11 @@
* parse by block of CHUNK_SIZE bytes, various tests show that
* it's the best tradeoff at least on a 1.2GH Duron
*/
- if (inbuf->use >= reader->cur + CHUNK_SIZE) {
+ if (inbuf->use >= reader->cur + csize) {
val = xmlParseChunk(reader->ctxt,
(const char *) &inbuf->content[reader->cur],
- CHUNK_SIZE, 0);
- reader->cur += CHUNK_SIZE;
+ csize, 0);
+ reader->cur += csize;
if ((val != 0) || (reader->ctxt->wellFormed == 0))
return(-1);
} else {
@@ -866,6 +877,22 @@
return(-1);
break;
}
+
+ /*
+ * If nothing parsed on first pass try to find the end of a tag
+ * and pass at least that much to next chunk parse or trigger
+ * reading of input until we find a potential tag end
+ */
+ if (reader->state == XML_TEXTREADER_NONE) {
+ const xmlChar *tmp, *end;
+ tmp = &inbuf->content[reader->cur];
+ end = &inbuf->content[inbuf->use];
+ while (tmp < end && *tmp != '>') tmp++;
+ csize = tmp - &inbuf->content[reader->cur] + 1;
+ if (csize < CHUNK_SIZE)
+ csize = CHUNK_SIZE;
+ search_tag_end = (tmp == end);
+ }
}
/*
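
The chunk-size computation in the last hunk can be sketched in isolation as follows. This is only an illustration under stated assumptions: `next_chunk_size()` and `DEMO_CHUNK_SIZE` are hypothetical names, the value 512 is assumed to mirror CHUNK_SIZE in xmlreader.c, and the buffer is a plain byte array rather than an xmlBuffer.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror CHUNK_SIZE in xmlreader.c; hypothetical name. */
#define DEMO_CHUNK_SIZE 512

/*
 * Scan buf[cur..use) for a potential tag end ('>').
 * Sets *need_more to 1 when no '>' was found in the available data.
 * Returns the number of bytes to pass to the next parse call, never
 * less than DEMO_CHUNK_SIZE. When no '>' is found the result is one
 * more than the bytes available, which forces a refill first.
 */
static size_t next_chunk_size(const unsigned char *buf, size_t cur,
                              size_t use, int *need_more)
{
    size_t i = cur;
    size_t csize;

    assert(buf != NULL && need_more != NULL && cur <= use);

    /* Check the bound before reading the byte. */
    while (i < use && buf[i] != '>')
        i++;

    *need_more = (i == use);
    csize = i - cur + 1;
    if (csize < DEMO_CHUNK_SIZE)
        csize = DEMO_CHUNK_SIZE;
    return csize;
}
```

For example, with a 1024-byte buffer whose first '>' sits at offset 699, the function returns 700 and leaves `*need_more` at 0. With only the first 600 bytes available it returns 601 and sets `*need_more` to 1.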
--
Diego Santa Cruz, PhD
Technology Architect
_________________________________
SpinetiX S.A.
Rue des Terreaux 17
1003, Lausanne, Switzerland
T +41 21 341 15 50
F +41 21 311 19 56
diego santacruz spinetix com
http://www.spinetix.com
http://www.youtube.com/SpinetiXTeam
_________________________________