[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [xml] Speed up patches for large attributes



> -----Original Message-----
> From: Daniel Veillard [mailto:veillard redhat com]
> Sent: 17 July 2009 18:03
> To: Diego Santa Cruz
> Cc: xml gnome org
> Subject: Re: [xml] Speed up patches for large attributes
> 
[snip]
> 
>   I saw your patches, on bugzilla too, but I hadn't any time yet to
> check them, I hope to do that in the near future. The pre-scanning
> in xmlParseChunk() can be tricky, so this really need some serious
> attention. As a first test make sure that "make check" passes okay
> on a git checkout with the patch applied, that will help building
> trust :-)
> 

Thanks for taking a look and pointing to 'make check', as there is indeed a
problem in my proposed patch. If xmlParserInputBufferRead() returns 0 on an
XML_BUFFER_ALLOC_IMMUTABLE buffer then we could enter an infinite loop. This
is solved in the revised patch below where I break the loop over
xmlParserInputBufferRead() if it returns non-positive. With this revised
patch 'make check' passes without problem on both 2.7.3 and today's git.

Revised patch without infinite loop on xmlParserInputBufferRead()
===================================================================
--- xmlreader.c	(revision 7981)
+++ xmlreader.c	(working copy)
@@ -809,6 +809,8 @@
     xmlBufferPtr inbuf;
     int val, s;
     xmlTextReaderState oldstate;
+    int csize = CHUNK_SIZE;
+    int search_tag_end;
 
     if ((reader->input == NULL) || (reader->input->buffer == NULL))
 	return(-1);
@@ -816,9 +818,10 @@
     oldstate = reader->state;
     reader->state = XML_TEXTREADER_NONE;
     inbuf = reader->input->buffer;
+    search_tag_end = 0;
 
     while (reader->state == XML_TEXTREADER_NONE) {
-	if (inbuf->use < reader->cur + CHUNK_SIZE) {
+	if (inbuf->use < reader->cur + csize) {
 	    /*
 	     * Refill the buffer unless we are at the end of the stream
 	     */
@@ -840,8 +843,16 @@
 		    /* mark the end of the stream and process the remains */
 		    reader->mode = XML_TEXTREADER_MODE_EOF;
 		    break;
+		} if ((val > 0) && (search_tag_end)) {
+		    const xmlChar *tmp, *end;
+		    tmp = &inbuf->content[inbuf->use-val];
+		    end = &inbuf->content[inbuf->use];
+		    while (*tmp != '>' && tmp < end) tmp++;
+		    csize = tmp - &inbuf->content[reader->cur] + 1;
+		    search_tag_end = (tmp == end);
 		}
-
+		if (val > 0)
+		    continue; /* ensure we have enough data */
 	    } else 
 		break;
 	}
@@ -849,11 +860,11 @@
 	 * parse by block of CHUNK_SIZE bytes, various tests show that
 	 * it's the best tradeoff at least on a 1.2GH Duron
 	 */
-	if (inbuf->use >= reader->cur + CHUNK_SIZE) {
+	if (inbuf->use >= reader->cur + csize) {
 	    val = xmlParseChunk(reader->ctxt,
 		          (const char *) &inbuf->content[reader->cur],
-			  CHUNK_SIZE, 0);
-	    reader->cur += CHUNK_SIZE;
+			  csize, 0);
+	    reader->cur += csize;
 	    if ((val != 0) || (reader->ctxt->wellFormed == 0))
 		return(-1);
 	} else {
@@ -866,6 +877,22 @@
 		return(-1);
 	    break;
 	}
+
+	/*
+	 * If nothing parsed on first pass try to find the end of a tag
+	 * and pass at least that much to next chunk parse or trigger
+	 * reading of input until we find a potential tag end
+	 */
+	if (reader->state == XML_TEXTREADER_NONE) {
+	    const xmlChar *tmp, *end;
+	    tmp = &inbuf->content[reader->cur];
+	    end = &inbuf->content[inbuf->use];
+	    while (*tmp != '>' && tmp < end) tmp++;
+	    csize = tmp - &inbuf->content[reader->cur] + 1;
+	    if (csize < CHUNK_SIZE)
+		csize = CHUNK_SIZE;
+	    search_tag_end = (tmp == end);
+	}
     }
 
     /*


--
Diego Santa Cruz, PhD
Technology Architect
_________________________________
SpinetiX S.A.
Rue des Terreaux 17
1003, Lausanne, Switzerland
T +41 21 341 15 50
F +41 21 311 19 56
diego santacruz spinetix com
http://www.spinetix.com
http://www.youtube.com/SpinetiXTeam
_________________________________



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]