Re: [xml] Speed up patches for large attributes



-----Original Message-----
From: Daniel Veillard [mailto:veillard redhat com]
Sent: 17 July 2009 18:03
To: Diego Santa Cruz
Cc: xml gnome org
Subject: Re: [xml] Speed up patches for large attributes

[snip]

  I saw your patches, on bugzilla too, but I hadn't any time yet to
check them, I hope to do that in the near future. The pre-scanning
in xmlParseChunk() can be tricky, so this really need some serious
attention. As a first test make sure that "make check" passes okay
on a git checkout with the patch applied, that will help building
trust :-)


Thanks for taking a look and pointing to 'make check', as there is indeed a
problem in my proposed patch. If xmlParserInputBufferRead() returns 0 on an
XML_BUFFER_ALLOC_IMMUTABLE buffer then we could enter an infinite loop. This
is solved in the revised patch below where I break the loop over
xmlParserInputBufferRead() if it returns non-positive. With this revised
patch 'make check' passes without problem on both 2.7.3 and today's git.

Revised patch without infinite loop on xmlParserInputBufferRead()
===================================================================
--- xmlreader.c (revision 7981)
+++ xmlreader.c (working copy)
@@ -809,6 +809,8 @@
     xmlBufferPtr inbuf;
     int val, s;
     xmlTextReaderState oldstate;
+    int csize = CHUNK_SIZE;
+    int search_tag_end;
 
     if ((reader->input == NULL) || (reader->input->buffer == NULL))
        return(-1);
@@ -816,9 +818,10 @@
     oldstate = reader->state;
     reader->state = XML_TEXTREADER_NONE;
     inbuf = reader->input->buffer;
+    search_tag_end = 0;
 
     while (reader->state == XML_TEXTREADER_NONE) {
-       if (inbuf->use < reader->cur + CHUNK_SIZE) {
+       if (inbuf->use < reader->cur + csize) {
            /*
             * Refill the buffer unless we are at the end of the stream
             */
@@ -840,8 +843,16 @@
                    /* mark the end of the stream and process the remains */
                    reader->mode = XML_TEXTREADER_MODE_EOF;
                    break;
+               } if ((val > 0) && (search_tag_end)) {
+                   const xmlChar *tmp, *end;
+                   tmp = &inbuf->content[inbuf->use-val];
+                   end = &inbuf->content[inbuf->use];
+                   while (*tmp != '>' && tmp < end) tmp++;
+                   csize = tmp - &inbuf->content[reader->cur] + 1;
+                   search_tag_end = (tmp == end);
                }
-
+               if (val > 0)
+                   continue; /* ensure we have enough data */
            } else 
                break;
        }
@@ -849,11 +860,11 @@
         * parse by block of CHUNK_SIZE bytes, various tests show that
         * it's the best tradeoff at least on a 1.2GH Duron
         */
-       if (inbuf->use >= reader->cur + CHUNK_SIZE) {
+       if (inbuf->use >= reader->cur + csize) {
            val = xmlParseChunk(reader->ctxt,
                          (const char *) &inbuf->content[reader->cur],
-                         CHUNK_SIZE, 0);
-           reader->cur += CHUNK_SIZE;
+                         csize, 0);
+           reader->cur += csize;
            if ((val != 0) || (reader->ctxt->wellFormed == 0))
                return(-1);
        } else {
@@ -866,6 +877,22 @@
                return(-1);
            break;
        }
+
+       /*
+        * If nothing parsed on first pass try to find the end of a tag
+        * and pass at least that much to next chunk parse or trigger
+        * reading of input until we find a potential tag end
+        */
+       if (reader->state == XML_TEXTREADER_NONE) {
+           const xmlChar *tmp, *end;
+           tmp = &inbuf->content[reader->cur];
+           end = &inbuf->content[inbuf->use];
+           while (*tmp != '>' && tmp < end) tmp++;
+           csize = tmp - &inbuf->content[reader->cur] + 1;
+           if (csize < CHUNK_SIZE)
+               csize = CHUNK_SIZE;
+           search_tag_end = (tmp == end);
+       }
     }
 
     /*


--
Diego Santa Cruz, PhD
Technology Architect
_________________________________
SpinetiX S.A.
Rue des Terreaux 17
1003, Lausanne, Switzerland
T +41 21 341 15 50
F +41 21 311 19 56
diego santacruz spinetix com
http://www.spinetix.com
http://www.youtube.com/SpinetiXTeam
_________________________________




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]