[libxml2] Use a hybrid allocation scheme in xmlNodeSetContent



commit 7d0d2a50ac94850d5c5a6360fa1c931a89e75ddd
Author: Conrad Irwin <conrad irwin gmail com>
Date:   Mon May 14 14:18:58 2012 +0800

    Use a hybrid allocation scheme in xmlNodeSetContent
    
    On Fri, May 11, 2012 at 9:10 AM, Daniel Veillard <veillard redhat com> wrote:
    > ÂHi Conrad,
    >
    > that's interesting ! I was initially afraid of a sudden explosion of
    > memory allocations for building a tree since by default buffers tend to
    > "waste" memory by using doubling allocations, but that's not the case.
    > Âxmllint --noout doc/libxml2-api.xml
    > when compiled with memory debug produce
    >
    > paphio:~/XML -> cat .memdump
    > Â Â ÂMEMORY ALLOCATED : 0, MAX was 12756699
    >
    > and without your patch 12755657, i.e. the increase is minimal.
    
    Heh, I thought that too. Actually you're looking at the result with XML_ALLOC_EXACT! This
    is because EXACT adds 10bytes "spare" on each alloc, and that interestingly wastes about the
    same amount of space as XML_ALLOC_DOUBLEIT on this example (see below).
    
    So it turns out that the default realloc() on my system actually handles this case really
    well â and I guess that all the time in xmlRealloc() was actually in xmlStrlen, not the
    underlying realloc() after all (sorry for misleading you). If you replace the realloc()
    with a bad one (like valgrind's), then the performance degrades severely.
    
    This patch implements a HYBRID allocator which has the behaviour you describe (it's
    like EXACT to start with, though without the spare 10 bytes; and switches to DOUBLEIT
    after 4kb) â that gets the memory back down to 12755657, with no noticeable impact on the
    performance of the synthetic pathological example under valgrind.
    
    In summary:
    
         max_memory on ./xmllint --noout doc/libxml2-api.xml,
         valgrind time on https://gist.github.com/2656940
    
                max_memory    valgrind time
    before   |  12755657    | 29:18.2
    EXACT    |  12756699    |  2:58.6 <-- this is the state after the first patch.
    DOUBLEIT |  12756727    |  0:02.7
    HYBRID   |  12755754    |  0:02.7 <-- this is the state with both patches.
    
    >
    > There is also the cost of creating the buffers all the time.
    > I need to read the code and check but I may be interested in an hybrid
    > approach where we switch to buffer only when the text node starts to
    > become too big (4k would remove nearly all usuall types of "document"
    > usage, i.e. not blocks of data)
    
    I tried to avoid too much buffer creation by introducing the xmlBufferDetach function,
    which allows re-using one buffer to construct many strings. It's maybe a bit of a "hack"
    in API terms though I thought the gains would be worth it.
    
    Conrad
    
    ------8<------
    
    To keep memory usage tight in normal conditions it's desirable to only
    allocate as much space as is needed. Unfortunately this can lead to
    problems when constructing a long string out of small chunks, because
    every chunk you add will need to resize the buffer.
    
    To fix this XML_ALLOC_HYBRID will switch (when the buffer is 4kb big)
    from using exact allocations to doubling buffer size every time it is
    full. This limits the number of buffer resizes to O(log n) (down from
    O(n)), and thus greatly increases the performance of constructing very
    large strings in this manner.

 include/libxml/tree.h |    3 ++-
 tree.c                |   24 +++++++++++++++++++++++-
 2 files changed, 25 insertions(+), 2 deletions(-)
---
diff --git a/include/libxml/tree.h b/include/libxml/tree.h
index b522f61..2196f8d 100644
--- a/include/libxml/tree.h
+++ b/include/libxml/tree.h
@@ -74,7 +74,8 @@ typedef enum {
     XML_BUFFER_ALLOC_DOUBLEIT,	/* double each time one need to grow */
     XML_BUFFER_ALLOC_EXACT,	/* grow only to the minimal size */
     XML_BUFFER_ALLOC_IMMUTABLE, /* immutable buffer */
-    XML_BUFFER_ALLOC_IO		/* special allocation scheme used for I/O */
+    XML_BUFFER_ALLOC_IO,	/* special allocation scheme used for I/O */
+    XML_BUFFER_ALLOC_HYBRID	/* exact up to a threshold, and doubleit thereafter */
 } xmlBufferAllocationScheme;
 
 /**
diff --git a/tree.c b/tree.c
index d8fafc0..47b906c 100644
--- a/tree.c
+++ b/tree.c
@@ -682,7 +682,8 @@ try_complex:
 void
 xmlSetBufferAllocationScheme(xmlBufferAllocationScheme scheme) {
     if ((scheme == XML_BUFFER_ALLOC_EXACT) ||
-        (scheme == XML_BUFFER_ALLOC_DOUBLEIT))
+        (scheme == XML_BUFFER_ALLOC_DOUBLEIT) ||
+        (scheme == XML_BUFFER_ALLOC_HYBRID))
 	xmlBufferAllocScheme = scheme;
 }
 
@@ -693,6 +694,9 @@ xmlSetBufferAllocationScheme(xmlBufferAllocationScheme scheme) {
  * XML_BUFFER_ALLOC_EXACT - use exact sizes, keeps memory usage down
  * XML_BUFFER_ALLOC_DOUBLEIT - double buffer when extra needed,
  *                             improves performance
+ * XML_BUFFER_ALLOC_HYBRID - use exact sizes on small strings to keep memory usage tight
+ *                            in normal usage, and doubleit on large strings to avoid
+ *                            pathological performance.
  *
  * Returns the current allocation scheme
  */
@@ -1267,6 +1271,7 @@ xmlStringLenGetNodeList(xmlDocPtr doc, const xmlChar *value, int len) {
 
     buffer = xmlBufferCreateSize(0);
     if (buffer == NULL) return(NULL);
+    xmlBufferSetAllocationScheme(buffer, XML_BUFFER_ALLOC_HYBRID);
 
     q = cur;
     while ((cur < end) && (*cur != 0)) {
@@ -1474,6 +1479,7 @@ xmlStringGetNodeList(xmlDocPtr doc, const xmlChar *value) {
 
     buffer = xmlBufferCreateSize(0);
     if (buffer == NULL) return(NULL);
+    xmlBufferSetAllocationScheme(buffer, XML_BUFFER_ALLOC_HYBRID);
 
     q = cur;
     while (*cur != 0) {
@@ -7000,6 +7006,7 @@ xmlBufferSetAllocationScheme(xmlBufferPtr buf,
         (buf->alloc == XML_BUFFER_ALLOC_IO)) return;
     if ((scheme == XML_BUFFER_ALLOC_DOUBLEIT) ||
         (scheme == XML_BUFFER_ALLOC_EXACT) ||
+        (scheme == XML_BUFFER_ALLOC_HYBRID) ||
         (scheme == XML_BUFFER_ALLOC_IMMUTABLE))
 	buf->alloc = scheme;
 }
@@ -7267,6 +7274,21 @@ xmlBufferResize(xmlBufferPtr buf, unsigned int size)
 	case XML_BUFFER_ALLOC_EXACT:
 	    newSize = size+10;
 	    break;
+        case XML_BUFFER_ALLOC_HYBRID:
+            if (buf->use < BASE_BUFFER_SIZE)
+                newSize = size;
+            else {
+                newSize = buf->size * 2;
+                while (size > newSize) {
+                    if (newSize > UINT_MAX / 2) {
+                        xmlTreeErrMemory("growing buffer");
+                        return 0;
+                    }
+                    newSize *= 2;
+                }
+            }
+            break;
+
 	default:
 	    newSize = size+10;
 	    break;



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]