Re: [xml] Useless function calls in xmlSetProp()?



Daniel Veillard wrote:
On Mon, Jan 28, 2008 at 03:26:02PM +0100, Julien Charbon wrote:
Hmm, sure, UTF-8 validation should not be removed. Anyway, to evaluate this extra complexity, I wrote a simple program [see below] that does 1000 iterations of xmlSetProp(node, name, value) and computes the total cost of all these calls for various 'value' parameters:

  Okay, this makes a difference, agreed. But is that really perceptible
in a real application run? I'm unsure...

True. In our application, the performance improvement is between 0.5% and 0.7%. More than nothing, but certainly not a revolution...

Patch applied to current libxml2 trunk:
But I tend to like the patch for a few reasons:
     - it cleans things up and shows the actual process
     - it enforces the UTF-8 check in a clear manner
     - it apparently doesn't change the actual behaviour of the API
== tree.c.patch ==
  The patch looks fine to me. If you can provide the final version as an
email attachment, I will try to apply it.

Seems fine and clear. Attached to this email is the "final" patch against current trunk.

Note:

Setting doc->encoding to "ISO-8859-1" when the value is not valid UTF-8 is carried over from the previous xmlEncodeEntitiesReentrant() call.
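
To make that fallback concrete, here is a minimal standalone sketch of the check-then-fall-back logic. It does not use libxml2: struct sketch_doc, sketch_is_utf8() and sketch_check_value() are hypothetical names invented for this illustration, and the validator is a generic strict UTF-8 well-formedness check, not the implementation of xmlCheckUTF8(), which may differ in strictness. The real patch also reports XML_TREE_NOT_UTF8 through xmlTreeErr() and still creates the text node; that part is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the document object; only the field the
 * note talks about is modelled. */
struct sketch_doc {
    const char *encoding;   /* NULL means "not set" */
};

/* Returns 1 if s is a well-formed, NUL-terminated UTF-8 string, else 0.
 * Generic check written for this sketch, not libxml2 code. */
int sketch_is_utf8(const unsigned char *s)
{
    assert(s != NULL);
    while (*s != 0) {
        unsigned int c = *s++;
        unsigned int cp, min;
        int extra;

        if (c < 0x80)
            continue;                                   /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else
            return 0;       /* stray continuation or invalid lead byte */

        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80)
                return 0;   /* truncated sequence (also catches NUL) */
            cp = (cp << 6) | (*s++ & 0x3F);
        }
        /* Reject overlong forms, surrogates and values past U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}

/* Same shape as the patched code path: validate the value and, on
 * failure, record the ISO-8859-1 fallback on the document if there is
 * one. Returns 1 if value was valid UTF-8, 0 otherwise. */
int sketch_check_value(struct sketch_doc *doc, const char *value)
{
    if (sketch_is_utf8((const unsigned char *) value))
        return 1;
    if (doc != NULL)
        doc->encoding = "ISO-8859-1";
    return 0;
}
```

With this sketch, a value such as "caf\xE9" (Latin-1 bytes) fails the check and flips the encoding field, while "caf\xC3\xA9" passes and leaves it untouched.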

 Thanks.

--
Julien
Index: include/libxml/xmlerror.h
===================================================================
--- include/libxml/xmlerror.h   (revision 3690)
+++ include/libxml/xmlerror.h   (working copy)
@@ -398,6 +398,7 @@
     XML_TREE_INVALID_HEX = 1300,
     XML_TREE_INVALID_DEC, /* 1301 */
     XML_TREE_UNTERMINATED_ENTITY, /* 1302 */
+    XML_TREE_NOT_UTF8, /* 1303 */
     XML_SAVE_NOT_UTF8 = 1400,
     XML_SAVE_CHAR_INVALID, /* 1401 */
     XML_SAVE_NO_DOCTYPE, /* 1402 */
Index: tree.c
===================================================================
--- tree.c      (revision 3690)
+++ tree.c      (working copy)
@@ -92,6 +92,9 @@
        case XML_TREE_UNTERMINATED_ENTITY:
            msg = "unterminated entity reference %15s\n";
            break;
+       case XML_TREE_NOT_UTF8:
+           msg = "string is not in UTF-8\n";
+           break;
        default:
            msg = "unexpected error number\n";
     }
@@ -1814,11 +1817,15 @@
         cur->name = name;
 
     if (value != NULL) {
-        xmlChar *buffer;
         xmlNodePtr tmp;
 
-        buffer = xmlEncodeEntitiesReentrant(doc, value);
-        cur->children = xmlStringGetNodeList(doc, buffer);
+        if(!xmlCheckUTF8(value)) {
+            xmlTreeErr(XML_TREE_NOT_UTF8, (xmlNodePtr) doc,
+                       NULL);
+            if (doc != NULL)
+                doc->encoding = xmlStrdup(BAD_CAST "ISO-8859-1");
+        }
+        cur->children = xmlNewDocText(doc, value);
         cur->last = NULL;
         tmp = cur->children;
         while (tmp != NULL) {
@@ -1827,7 +1834,6 @@
                 cur->last = tmp;
             tmp = tmp->next;
         }
-        xmlFree(buffer);
     }
 
     /*
@@ -6466,11 +6472,15 @@
        prop->last = NULL;
        prop->ns = ns;
        if (value != NULL) {
-           xmlChar *buffer;
            xmlNodePtr tmp;
            
-           buffer = xmlEncodeEntitiesReentrant(node->doc, value);
-           prop->children = xmlStringGetNodeList(node->doc, buffer);
+           if(!xmlCheckUTF8(value)) {
+               xmlTreeErr(XML_TREE_NOT_UTF8, (xmlNodePtr) node->doc,
+                          NULL);
+                if (node->doc != NULL)
+                    node->doc->encoding = xmlStrdup(BAD_CAST "ISO-8859-1");
+           }
+           prop->children = xmlNewDocText(node->doc, value);
            prop->last = NULL;
            tmp = prop->children;
            while (tmp != NULL) {
@@ -6479,7 +6489,6 @@
                    prop->last = tmp;
                tmp = tmp->next;
            }
-           xmlFree(buffer);
        }
        if (prop->atype == XML_ATTRIBUTE_ID)
            xmlAddID(NULL, node->doc, value, prop);

