[libxml2] Speed up HTML fuzzer

From: Nick Wellnhofer <nwellnhof src gnome org>
To: commits-list gnome org
Cc:
Subject: [libxml2] Speed up HTML fuzzer
Date: Sun, 7 Feb 2021 13:40:39 +0000 (UTC)

commit ec808a44156d2464ee0e604979bde794213f61ef
Author: Nick Wellnhofer <wellnhofer aevum de>
Date:   Sun Feb 7 13:57:49 2021 +0100

    Speed up HTML fuzzer
    
    htmlDocDumpMemory uses the "HTML" encoding if no other encoding was
    specified in the source HTML. This encoding can be extremely slow
    because of an inefficiency in htmlEntityValueLookup. Stop encoding
    the output for now.

 fuzz/html.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)
---
diff --git a/fuzz/html.c b/fuzz/html.c
index d212c1f0..449a9d49 100644
--- a/fuzz/html.c
+++ b/fuzz/html.c
@@ -22,7 +22,7 @@ LLVMFuzzerTestOneInput(const char *data, size_t size) {
     static const size_t maxChunkSize = 128;
     htmlDocPtr doc;
     htmlParserCtxtPtr ctxt;
-    xmlChar *out;
+    xmlOutputBufferPtr out;
     const char *docBuffer;
     size_t docSize, consumed, chunkSize;
     int opts, outSize;
@@ -39,9 +39,16 @@ LLVMFuzzerTestOneInput(const char *data, size_t size) {
     /* Pull parser */
 
     doc = htmlReadMemory(docBuffer, docSize, NULL, NULL, opts);
-    /* Also test the serializer. */
-    htmlDocDumpMemory(doc, &out, &outSize);
-    xmlFree(out);
+
+    /*
+     * Also test the serializer. Call htmlDocContentDumpOutput with our
+     * own buffer to avoid encoding the output. The HTML encoding is
+     * excruciatingly slow (see htmlEntityValueLookup).
+     */
+    out = xmlAllocOutputBuffer(NULL);
+    htmlDocContentDumpOutput(out, doc, NULL);
+    xmlOutputBufferClose(out);
+
     xmlFreeDoc(doc);
 
     /* Push parser */
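
For illustration only, here is a minimal C sketch of why a per-character entity lookup can dominate serialization time. It assumes the slow path is a linear scan over an entity table sorted by code point, run once for every non-ASCII character the "HTML" encoding has to map. The six-entry table is a hypothetical excerpt, and the helper names lookupLinear, lookupBinary and cmpEntity are made up for this sketch. The commit itself does not change htmlEntityValueLookup; it only stops the fuzzer from going through that encoding.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical excerpt of an HTML entity table, sorted by code point.
 * Assumption: the real libxml2 table is larger but ordered the same way. */
typedef struct {
    unsigned int value;   /* Unicode code point */
    const char *name;     /* entity name without '&' and ';' */
} EntityDesc;

static const EntityDesc entities[] = {
    { 160, "nbsp" }, { 161, "iexcl" }, { 169, "copy" },
    { 174, "reg" },  { 233, "eacute" }, { 8364, "euro" },
};
#define N_ENTITIES (sizeof(entities) / sizeof(entities[0]))

/* Linear scan with early exit: up to n comparisons per character.
 * Assumed here to model the slow lookup named in the commit message.
 * *steps counts how many table entries were inspected. */
static const EntityDesc *lookupLinear(unsigned int value, size_t *steps) {
    for (size_t i = 0; i < N_ENTITIES; i++) {
        (*steps)++;
        if (entities[i].value >= value)
            return entities[i].value == value ? &entities[i] : NULL;
    }
    return NULL;
}

/* Comparator for bsearch: orders a code point key against a table entry. */
static int cmpEntity(const void *key, const void *elem) {
    unsigned int v = *(const unsigned int *)key;
    unsigned int e = ((const EntityDesc *)elem)->value;
    return (v > e) - (v < e);
}

/* Binary search on the same sorted table: O(log n) comparisons. */
static const EntityDesc *lookupBinary(unsigned int value) {
    return bsearch(&value, entities, N_ENTITIES, sizeof(entities[0]),
                   cmpEntity);
}
```

Both functions return the same entry, but the linear version touches every entry below the target code point, so a document full of high code points pays close to the full table size per character. Rather than speeding up the lookup, the commit sidesteps it by handing htmlDocContentDumpOutput an output buffer with no encoder attached.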
