[xml] why does doc.serialize() not escape & --> &amp; ?



this example uses the libxml2 python bindings:

>>> import libxml2
>>> libxml2.debugMemory(1)
0
>>> doc = libxml2.newDoc("1.0")
>>> html_content = "<html>&uuml;</html>"
>>> html_content = "<html>&gt;&uuml;</html>"
>>> root = doc.newChild(None, "doc", html_content)
>>> print doc.serialize(None,1)
<?xml version="1.0"?>
<doc>&lt;html&gt;&gt;&uuml;&lt;/html&gt;</doc>
>>> doc.freeDoc()

why are < and > escaped, but & is not?

this has two bad consequences:

1) the serialization may not be well-formed xml, so it cannot be parsed back:

>>> doc2 = libxml2.parseDoc(doc.serialize(None,1))
Entity: line 2: error: Entity 'uuml' not defined
<doc>&lt;html&gt;&gt;&uuml;&lt;/html&gt;</doc>

2) you have no way to tell what the original html_content looked like: a literal > and an entity reference &gt; in the input both come out as &gt;
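
to make both points concrete, here is a minimal sketch using only the python standard library (xml.sax.saxutils.escape and xml.etree.ElementTree), not libxml2 itself. it assumes the behaviour I expected is that & becomes &amp; in the same way < and > become &lt; and &gt;. the names observed, expected and round_trip are only for illustration.

```python
# Illustration with the Python standard library only (not libxml2).
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

html_content = "<html>&gt;&uuml;</html>"

# Serialization as reported above: '<' and '>' escaped, '&' left alone.
observed = "<doc>&lt;html&gt;&gt;&uuml;&lt;/html&gt;</doc>"
try:
    ET.fromstring(observed)
    observed_parses = True
except ET.ParseError:
    # &uuml; is not one of the five predefined XML entities,
    # and the document declares no DTD that defines it.
    observed_parses = False

# Serialization with '&' escaped as well: escape() handles &, < and >.
expected = "<doc>" + escape(html_content) + "</doc>"

# Parsing the fully escaped form gives back the original string.
round_trip = ET.fromstring(expected).text

print(observed_parses)
print(expected)
print(round_trip == html_content)
```

with all three characters escaped, the element text survives a serialize/parse round trip unchanged, which is what I would have expected from doc.serialize() as well.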

-------------
Hannu



