[xml] issue with libxml2, python and entities

From: Mike Kneller <ukchill mac com>
To: xml gnome org
Subject: [xml] issue with libxml2, python and entities
Date: Sat, 27 Jan 2007 00:34:21 +0000

I am not sure if I have located a bug or not....

Using Python 2.4 and libxml2 2.6.22.

When I load a document containing an entity and then attempt to read the value of a node containing that entity, I get the text content but the entity disappears. In the following example, when looking at root.content I would expect to see '©2007'; instead all I get is '2007'.

I was advised on the #XML IRC channel to construct a simple test case, so here it is:



File 1: test.xml

<?xml version="1.0"?>
<!DOCTYPE content [
<!ENTITY % HTMLlat1 PUBLIC
   "-//W3C//ENTITIES Latin 1 for XHTML//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
%HTMLlat1;
]>
<content>
        <p>&copy;2007</p>
</content>


File 2: testcase.py

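import libxml2

# Parse the test document and fetch its root element.
sourcedoc = libxml2.parseFile( 'test.xml' )
root = sourcedoc.getRootElement()
print root.serialize()  # the markup of the root element
print root.content      # the text content, where the entity goes missing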


Reading the source for libxml2.py, I find the following:
    def getContent(self):
        """Read the value of a node, this can be either the text
           carried directly by this node if it's a TEXT node or the
           aggregate string of the values carried by this node
           child's (TEXT and ENTITY_REF). Entity references are
           substituted. """
        ret = libxml2mod.xmlNodeGetContent(self._o)
        return ret

From this, in my (admittedly limited) understanding, I would have expected root.content to return the translated entity as well as the text.
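
To take the external xhtml-lat1.ent fetch out of the picture, here is a minimal sketch of the result I would expect. It rests on two assumptions that are not part of the test case above: it uses the standard library's xml.etree.ElementTree (which needs a newer Python than 2.4) instead of libxml2, and it declares the copy entity directly in the internal subset. The paragraph_text helper is a hypothetical name used only for illustration.

```python
import xml.etree.ElementTree as ET

# Same document shape as test.xml, but the 'copy' entity is declared inline
# (an assumption for illustration) rather than pulled in from xhtml-lat1.ent.
XML_SOURCE = (
    '<!DOCTYPE content [\n'
    '<!ENTITY copy "&#169;">\n'
    ']>\n'
    '<content>\n'
    '        <p>&copy;2007</p>\n'
    '</content>\n'
)


def paragraph_text(xml_source):
    """Return the text of the first <p> child, as delivered by the parser."""
    root = ET.fromstring(xml_source)
    return root.find('p').text


text = paragraph_text(XML_SOURCE)
# repr() keeps the output readable even on a terminal without Unicode support.
print(repr(text))
```

Here the entity is known to the parser without any external lookup, so the copyright sign comes through as part of the text. This says nothing definite about what libxml2 does with the external parameter entity in test.xml; it only shows the behaviour I had in mind.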


Is this a bug, or am I doing something wrong?

Cheers
Mike
ukchill mac com
http://www.mikekneller.com

Follow-Ups:
- Re: [xml] issue with libxml2, python and entities
  - From: Daniel Veillard
