Re: [xml] issue with libxml2, python and entities
- From: Daniel Veillard <veillard redhat com>
- To: Mike Kneller <ukchill mac com>
- Cc: xml gnome org
- Subject: Re: [xml] issue with libxml2, python and entities
- Date: Sat, 27 Jan 2007 06:00:18 -0500
On Sat, Jan 27, 2007 at 12:34:21AM +0000, Mike Kneller wrote:
> I am not sure if I have located a bug or not....
>
> Using Python (2.4) and libxml2 2.6.22
>
> When I load a document containing an entity, if I attempt to read
> the value of a node containing an entity, I get the text content and
> the entity disappears.
> In the following example, when looking at root.content I would expect
> to see '©2007', instead all I get is '2007'.
>
> I was advised on the #XML IRC channel to construct a simple test
> case, so here it is:
>
On IRC you said the entity was defined in the internal subset; it's not.
> File 1: test.xml
>
> <?xml version="1.0"?>
> <!DOCTYPE content [
> <!ENTITY % HTMLlat1 PUBLIC
> "-//W3C//ENTITIES Latin 1 for XHTML//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
> %HTMLlat1;
> ]>
> <content>
> <p>&copy;2007</p>
> </content>
libxml2 doesn't load the external subset by default.
> File 2: testcase.py
>
> import libxml2
> sourcedoc = libxml2.parseFile( 'test.xml' )
> root = sourcedoc.getRootElement()
> print root.serialize()
> print root.content
So your content element has 2 children: an entity reference
to copy, whose content is unknown, and the text node with "2007".
>
> Reading the source for libxml2.py, I find the following:
> def getContent(self):
> """Read the value of a node, this can be either the text
> carried directly by this node if it's a TEXT node or the
> aggregate string of the values carried by this node
> child's (TEXT and ENTITY_REF). Entity references are
> substituted. """
> ret = libxml2mod.xmlNodeGetContent(self._o)
> return ret
>
>
> Which in my (admittedly limited) understanding I would have thought
> would return the translated entity as well as the text when I examine
> root.content.
>
> Is this a bug, or am I doing something wrong?
You are not asking to load the external subset: use readFile and pass the
XML_PARSE_DTDLOAD option. It should work even with the ancient version
2.6.22, but please upgrade first in case of problems.
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/