Re: [xml] encoding problem
- From: Daniel Veillard <veillard redhat com>
- To: "Tom ." <ptittom free fr>
- Cc: Bernard Valton <bvalton yaccom com>, xml gnome org
- Subject: Re: [xml] encoding problem
- Date: Wed, 28 Mar 2001 06:25:30 -0500
On Wed, Mar 28, 2001 at 12:52:08PM +0200, Tom . wrote:
On 28/03/01 08:57:46, Bernard Valton wrote:
I've got an XML document containing:
<setvar value="&#38;!=x1A9F9A"/>
After parsing, I get a tree like this:
element : name = setvar
attr : name = value
children : name = #38
content = NULL
Maybe it's correct; I don't know.
I would like to get:
element : name = setvar
attr : name = value
children : name = text
content = &!=x1A9F9A
Is it a bug, or a result of the encoding?
Is there a solution?
ptittom:~$ echo '<setvar value="&#38;!=x1A9F9A"/>' | xmllint --debug -
DOCUMENT
version=1.0
URL=-
standalone=true
ELEMENT setvar
ATTRIBUTE value
ENTITY_REF(#38)
TEXT
content=!=x1A9F9A
ptittom:~$ echo '<setvar value="&#38;!=x1A9F9A"/>' | xmllint --debug --noent -
DOCUMENT
version=1.0
URL=-
standalone=true
ELEMENT setvar
ATTRIBUTE value
TEXT
content=&#38;!=x1A9F9A
Which version of libxml do you have?
Good question. But in both cases it's a problem; it should show:
DOCUMENT
version=1.0
standalone=true
ELEMENT setvar
ATTRIBUTE value
TEXT
content=&!=x1A9F9A
This is a problem with the handling of & characters in attribute values.
I consider it a side effect of the brokenness of the SAX API and
a bug in xmlStringGetNodeList().
Basically, the SAX API allows only a string value for an attribute
in the startElement callback. But
<tst attr="xxxx &myentity; yyyy"/>
is perfectly possible, with myentity being a user-defined entity whose
replacement text is an arbitrary string. So there are two options:
- either the &myentity; reference is substituted before calling the
SAX interface, but since the DOM tree is built on top of it,
libxml won't be able to save entity references in attribute
values, just the replacement string :-(
- or the &myentity; reference is kept as is, and the DOM tree builder
handles the detection of entity references in attribute values
to generate a proper list of TEXT and ENTITY_REF nodes. This is
what libxml does.
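A minimal sketch of the second option, in C since libxml is a C library. The `segment` type, `copy_run()` and `split_attr_value()` are hypothetical names for illustration only, not libxml's xmlNode API or its xmlStringGetNodeList(); the sketch assumes each segment fits in 63 bytes and does no entity-name validation. Like the behaviour reported in this thread, it gives character references no special treatment, so `&#38;!=x1A9F9A` comes out as ENTITY_REF `#38` followed by TEXT `!=x1A9F9A`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical segment kinds mirroring the TEXT / ENTITY_REF nodes;
 * not libxml's real xmlNode types. */
typedef enum { SEG_TEXT, SEG_ENTITY_REF } seg_kind;
typedef struct {
    seg_kind kind;
    char value[64];   /* text run, or entity name without '&' and ';' */
} segment;

/* Copy len bytes of src into dst (truncated to 63), NUL-terminated. */
static void copy_run(char *dst, const char *src, size_t len) {
    if (len > 63) len = 63;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split e.g. "xxxx &myentity; yyyy" into TEXT, ENTITY_REF, TEXT, keeping
 * entity references unexpanded. Returns the number of segments (<= max).
 * Character references get no special treatment, so "&#38;" yields an
 * ENTITY_REF named "#38". */
static size_t split_attr_value(const char *value, segment *out, size_t max) {
    size_t n = 0;
    const char *p = value;
    assert(value != NULL && (out != NULL || max == 0));
    while (*p != '\0' && n < max) {
        const char *end = (*p == '&') ? strchr(p, ';') : NULL;
        if (end != NULL) {                       /* "&name;" */
            out[n].kind = SEG_ENTITY_REF;
            copy_run(out[n].value, p + 1, (size_t)(end - p - 1));
            p = end + 1;
        } else {                                 /* text up to the next '&' */
            const char *start = p++;             /* a lone '&' stays text */
            while (*p != '\0' && *p != '&') p++;
            out[n].kind = SEG_TEXT;
            copy_run(out[n].value, start, (size_t)(p - start));
        }
        n++;
    }
    return n;
}
```

Keeping the reference unexpanded is what lets a tree built this way be saved back with `&myentity;` intact, at the cost of having to re-parse the string value that SAX hands over.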
But the problem is that you can't pass "&!=x1A9F9A" as the attribute
value at the SAX interface anymore, because it would not be possible to
distinguish an & inherited from &#38; or &amp; from an & indicating the
start of an entity reference. So before calling SAX, '&' characters coming
from character references are reconverted to &#38;. Currently the SAX
attribute() function calls xmlStringGetNodeList() to do this decoding, but
that function fails to detect and handle character references correctly :-\
Hopefully I will fix it at some point, or someone will send me a patch.
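A sketch of the round trip described above, again with hypothetical names (`escape_amp()`, `emit()`, `append_utf8()`, `split_attr_value()`, `segment`); it is not libxml's code and not a patch to xmlStringGetNodeList(). A literal '&' is re-escaped as &#38; before the value crosses the SAX boundary, and the node-list builder then decodes &#NN; and &#xHH; into the surrounding TEXT run instead of emitting an ENTITY_REF. Fixed 64-byte buffers and "return 0 on malformed input" are simplifying assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical segment types; not libxml's xmlNode API. */
typedef enum { SEG_TEXT, SEG_ENTITY_REF } seg_kind;
typedef struct { seg_kind kind; char value[64]; } segment;

/* Before SAX: re-escape every literal '&' as "&#38;" so that a bare '&'
 * always starts a reference. Returns 0 on success, -1 if dst is too small. */
static int escape_amp(const char *src, char *dst, size_t dstlen) {
    size_t j = 0;
    assert(dstlen > 0);
    for (; *src != '\0'; src++) {
        if (*src == '&') {
            if (j + 5 >= dstlen) return -1;
            memcpy(dst + j, "&#38;", 5);
            j += 5;
        } else {
            if (j + 1 >= dstlen) return -1;
            dst[j++] = *src;
        }
    }
    dst[j] = '\0';
    return 0;
}

/* Store one segment (truncated to 63 bytes); returns 0 if out is full. */
static int emit(segment *out, size_t max, size_t *n, seg_kind kind,
                const char *s, size_t len) {
    if (*n >= max) return 0;
    if (len > 63) len = 63;
    out[*n].kind = kind;
    memcpy(out[*n].value, s, len);
    out[*n].value[len] = '\0';
    (*n)++;
    return 1;
}

/* Append code point cp to the text run as UTF-8; returns -1 on error. */
static int append_utf8(char *text, size_t *tlen, unsigned long cp) {
    unsigned char b[4];
    size_t k;
    if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return -1;
    if (cp < 0x80)         { b[0] = (unsigned char)cp;                  k = 1; }
    else if (cp < 0x800)   { b[0] = (unsigned char)(0xC0 | (cp >> 6));  k = 2; }
    else if (cp < 0x10000) { b[0] = (unsigned char)(0xE0 | (cp >> 12)); k = 3; }
    else                   { b[0] = (unsigned char)(0xF0 | (cp >> 18)); k = 4; }
    for (size_t i = 1; i < k; i++)
        b[i] = (unsigned char)(0x80 | ((cp >> (6 * (k - 1 - i))) & 0x3F));
    if (*tlen + k > 63) return -1;
    memcpy(text + *tlen, b, k);
    *tlen += k;
    return 0;
}

/* After SAX: build TEXT / ENTITY_REF segments. "&#NN;" and "&#xHH;" are
 * decoded into the current TEXT run; only "&name;" becomes an ENTITY_REF.
 * Returns the number of segments, or 0 on malformed input or overflow. */
static size_t split_attr_value(const char *value, segment *out, size_t max) {
    char text[64];
    size_t tlen = 0, n = 0;
    const char *p = value;
    while (*p != '\0') {
        const char *end = (*p == '&') ? strchr(p, ';') : NULL;
        if (end != NULL && p[1] == '#') {           /* character reference */
            int hex = (p[2] == 'x');
            const char *digits = p + 2 + hex;
            char *stop;
            if (!(hex ? isxdigit((unsigned char)*digits)
                      : isdigit((unsigned char)*digits))) return 0;
            unsigned long cp = strtoul(digits, &stop, hex ? 16 : 10);
            if (stop != end || append_utf8(text, &tlen, cp) != 0) return 0;
            p = end + 1;
        } else if (end != NULL) {                   /* named entity reference */
            if (tlen > 0 && !emit(out, max, &n, SEG_TEXT, text, tlen)) return 0;
            tlen = 0;
            if (!emit(out, max, &n, SEG_ENTITY_REF, p + 1, (size_t)(end - p - 1)))
                return 0;
            p = end + 1;
        } else {                      /* plain char; '&' without ';' too */
            if (tlen >= 63) return 0;
            text[tlen++] = *p++;
        }
    }
    if (tlen > 0 && !emit(out, max, &n, SEG_TEXT, text, tlen)) return 0;
    return n;
}
```

With this, escaping "&!=x1A9F9A" gives "&#38;!=x1A9F9A", and splitting that yields a single TEXT segment containing &!=x1A9F9A, which is the tree Bernard expected.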
Daniel
--
Daniel Veillard | Red Hat Network http://redhat.com/products/network/
veillard redhat com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/