Re: [xml] encoding problem



On Wed, Mar 28, 2001 at 12:52:08PM +0200, Tom . wrote:

On 28/03/01 08:57:46, Bernard Valton wrote:
I've got an XML document containing:

  <setvar value="&#38;!=x1A9F9A"/>

after parsing, I get a tree like this :

 element : name = setvar
   attr : name = value
   children : name = #38
                 content = NULL

Maybe it's correct, I don't know ??

I would like to get :

 element : name = setvar
   attr : name = value
   children : name = text
                 content = &#38;!=x1A9F9A

Is it a bug or the result of encoding ?
Is there a solution ?

ptittom:~$ echo '<setvar value="&#38;!=x1A9F9A"/>' |xmllint --debug -
DOCUMENT
version=1.0
URL=-
standalone=true
  ELEMENT setvar
    ATTRIBUTE value
      ENTITY_REF(#38)
      TEXT
        content=!=x1A9F9A
ptittom:~$ echo '<setvar value="&#38;!=x1A9F9A"/>' |xmllint --debug --noent -
DOCUMENT
version=1.0
URL=-
standalone=true
  ELEMENT setvar
    ATTRIBUTE value
      TEXT
        content=&#38;!=x1A9F9A

Which version of libxml do you have ?

  Good question. But either way it's a problem: it should show

DOCUMENT
version=1.0
standalone=true
  ELEMENT setvar
  ATTRIBUTE value
    TEXT
      content=&#38;!=x1A9F9A

  This is a problem with handling & chars in attribute values.
I consider this a side effect of the brokenness of the SAX API and
a bug in xmlStringGetNodeList().

Basically the SAX API allows only a string value for an attribute
in the startElement callback. But

  <tst attr="xxxx &myentity; yyyy"/>

is perfectly possible, with myentity being user-defined and containing
an arbitrary string defined by the user. So there are 2 options:
   - either the &myentity; reference is substituted before calling the
     SAX interface, but since the DOM tree is built on top of it,
     libxml won't be able to save entity references in attribute
     values, just the replacement string :-(
   - or the &myentity; reference is kept as is and the DOM tree building
     handles the detection of entity references in attribute values
     to generate a proper list of TEXT and ENTITY_REF nodes. This is
     what libxml does.
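
To make the second option concrete, here is a minimal sketch in Python
(not libxml code) of splitting a raw attribute value into TEXT and
ENTITY_REF pieces. The function name split_attr_value, the tuple
representation and the simplified ASCII-only name pattern are all
assumptions made for illustration.

```python
import re

# Simplified, ASCII-only pattern for a general entity reference such as
# &myentity; (an assumption: real XML names allow more characters).
ENTITY_REF = re.compile(r"&([A-Za-z_:][A-Za-z0-9_.:-]*);")


def split_attr_value(raw):
    """Split a raw attribute value into ("TEXT", s) / ("ENTITY_REF", name) pairs."""
    nodes = []
    pos = 0
    for m in ENTITY_REF.finditer(raw):
        # Text between the previous reference and this one becomes a TEXT node.
        if m.start() > pos:
            nodes.append(("TEXT", raw[pos:m.start()]))
        nodes.append(("ENTITY_REF", m.group(1)))
        pos = m.end()
    # Trailing text after the last reference.
    if pos < len(raw):
        nodes.append(("TEXT", raw[pos:]))
    return nodes


print(split_attr_value("xxxx &myentity; yyyy"))
```

For the example value above this gives a TEXT piece, an ENTITY_REF piece
for myentity, and a second TEXT piece, so the reference itself survives
and could be written back out on save.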

 But the problem is that you can't pass "&!=x1A9F9A" as the attribute
value at the SAX interface anymore, because it wouldn't be possible to
distinguish a & inherited from an &amp; or &#38; from a & indicating the
start of an entity reference. So before calling SAX, '&' characters coming
from charrefs are reconverted to &#38; . Currently the SAX attribute()
function calls xmlStringGetNodeList() to do this decoding, but this function
fails to detect and handle character references correctly :-\ . Hopefully
I will fix it at some point, or someone will send me a patch to fix it.
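
As an illustration of the kind of failure described here, the following
Python sketch (again, not the libxml implementation) contrasts a naive
splitter that treats everything between & and ; as an entity name with
one that decodes character references into text. The names naive_nodes
and build_nodes are hypothetical, and storing the decoded '&' in the
TEXT content is an assumption about the in-memory form, not a claim
about what xmllint --debug prints.

```python
import re

# Naive pattern: anything between '&' and ';' is taken as an entity name.
NAIVE_REF = re.compile(r"&([^;&]+);")

# Decimal charref, hex charref, or a (simplified, ASCII-only) named reference.
REF = re.compile(r"&(?:#([0-9]+)|#x([0-9A-Fa-f]+)|([A-Za-z_:][A-Za-z0-9_.:-]*));")


def naive_nodes(value):
    """Mishandles &#38; by turning it into an ENTITY_REF named '#38'."""
    nodes, pos = [], 0
    for m in NAIVE_REF.finditer(value):
        if m.start() > pos:
            nodes.append(("TEXT", value[pos:m.start()]))
        nodes.append(("ENTITY_REF", m.group(1)))
        pos = m.end()
    if pos < len(value):
        nodes.append(("TEXT", value[pos:]))
    return nodes


def build_nodes(value):
    """Decode character references into text; keep named references as nodes."""
    nodes, text, pos = [], [], 0

    def flush():
        # Emit the accumulated text as one TEXT node, if there is any.
        s = "".join(text)
        text.clear()
        if s:
            nodes.append(("TEXT", s))

    for m in REF.finditer(value):
        text.append(value[pos:m.start()])
        dec, hexa, name = m.groups()
        if dec is not None:
            text.append(chr(int(dec)))       # e.g. &#38; becomes '&'
        elif hexa is not None:
            text.append(chr(int(hexa, 16)))  # e.g. &#x26; becomes '&'
        else:
            flush()
            nodes.append(("ENTITY_REF", name))
        pos = m.end()
    text.append(value[pos:])
    flush()
    return nodes


print(naive_nodes("&#38;!=x1A9F9A"))
print(build_nodes("&#38;!=x1A9F9A"))
```

The naive version reproduces the shape of the tree reported at the top of
this thread (an ENTITY_REF named #38 followed by a TEXT node), while the
second version yields a single TEXT node.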

Daniel

-- 
Daniel Veillard      | Red Hat Network http://redhat.com/products/network/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/



