Re: [xml] Problem with character references in range &#x00; through &#x1f; inclusive



Thank you for the quick reply.

I have two questions about your reply, viz.:
  1.  The URL you supplied (http://www.w3.org/REC-xml)
      returns a 404 error.

  2.  My basic problem is that I need to be able to
      encode any arbitrary octet sequence in the
      XML element.  All of the characters work except
      those in the range &#x00; through &#x1f; inclusive.
      I experimented with doing them in hex both with
      and without leading zeroes, and in decimal, as
      well.  In every case every octet in the range
      &#x00; through &#x1f; inclusive is simply dropped.

So, the long and short of it is -- what is the correct
way to encode the octets in that range?

Again, thank you for any light you can shed on this.

David Hoos
 
----- Original Message ----- 
From: "Daniel Veillard" <veillard redhat com>
To: "David C. Hoos" <david c hoos sr ada95 com>
Cc: <xml gnome org>
Sent: March 25, 2004 5:51 PM
Subject: Re: [xml] Problem with character references in range &#x00; through &#x1f; inclusive


On Thu, Mar 25, 2004 at 05:39:33PM -0600, David C. Hoos wrote:
I am having difficulty with the function xmlStringGetNodeList() (called from
xmlNodeSetContent() ).  When I submit a content string like the following:

&#x0017;&#x0003;&#x0001;&#x0000;0&#x00B8;g&#x00D8;+&#x00DB; /
&#xFB01;&#x00E6;&#x2018;&#x0008;&#x0002;qF&#x00AF;&#x00CD;
&#x00EC;lw&#x02C6;&#x222B;&#x00FB;&#x00C9;&#x201C;5&#x0014;
&#x02C6;&#x2014;.=&#x00C5;&#x00EF;&#x2019;E2&#x00C6;x&#x000D;
&#x00D3;\KTn&#x2014;u&#x00A9;&#x02DC;T

What I get in the resulting xml is the following:

0&#xB8;g&#xD8;+&#xDB; /
&#xFB01;&#xE6;&#x2018;qF&#xAF;&#xCD;
&#xEC;lw&#x2C6;&#x222B;&#xFB;&#xC9;&#x201C;5
&#x2C6;&#x2014;.=&#xC5;&#xEF;&#x2019;E2&#xC6;x&#13;
&#xD3;\KTn&#x2014;u&#xA9;&#x2DC;T

This appears to me to be a bug -- or am I missing something?

Thanks for any light you can shed on this.

  Hum ... &#x0003; is not in the allowed character range of XML
(see production 4 of the spec at http://www.w3.org/REC-xml IIRC)
and using xmlNodeSetContent() with such a content is an error,
but libxml2 doesn't do the checking at that level.
 Make 100% sure that when you manipulate XML document content,
the strings are valid UTF-8 encoded XML content, otherwise you
will get errors either at serialization time or when reloading
the output.
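
As a rough illustration of the range restriction mentioned above, the
sketch below tests a code point against the Char production of XML 1.0
(numbered [2] in the Recommendation).  The helper name is hypothetical
and is not part of the libxml2 API; it only assumes the XML 1.0 rules,
not XML 1.1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, NOT a libxml2 function.
 * Returns true if the code point is allowed by the XML 1.0 Char production:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 */
static bool is_xml10_char(uint32_t cp)
{
    /* Only three C0 control characters are allowed: tab, LF, CR. */
    if (cp == 0x9 || cp == 0xA || cp == 0xD)
        return true;
    /* Basic Multilingual Plane, excluding the surrogate block. */
    if (cp >= 0x20 && cp <= 0xD7FF)
        return true;
    /* Rest of the BMP, excluding the noncharacters #xFFFE and #xFFFF. */
    if (cp >= 0xE000 && cp <= 0xFFFD)
        return true;
    /* Supplementary planes. */
    if (cp >= 0x10000 && cp <= 0x10FFFF)
        return true;
    return false;
}
```

Applied to the sample above, such a check would reject &#x0017;,
&#x0003;, &#x0001;, &#x0000;, &#x0008;, &#x0002; and &#x0014;, and accept
&#x000D;, which matches the references that did and did not survive in
the reported output.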

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/





