
Re: [xml] Problem with character references in range &#x00; through &#x1f; inclusive



Thank you for the quick reply.

I have two questions about your reply, viz.:
  1.  The URL you supplied (http://www.w3.org/REC-xml)
      returns a 404 error.

  2.  My basic problem is that I need to be able to
      encode any arbitrary octet sequence in the
      XML element.  All of the characters come through
      except those in the range &#x00; through &#x1f;
      inclusive.
      I experimented with encoding them in hex, both with
      and without leading zeroes, and in decimal as
      well.  In every case, every octet in the range
      &#x00; through &#x1f; inclusive is simply dropped.
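
For reference, the Char production of XML 1.0 permits only #x9, #xA and
#xD below #x20, which may be why &#x000D; survives in the output quoted
below while the other control characters do not.  A minimal sketch of that
check follows; is_xml10_char is a hypothetical helper written for
illustration, not a libxml2 function, and it says nothing about how
libxml2 itself performs the test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if code point c matches the XML 1.0 Char production:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 * Hypothetical helper for illustration; not part of libxml2. */
static bool is_xml10_char(uint32_t c)
{
    return c == 0x9 || c == 0xA || c == 0xD ||
           (c >= 0x20 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD) ||
           (c >= 0x10000 && c <= 0x10FFFF);
}
```

Under this check, is_xml10_char(0x03) is false and is_xml10_char(0x0D)
is true.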

So, the long and short of it is -- what is the correct
way to encode the octets in that range?
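
One approach sometimes used for arbitrary binary data, which nothing in
this thread confirms as the recommended one, is to store the octets as
text (hex or base64) rather than as character references, so that the
element content contains only ordinary characters.  A minimal sketch of
the hex variant, assuming the caller supplies the output buffer;
octets_to_hex is a hypothetical helper, not part of libxml2:

```c
#include <stddef.h>

/* Writes 2*len uppercase hex digits plus a terminating NUL into out.
 * The caller must provide room for at least 2*len + 1 bytes.
 * Hypothetical helper for illustration; not part of libxml2. */
static void octets_to_hex(const unsigned char *in, size_t len, char *out)
{
    static const char digits[] = "0123456789ABCDEF";

    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[in[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[in[i] & 0x0F];  /* low nibble  */
    }
    out[2 * len] = '\0';
}
```

The resulting string could then be passed as element content, with the
reader of the document decoding it back into octets.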

Again, thank you for any light you can shed on this.

David Hoos
 
----- Original Message ----- 
From: "Daniel Veillard" <veillard redhat com>
To: "David C. Hoos" <david c hoos sr ada95 com>
Cc: <xml gnome org>
Sent: March 25, 2004 5:51 PM
Subject: Re: [xml] Problem with character references in range &#x00; through &#x1f; inclusive


> On Thu, Mar 25, 2004 at 05:39:33PM -0600, David C. Hoos wrote:
> > I am having difficulty with the function xmlStringGetNodeList() (called from
> > xmlNodeSetContent() ).  When I submit a content string like the following:
> > 
> > &#x0017;&#x0003;&#x0001;&#x0000;0&#x00B8;g&#x00D8;+&#x00DB; /
> > &#xFB01;&#x00E6;&#x2018;&#x0008;&#x0002;qF&#x00AF;&#x00CD;
> > &#x00EC;lw&#x02C6;&#x222B;&#x00FB;&#x00C9;&#x201C;5&#x0014;
> > &#x02C6;&#x2014;.=&#x00C5;&#x00EF;&#x2019;E2&#x00C6;x&#x000D;
> > &#x00D3;\KTn&#x2014;u&#x00A9;&#x02DC;T
> > 
> > What I get in the resulting xml is the following:
> > 
> > 0&#xB8;g&#xD8;+&#xDB; /
> > &#xFB01;&#xE6;&#x2018;qF&#xAF;&#xCD;
> > &#xEC;lw&#x2C6;&#x222B;&#xFB;&#xC9;&#x201C;5
> > &#x2C6;&#x2014;.=&#xC5;&#xEF;&#x2019;E2&#xC6;x&#13;
> > &#xD3;\KTn&#x2014;u&#xA9;&#x2DC;T
> > 
> > This appears to me to be a bug -- or am I missing something?
> > 
> > Thanks for any light you can shed on this.
> 
>   Hum ... &#x0003; is not in the allowed character range of XML
> (see production 4 of the spec at http://www.w3.org/REC-xml IIRC)
> and using xmlNodeSetContent() with such a content is an error,
> but libxml2 doesn't do the checking at that level.
>  Make 100% sure that when you manipulate XML document content,
> the strings are valid UTF-8 encoded XML content, otherwise you
> will get errors either at serialization time or when reloading
> the output.
> 
> Daniel
> 
> -- 
> Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
> veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
> http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
> 
> 



