Re: [xml] Problem with character references in range � through  inclusive



Hi David, All,

  2.  My basic problem is that I need to be able to
      encode any arbitrary octet sequence in the
      XML element.  All of the characters except those
      in the range � through  inclusive.
      I experimented with doing them in hex both with
      and without leading zeroes, and in decimal, as
      well.  In every case every octet in the range
      � through  inclusive is simply dropped.

So, the long and short of it is -- what is the correct
way to encode the octets in that range?

This is a common problem, and searching for
'xml "binary content"' will turn up a lot of discussions
of it.

Essentially the answer is "XML is not designed for this sort of 
application".

The most common workaround is to base64-encode the binary data
and embed the result in the XML file. Alternatively, higher-level
protocols can let the binary data travel alongside the XML file
and include it in the XML by reference.
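
As a rough illustration of the base64 route, here is a minimal
Python sketch. The element name "blob", the "encoding" attribute
and both function names are made up for this example; nothing
in XML itself prescribes them.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_octets(payload: bytes) -> str:
    """Return an XML fragment carrying `payload` as base64 text."""
    # "blob" and "encoding" are hypothetical names chosen for this sketch.
    elem = ET.Element("blob", {"encoding": "base64"})
    # base64 output uses only ASCII letters, digits, '+', '/' and '=',
    # all of which are legal XML characters.
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_octets(xml_text: str) -> bytes:
    """Recover the original octets from a fragment made by wrap_octets."""
    elem = ET.fromstring(xml_text)
    # An empty payload serialises to an empty element, whose text is None.
    return base64.b64decode(elem.text or "")


if __name__ == "__main__":
    raw = bytes(range(256))  # every possible octet value once
    doc = wrap_octets(raw)
    assert unwrap_octets(doc) == raw
```

The cost is size: base64 emits four characters for every three
octets of input.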

If you consider your data to be more textual in nature, but
containing the occasional control character, remapping the
control characters to the Unicode PUA (Private Use Area) or to
the Control Pictures block may be an option.
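
A sketch of the Control Pictures variant in Python, under some
assumptions: it only handles the C0 controls that XML 1.0
forbids (everything below U+0020 except TAB, LF and CR), it
relies on the Control Pictures block placing the picture for
control code N at U+2400 + N, and the function names are
invented for this example. A PUA mapping would look the same
with a different, privately agreed target range.

```python
import xml.etree.ElementTree as ET

# C0 controls that XML 1.0 does not allow: all code points below U+0020
# except TAB (0x09), LF (0x0A) and CR (0x0D).
FORBIDDEN_C0 = [c for c in range(0x20) if c not in (0x09, 0x0A, 0x0D)]

# The Control Pictures block starts at U+2400; the picture for control
# code N sits at U+2400 + N.
TO_PICTURES = {c: 0x2400 + c for c in FORBIDDEN_C0}
FROM_PICTURES = {picture: c for c, picture in TO_PICTURES.items()}


def escape_controls(text: str) -> str:
    """Replace forbidden C0 controls with their Control Pictures."""
    return text.translate(TO_PICTURES)


def unescape_controls(text: str) -> str:
    """Inverse of escape_controls (see the caveat after this block)."""
    return text.translate(FROM_PICTURES)


if __name__ == "__main__":
    original = "field1\x01field2\x1fend\tkept"
    safe = escape_controls(original)
    # The escaped text survives a trip through an XML parser.
    elem = ET.fromstring("<t>" + safe + "</t>")
    assert unescape_controls(elem.text) == original
```

Note that the mapping is only reversible if the original text
never contains Control Pictures characters itself; otherwise
they would be turned into control characters on the way back.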

Regards,
Peter Jacobi


