Re: [xml] XML Entities encoding question



Hi

On 5 August 2013 18:50, Fred <fred fredex gmail com> wrote:
I have an app that emits XML as ISO-8859-1 (or another encoding as needed). The XML is sent to an Oracle database, where it is unpacked and its contents are used to update an existing schema.

I apparently fail to understand something about how char encodings work at the intersection of XML and Oracle.

If I send:

<?xml version="1.0" encoding="WINDOWS-1252"?>
<MSG>
...
<LAST_NAME>BOLA<C3><C9>OS</LAST_NAME>
...
</MSG>

The two accented characters are each transformed into 0xBF (with exactly the same result if the declared encoding is ISO-8859-1 instead of WINDOWS-1252).

However, if I send:

<LAST_NAME>BOLA&#x00c3; &#x00c9;OS</LAST_NAME>

I get the desired result.

While I'm working on figuring out what I'm doing wrong regarding Oracle, is there some way I can force libxml2 to emit the second form rather than the first?

The tree is output using:
xmlDocDumpFormatMemoryEnc (doc, xmlbufptr, &xmlbufptr_size, "WINDOWS-1252", 1);

What happens if you use ASCII instead of WINDOWS-1252? Windows-1252 and ISO-8859-1 can both represent those characters directly, so libxml2 has no reason to escape them. If the document is encoded as ASCII, they cannot be written as-is and have to be escaped as numeric character references, so in theory libxml2 will escape them. I haven't tried it, though.
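
To make the escaping idea concrete, here is a minimal sketch of the transformation itself, independent of libxml2. The function name latin1_to_ascii_refs is hypothetical (it is not a libxml2 API), and it assumes the input is ISO-8859-1, where each byte value equals its Unicode code point. That also holds for the two bytes in the example above under Windows-1252, but not for Windows-1252 bytes in the range 0x80-0x9F. It also assumes it is only applied to character data, since a character reference is not valid inside an element name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of libxml2.
 * Copies a NUL-terminated ISO-8859-1 string into out, replacing every
 * byte above 0x7F with a hexadecimal numeric character reference such
 * as &#xC9;. ASCII bytes (including markup such as '<' and '&') are
 * copied unchanged.
 * Returns the number of characters written, not counting the
 * terminating NUL, or -1 if out is too small. */
int latin1_to_ascii_refs(const unsigned char *in, char *out, size_t out_size)
{
    size_t used = 0;

    for (; *in != '\0'; in++) {
        if (*in < 0x80) {
            /* Need room for this byte plus the final NUL. */
            if (used + 1 >= out_size)
                return -1;
            out[used++] = (char)*in;
        } else {
            /* "&#xNN;" is always 6 characters for bytes 0x80..0xFF;
             * need room for those plus the final NUL. */
            if (used + 6 >= out_size)
                return -1;
            snprintf(out + used, out_size - used, "&#x%02X;", (unsigned)*in);
            used += 6;
        }
    }

    if (used >= out_size)
        return -1;
    out[used] = '\0';
    return (int)used;
}
```

Called on the bytes "BOLA\xC3\xC9OS", this writes BOLA&#xC3;&#xC9;OS, which is pure ASCII and so survives any ASCII-compatible transport unchanged. Whether libxml2 produces the same thing when asked for ASCII output is the untested part of the suggestion above.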
 
thanks!

Fred

--
Michael Wood <esiotrot gmail com>
