[xml] Non-ASCII characters



I am trying to save internal data from an application to XML, then later parse the XML to restore that data. Some strings come from external sources, and I noticed that they sometimes contain characters above 0x7F. In one instance, a string had two characters: first 'x', then 0xB2, which is "superscript two"; in other words, "x squared."

When I save to XML I get:

<string>x&#178;</string>

When I use SAX to retrieve the string I get "x", followed by the two bytes 0xC2 0xB2. I know very little about Unicode, but I found that "superscript two" is U+00B2 in the Latin-1 Supplement block. What is C2 B2?

I fixed the problem by skipping the 0xC2 and appending the character that follows it to the string, but it feels like a bad hack. Is there a way to specify that I am working with 1-byte characters in the full range 00-FF? Is there another way to solve this problem?
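
For reference, here is a minimal sketch of what appears to be going on, assuming the SAX callbacks deliver UTF-8 (as libxml2's SAX interface does). In UTF-8, U+00B2 is encoded as the two bytes 0xC2 0xB2, so the parser's output is consistent with the saved character reference. Skipping 0xC2 only works by coincidence for U+0080 through U+00BF; characters from U+00C0 through U+00FF start with 0xC3 instead (for example, U+00E9 is 0xC3 0xA9), so dropping the lead byte would yield the wrong character. The helper name `utf8_to_latin1` is hypothetical and Python is used only for illustration; it is not part of any XML library.

```python
def utf8_to_latin1(raw: bytes) -> bytes:
    """Convert UTF-8 bytes to single-byte Latin-1 (ISO-8859-1).

    Hypothetical helper for illustration. Raises UnicodeEncodeError
    if the text contains a character above U+00FF, which cannot be
    represented in one byte.
    """
    text = raw.decode("utf-8")      # bytes from the parser -> Unicode text
    return text.encode("latin-1")   # Unicode text -> one byte per character


# U+00B2 (superscript two) occupies two bytes in UTF-8.
superscript_two_utf8 = "\u00b2".encode("utf-8")   # b'\xc2\xb2'

# The bytes reported by the SAX callback in the example above.
sax_bytes = b"x\xc2\xb2"
restored = utf8_to_latin1(sax_bytes)              # b'x\xb2'

# A character at or above U+00C0 uses the lead byte 0xC3, not 0xC2.
e_acute_utf8 = "\u00e9".encode("utf-8")           # b'\xc3\xa9'
```

Decoding the whole buffer and re-encoding it, rather than special-casing one lead byte, handles the full 00-FF range and fails loudly on anything outside it.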

Thanks,

Dan Timis



