Hi,

I'm trying to use the SAX parser from libxml++ to read a simple XML file generated by a third-party program. At the head of the file is an XML declaration specifying the character encoding:

   <?xml version="1.0" encoding="ISO-8859-1"?>

A short distance into the file is the following text:

   <sub-title lang="en">Highlights of the final of the Grand Slam of Darts, played over the best of 35 legs. The winner will be crowned the inaugural champion and receive a cheque for £80,000. [S]</sub-title>

(Just in case that's got mangled in transit, that's the entity/character literal 0xa3, for the UK Pound symbol in ISO-8859-1).

When I pass this to libxml++, I get a Glib::Error thrown, complaining about "Invalid byte sequence in conversion input". It seems that libxml++ is reading the &#xA3; and converting it to a byte, then trying to interpret that byte as UTF-8, which it isn't.

I've tried converting the input chunk before I pass it to the parser (using Glib::convert), but obviously that isn't working, as it processes the entity as its component characters rather than converting it to a byte sequence.

How do I handle this input correctly with libxml++? Do I have to preprocess each chunk manually to convert the character entities before passing it to the parser, or is there some way of persuading the SaxParser to do it?

Thanks,
Hugo.

-- 
=== Hugo Mills: hugo carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- "What are we going to do tonight?" "The same thing we do ---
every night, Pinky. Try to take over the world!"
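
For illustration only, here is a minimal sketch of the kind of manual pre-conversion described above, written against the standard library alone. The helper name latin1_to_utf8 is hypothetical and is not part of libxml++ or glibmm. It shows why a lone 0xA3 byte is rejected as UTF-8: in UTF-8 the pound sign is the two-byte sequence 0xC2 0xA3. It handles raw ISO-8859-1 bytes only; it does not expand character references such as &#xA3;, and it is not claimed to be the right fix for the SaxParser problem.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of libxml++ or glibmm): re-encode a chunk of
// ISO-8859-1 bytes as UTF-8. Every ISO-8859-1 byte value equals its Unicode
// code point, so bytes below 0x80 pass through unchanged and every other
// byte becomes a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));   // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F))); // continuation byte
        }
    }
    return out;
}

// Small self-check: the pound sign, byte 0xA3, becomes 0xC2 0xA3.
void check_pound_sign()
{
    assert(latin1_to_utf8("\xA3" "80,000") == "\xC2\xA3" "80,000");
}
```

Note that if a chunk converted this way is still preceded by the encoding="ISO-8859-1" declaration, a parser that honours the declaration may convert the text a second time, so this is only a sketch of the byte-level step.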