Re: [libxml++] Charset conversion error -- ignoring encoding declaration?
- From: Murray Cumming <murrayc murrayc com>
- To: Hugo Mills <hugo-xmlpp carfax org uk>
- Cc: libxmlplusplus-general lists sourceforge net
- Subject: Re: [libxml++] Charset conversion error -- ignoring encoding declaration?
- Date: Thu, 29 Nov 2007 08:54:56 +0100
On Wed, 2007-11-28 at 19:42 +0000, Hugo Mills wrote:
> Hi,
>
> I'm trying to use the SAX parser from libxml++ to read a simple XML
> file generated from a third-party program. At the head of the file is
> an XML declaration specifying the charset encoding:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
>
> A short distance into the file is the following text:
>
> <sub-title lang="en">Highlights of the final of the Grand Slam of Darts, played over the best of 35 legs. The winner will be crowned the inaugural champion and receive a cheque for £80,000. [S]</sub-title>
>
> (Just in case that's got mangled in transit, that's the
> entity/character literal 0xa3, for the UK Pound symbol in ISO-8859-1).
>
> When I pass this to libxml++, I get a Glib::Error thrown,
> complaining about "Invalid byte sequence in conversion input". It
> seems that libxml++ is reading the &#A3; and converting it to a byte,
> then trying to interpret that as UTF-8, which it isn't. I've tried
> converting the input chunk before I pass it to the parser (using
> Glib::convert), but obviously that isn't working, as it's processing
> the entity as its component characters, rather than converting it to a
> byte sequence.
What does xmllint say?
> How do I handle this input correctly with libxml++? Do I have to
> preprocess each chunk manually to convert the character entities
> before passing it to the parser, or is there some way of persuading
> the SaxParser to do it?
>
> Thanks,
> Hugo.
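
For reference, here is a minimal standalone sketch of why a raw 0xa3 byte triggers "Invalid byte sequence in conversion input" when it is treated as UTF-8, and what the same character looks like once re-encoded. It uses only the C++ standard library. The helper names latin1_to_utf8 and looks_like_utf8 are hypothetical and are not part of libxml++ or glibmm, and the validity check is deliberately simplified. It assumes the input really is ISO-8859-1, as the XML declaration claims.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: re-encode ISO-8859-1 bytes as UTF-8.
// Each ISO-8859-1 byte has the same value as its Unicode code point,
// so bytes >= 0x80 become a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const std::string& in)
{
  std::string out;
  out.reserve(in.size() * 2);
  for (unsigned char c : in) {
    if (c < 0x80) {
      out.push_back(static_cast<char>(c));
    } else {
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  return out;
}

// Hypothetical helper: simplified structural UTF-8 check.
// It verifies lead/continuation byte patterns only; it does not
// reject overlong encodings or surrogate code points.
bool looks_like_utf8(const std::string& s)
{
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t extra = 0;
    if (c < 0x80) {
      extra = 0;
    } else if ((c & 0xE0) == 0xC0) {
      extra = 1;
    } else if ((c & 0xF0) == 0xE0) {
      extra = 2;
    } else if ((c & 0xF8) == 0xF0) {
      extra = 3;
    } else {
      return false; // stray continuation byte (such as a lone 0xA3) or invalid lead byte
    }
    if (extra > s.size() - i - 1) {
      return false; // sequence is truncated
    }
    for (std::size_t k = 1; k <= extra; ++k) {
      if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) {
        return false; // expected a continuation byte
      }
    }
    i += extra + 1;
  }
  return true;
}
```

With these, looks_like_utf8("\xA3" "80,000") is false, while latin1_to_utf8("\xA3") gives the two bytes 0xC2 0xA3, which is the UTF-8 form of the pound sign. Whether the conversion should happen in your code or inside the parser depends on what the parser is actually being fed, which is why the xmllint result would be useful.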
--
murrayc murrayc com
www.murrayc.com
www.openismus.com