Re: [libxml++] Charset conversion error -- ignoring encoding declaration?



On Wed, Nov 28, 2007 at 07:42:38PM +0000, Hugo Mills wrote:
>    When I pass this to libxml++, I get a Glib::Error thrown,
> complaining about "Invalid byte sequence in conversion input". It
> seems that libxml++ is reading the &#A3; and converting it to a byte,
> then trying to interpret that as UTF-8, which it isn't. I've tried
> converting the input chunk before I pass it to the parser (using
> Glib::convert), but obviously that isn't working, as it's processing
> the entity as its component characters, rather than converting it to a
> byte sequence.
> 
>    How do I handle this input correctly with libxml++? Do I have to
> preprocess each chunk manually to convert the character entities
> before passing it to the parser, or is there some way of persuading
> the SaxParser to do it?

   As a follow-up, I have tried converting the character entities in
two different ways, both failing in the same manner as above:

1) Convert entity to bytes; use Glib::convert to go from ISO-8859-1 to
UTF-8.

2) Convert entity to bytes; use Glib::convert to go from ISO-8859-1 to
UTF-8; convert new bytes back to entities.
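
   For reference, the result that step 1 ought to produce can be
sketched without Glib at all. This is a hand-rolled stand-in, not
Glib::convert and not anything libxml++ is known to do internally; the
function names encode_utf8 and latin1_to_utf8 are made up for
illustration. It assumes the entity in question is the pound sign
(U+00A3), so the ISO-8859-1 byte is 0xA3:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point (up to U+10FFFF)
// as a UTF-8 byte sequence.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: ISO-8859-1 maps every byte value to the Unicode
// code point with the same number, so each byte can be widened and
// encoded independently.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char byte : in) {
        out += encode_utf8(byte);
    }
    return out;
}
```

   Under that assumption, latin1_to_utf8("\xA3") gives the two bytes
0xC2 0xA3. A lone 0xA3 byte is not valid UTF-8, which would be
consistent with the "Invalid byte sequence" error if the raw byte is
what ends up being treated as UTF-8 somewhere along the way.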

   Surely this can't be so difficult to use. The input text is
well-formed, and accurately reports its character set. What am I doing
wrong, that libxml++ fails to cope with it?

   Hugo, getting frustrated.

-- 
=== Hugo Mills: hugo     carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
         --- Is it true that "last known good" on Windows XP ---         
                            boots into CP/M?                             



