On Wed, Nov 28, 2007 at 07:42:38PM +0000, Hugo Mills wrote:
> When I pass this to libxml++, I get a Glib::Error thrown,
> complaining about "Invalid byte sequence in conversion input". It
> seems that libxml++ is reading the &#A3; and converting it to a byte,
> then trying to interpret that as UTF-8, which it isn't. I've tried
> converting the input chunk before I pass it to the parser (using
> Glib::convert), but obviously that isn't working, as it's processing
> the entity as its component characters, rather than converting it to a
> byte sequence.
>
> How do I handle this input correctly with libxml++? Do I have to
> preprocess each chunk manually to convert the character entities
> before passing it to the parser, or is there some way of persuading
> the SaxParser to do it?

As a follow-up, I have tried converting the character entities in
two different ways, both failing in the same manner as above:

1) Convert entity to bytes; use Glib::convert to go from ISO-8859-1
   to UTF8.

2) Convert entity to bytes; use Glib::convert to go from ISO-8859-1
   to UTF8; convert new bytes back to entities.

Surely this can't be so difficult to use. The input text is
well-formed, and accurately reports its character set. What am I doing
wrong, that libxml++ fails to cope with it?

Hugo, getting frustrated.

--
=== Hugo Mills: hugo carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Is it true that "last known good" on Windows XP ---
boots into CP/M?