Re: [xml] libxml2, howto force encoding



On Sun, Mar 23, 2003 at 02:50:45PM -0700, John Fleck wrote:
On Sun, 2003-03-23 at 05:33, Matthias Knöfel wrote:
Is there a way to force a specific encoding while parsing an XML file?
I have to parse files with an apparently wrong encoding declaration :(
They are declared as <?xml version="1.0" encoding="UTF-8"?> but include
characters from the ISO-8859-1 charset, so the xmlParseFile() function
aborts, of course. Any ideas for parsing these files without modifying
the XML files themselves?


So are you saying you're stuck with files that are not valid XML
(because their encoding declaration is incorrect) and you still want to
be able to deal with them using libxml?

  Well, unfortunately the XML spec says that the encoding in the
XML declaration may be overridden by other metadata indications.
I personally think this is a serious mistake, but in some cases it
is allowed nonetheless.
  Now to go back to the original question: yes, I think this is
possible, but it's not easy. You must create a parser input
indicating the encoding to use (xmlIO.h), create a parser context,
attach that parser input to the parser context and then ask to parse
the document. Not trivial; look at the xmlCreateFileParserCtxt() code
to see how to proceed, except that you have to give an encoding when
creating the input stream.
  It may be easier to simply transcode with iconv the input you know
declares a wrong encoding.
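
  A minimal sketch of that transcoding step, assuming the real
encoding is ISO-8859-1 and that the whole input is already in
memory. The helper name latin1_to_utf8 is hypothetical; iconv_open(),
iconv() and iconv_close() are the standard POSIX calls.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: transcode `len` bytes of ISO-8859-1 into a
 * newly allocated, NUL-terminated UTF-8 string.
 * Returns NULL on failure; the caller frees the result.
 */
static char *latin1_to_utf8(const char *in, size_t len)
{
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1)
        return NULL;

    /* Each Latin-1 byte becomes at most two UTF-8 bytes. */
    size_t out_size = 2 * len + 1;
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)in;  /* iconv() wants a non-const char ** */
    char *out_ptr = out;
    size_t in_left = len;
    size_t out_left = out_size - 1;  /* keep one byte for the NUL */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }

    *out_ptr = '\0';
    return out;
}
```

The resulting buffer really is UTF-8, so it matches the declaration
already present in the files and can be handed to the parser as is.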

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


