Re: [xml] Ignoring Character Encodings



On Thu, Apr 11, 2002 at 12:49:17PM +0100, Richard Jinks wrote:
From: "Daniel Veillard" <veillard redhat com>
Ah. I think I might not have explained myself properly. I'm concerned with the
parsing of the document, not with the subsequent output.

  Okay, then I misunderstood. That seems common these days; I'm tired for
a lot of reasons.

I'm feeding the document in through the push parser in large chunks, but when
libxml gets to the encoding declaration, it suddenly switches encoding on me
and starts reporting errors that the UTF-8 chars I'm giving it don't match the
ISO-8859-1 (for example) encoding the document says it's in, and used to
be in before I got to it. Subsequently, the errors from the libxml encoder
cause the parse to fail.

  Right, if the document declares itself to be in ISO-8859-1 but is actually
in UTF-8, that's a fatal error: your document is not XML. You need to
fix it before handing it to the parser (well, this can be argued
in the case where the framework provides the encoding, as when using
HTTP encoding information associated with the Content-Type).
  But I consider libxml2 rejecting a document whose declared encoding
does not match the actual one to be a feature, not an error.
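
A minimal sketch of what "fix it before handing it to the parser" could look
like when the data has been transcoded to UTF-8 upstream but still carries its
old declaration. It is written in Python for brevity and only rewrites the
bytes; it does not call libxml2. The function name force_declared_encoding is
hypothetical, and the sketch assumes that the whole XML declaration arrives in
the first chunk and that the data is in an ASCII-compatible encoding.

```python
import re

# Matches an XML declaration at the very start of the data (optionally after a
# UTF-8 byte order mark) and captures its encoding pseudo-attribute.
_XML_DECL = re.compile(
    rb'^(?P<head>(?:\xef\xbb\xbf)?<\?xml\s[^>]*?\bencoding\s*=\s*)'
    rb'(?P<quote>["\'])(?P<enc>[A-Za-z][A-Za-z0-9._-]*)(?P=quote)'
)


def force_declared_encoding(first_chunk: bytes, actual: bytes = b"UTF-8") -> bytes:
    """Return first_chunk with the declared encoding replaced by `actual`.

    Chunks without an XML declaration, or with a declaration that has no
    encoding pseudo-attribute, are returned unchanged.
    """
    def _swap(match):
        # Keep the original quote style; only the encoding name changes.
        quote = match.group("quote")
        return match.group("head") + quote + actual + quote

    return _XML_DECL.sub(_swap, first_chunk, count=1)


if __name__ == "__main__":
    chunk = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xc3\xa9</a>'
    print(force_declared_encoding(chunk))
```

Only the first chunk needs this treatment; later chunks can be pushed to the
parser untouched. Rewriting the declaration is only appropriate when the
caller really knows the actual encoding, which is the same condition under
which a framework-provided encoding would be trusted.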

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


