Re: [xml] Ignoring Character Encodings



On Thu, Apr 11, 2002 at 01:35:30PM +0100, Cyberthymia wrote:
From: "Daniel Veillard" <veillard redhat com>
  Right, if the document declares itself to be in ISO-8859-1 but is
actually in UTF-8, that's a fatal error: your document is not XML. You
need to fix it before handing it to the parser (well, this can be argued
in the case where the framework provides the encoding, like when using
HTTP encoding information associated with the Content-Type).
  But I consider libxml2 rejecting documents whose declared encoding
does not match the actual one to be a feature, not an error.


I understand - I was just hoping for a "We've already done the encoding
for you" type of option. The situation occurs because the user can load the
document into our app, during which we have to encode it to our internal
format (otherwise all of our existing functionality falls apart). They can
then edit the doc inside the app before asking for it to be parsed. As such,
they still need to specify the encoding declaration in the doc so we know
what we need to do to load it, and can save it in the right format. (The
original document before we load it really will be ISO-8859-1, and will
remain so if they save the document back out post-parsing.)

So the correct thing I'd need to do is to either remove (hide) the encoding
declaration or to re-encode the doc back into its original format before
feeding it to the parser?

  Well, if you know that the documents will declare ISO-8859-1 but are
already converted to UTF-8, you could override the default converters for
this encoding by supplying what are basically identity converters.
   This is described on the same page I pointed you to in the previous mail:
     http://xmlsoft.org/encoding.html
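
  To make the "identity converter" idea concrete, here is a minimal sketch
of what such a function could look like. It assumes the converter shape
described on that page (output buffer, output length, input buffer, input
length, with both lengths updated on return). The name identity_convert is
made up for illustration, and the return-value convention shown here
(bytes written, -1 on bad arguments) is an assumption to check against the
documentation for your libxml2 version. How the function gets registered in
place of the default ISO-8859-1 handler is left out on purpose.

```c
#include <string.h>

/*
 * Hypothetical identity converter: the input is assumed to be UTF-8
 * already, so the bytes are copied through unchanged.
 *
 *   out/outlen: output buffer and its capacity; on return *outlen holds
 *               the number of bytes produced.
 *   in/inlen:   input buffer and its length; on return *inlen holds the
 *               number of bytes consumed.
 *
 * Returns the number of bytes written, or -1 on invalid arguments
 * (assumed convention, not taken from the thread).
 */
int
identity_convert(unsigned char *out, int *outlen,
                 const unsigned char *in, int *inlen)
{
    int n;

    if (out == NULL || outlen == NULL || in == NULL || inlen == NULL)
        return -1;
    if (*outlen < 0 || *inlen < 0)
        return -1;

    /* Copy only as much as fits; the caller can pass the rest again. */
    n = (*inlen < *outlen) ? *inlen : *outlen;
    memcpy(out, in, (size_t) n);

    *inlen = n;   /* bytes consumed from the input */
    *outlen = n;  /* bytes produced in the output */
    return n;
}
```

  The same function could serve for both directions (input and output),
since nothing is transformed either way. Note that this only makes sense
if every document handed to the parser under that encoding name really has
been converted already.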

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


