Re: [xml] Scripting languages and character encodings
- From: Daniel Veillard <veillard redhat com>
- To: Malcolm Tredinnick <malcolm commsecure com au>
- Cc: Steve Ball <Steve Ball zveno com>, xml gnome org
- Subject: Re: [xml] Scripting languages and character encodings
- Date: Thu, 22 Jan 2004 02:01:33 -0500
On Thu, Jan 22, 2004 at 10:13:48AM +1100, Malcolm Tredinnick wrote:
Hi Steve,
My question is, can I tell the libxml2 parser that a document
has a certain character encoding, overriding what the XML
declaration says?
The short answer is "no". If you are changing the encoding of the data
in the document, then you need to change the encoding in the XML
declaration as well.
Actually it's more complex than that; check appendix F of the spec:
http://www.w3.org/TR/REC-xml#sec-guessing
The document encoding declaration (it's not an attribute) may be provided
by the context (like an HTTP header) and override the value possibly
found in the XML declaration.
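
To make that precedence concrete, here is a rough sketch in plain C. It is not libxml2 code: the names choose_encoding, sniff_bom, enc_choice and enc_source are hypothetical, and the ordering (context first, then the XML declaration, then a byte order mark, then UTF-8) is a simplified reading of appendix F that ignores UTF-32 and conflicting labels.

```c
#include <assert.h>
#include <stddef.h>

/* Where the selected encoding label came from (hypothetical type). */
typedef enum {
    ENC_FROM_CONTEXT,      /* e.g. the charset parameter of an HTTP header */
    ENC_FROM_DECLARATION,  /* encoding="..." in the XML declaration */
    ENC_FROM_BOM,          /* byte order mark at the start of the data */
    ENC_DEFAULT            /* nothing found: fall back to UTF-8 */
} enc_source;

typedef struct {
    const char *name;
    enc_source source;
} enc_choice;

/* Recognise a UTF-8 or UTF-16 byte order mark; returns NULL if none.
 * Simplification: UTF-32 signatures are not handled here. */
const char *sniff_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    return NULL;
}

/* Pick the encoding label to use. context_charset and declared may be NULL.
 * Assumption: a non-empty label supplied by the context always wins. */
enc_choice choose_encoding(const char *context_charset, const char *declared,
                           const unsigned char *buf, size_t len)
{
    enc_choice choice;
    const char *bom;

    assert(buf != NULL || len == 0);
    bom = sniff_bom(buf, len);

    if (context_charset != NULL && context_charset[0] != '\0') {
        choice.name = context_charset;
        choice.source = ENC_FROM_CONTEXT;
    } else if (declared != NULL && declared[0] != '\0') {
        choice.name = declared;
        choice.source = ENC_FROM_DECLARATION;
    } else if (bom != NULL) {
        choice.name = bom;
        choice.source = ENC_FROM_BOM;
    } else {
        choice.name = "UTF-8";
        choice.source = ENC_DEFAULT;
    }
    return choice;
}
```

A real parser also has to decide what to do when the byte order mark and the declaration disagree; this sketch does not model that case.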
The issue of overriding the encoding declaration has been discussed on
the list a few times previously and Daniel's position has always been
that passing in data that is not well-formed XML is not something that
libxml is going to handle (the XML specification requires as much). The
Hmm, except for that contextual information... I consider this a really
bad practice, but I have tried to adapt the APIs for this. For example,
the new xmlReadxxx and xmlCtxtReadxxx APIs take an argument
const char *encoding,
which may override the XML Decl encoding.
Now the problem is that I suspect this may not work well with UTF-8
for implementation reasons, but it is worth a try.
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/