Re: [xml] Scripting languages and character encodings
- From: Steve Ball <Steve Ball zveno com>
- To: Malcolm Tredinnick <malcolm commsecure com au>
- Cc: xml gnome org
- Subject: Re: [xml] Scripting languages and character encodings
- Date: Thu, 22 Jan 2004 14:41:01 +1100
Damn! A thread on this issue only a week ago... and I missed it!
Sorry folks for wasting bandwidth :-(
I did have a preference to not edit the document data, but also to
treat it as text rather than as a binary. Looks like I will need to
think through the issues some more.
Cheers, and many thanks for the pointer,
On 22/01/2004, at 10:13, Malcolm Tredinnick wrote:
On Thu, 2004-01-22 at 08:15, Steve Ball wrote:
My question is, can I tell the libxml2 parser that a document
has a certain character encoding, overriding what the XML
The short answer is "no". If you are changing the encoding of the data
in the document, then you need to change the encoding in the XML
declaration as well.
This shouldn't be a particularly onerous pre-parsing step, since it is
one attribute in the first line of the file.
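As a rough illustration of that pre-parsing step, here is a minimal Python sketch that rewrites the encoding named in the XML declaration of a byte buffer before it is handed to a parser. The function name set_declared_encoding is made up for this example and is not part of libxml2 or any binding. The sketch assumes the buffer is in an ASCII-compatible encoding (so not UTF-16) and has no leading byte order mark.

```python
import re

# Matches an XML declaration at the very start of the buffer. Group 1 holds
# everything up to and including the version value; the optional second part
# is the existing encoding pseudo-attribute, if there is one.
_XML_DECL = re.compile(
    rb'^(<\?xml\s+version\s*=\s*(?:"[^"]*"|\'[^\']*\'))'
    rb'(?:\s+encoding\s*=\s*(?:"[^"]*"|\'[^\']*\'))?'
)


def set_declared_encoding(data: bytes, encoding: str) -> bytes:
    """Return a copy of data whose XML declaration names `encoding`.

    Hypothetical helper: only the declaration is touched, the document
    bytes themselves are not transcoded.
    """
    new_attr = b' encoding="' + encoding.encode("ascii") + b'"'
    match = _XML_DECL.match(data)
    if match is None:
        # No declaration present, so prepend one.
        return b'<?xml version="1.0"' + new_attr + b'?>\n' + data
    # Keep the version part, swap in the new encoding, keep the rest
    # (including any standalone pseudo-attribute) unchanged.
    return match.group(1) + new_attr + data[match.end():]
```

The caller would convert the document data to the target encoding separately and then pass the result through this function, so that the declaration and the actual bytes agree before parsing.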
The issue of overriding the encoding declaration has been discussed on
the list a few times previously and Daniel's position has always been
that passing in data that is not well-formed XML is not something that
libxml is going to handle (the XML specification requires as much). The
most recent thread on this is titled "Control over encoding declaration
(prolog and meta)" that started on 14 Jan 2004 (check the online
archives). Obviously you don't want to do the re-encoding idea
in that thread, since then you go from ISO-8859-1 -> UTF-8 ->
-> libxml which is less than efficient, but the justifications may be
of interest to you.
Steve Ball | XSLT Standard Library | Training & Seminars
Zveno Pty Ltd | Web Tcl Complete | XML XSL Schemas
http://www.zveno.com/ | TclXML TclDOM | Tcl, Web Development
Steve Ball zveno com +---------------------------+---------------------
Ph. +61 2 6242 4099 | Mobile (0413) 594 462 | Fax +61 2 6242 4099