Re: [xml] Scripting languages and character encodings



Damn!  A thread on this issue only a week ago... and I missed it!
Sorry folks for wasting bandwidth :-(

I would prefer not to edit the document data, but I would also prefer
to treat it as text rather than as binary.  Looks like I will need to
think through the issues some more.

Cheers, and many thanks for the pointer,
Steve Ball

On 22/01/2004, at 10:13, Malcolm Tredinnick wrote:
On Thu, 2004-01-22 at 08:15, Steve Ball wrote:
My question is, can I tell the libxml2 parser that a document
has a certain character encoding, overriding what the XML
declaration says?

The short answer is "no". If you are changing the encoding of the data
in the document, then you need to change the encoding in the XML
declaration as well.

This shouldn't be a particularly onerous pre-parsing step, since it is
one attribute in the first line of the file.
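
As a rough illustration of that pre-parsing step, here is a minimal Python
sketch that rewrites only the encoding name in the XML declaration and leaves
the rest of the bytes alone. The function name set_declared_encoding is
hypothetical, and the sketch assumes the data is in an ASCII-compatible
encoding (such as ISO-8859-1 or UTF-8) with no byte order mark; it is not a
general-purpose XML declaration parser.

```python
import re

# Matches an XML declaration at the very start of the data and captures
# (1) everything up to the encoding value, (2) the quote character,
# (3) the declared encoding name.
_XML_DECL = re.compile(
    rb'^(<\?xml\s+version\s*=\s*(?:"[^"]*"|\'[^\']*\')\s+encoding\s*=\s*)'
    rb'(["\'])([A-Za-z][A-Za-z0-9._-]*)\2'
)


def set_declared_encoding(data: bytes, new_encoding: str) -> bytes:
    """Return data with the encoding named in its XML declaration replaced.

    Assumption: data is in an ASCII-compatible encoding and has no BOM.
    If there is no declaration, or it has no encoding pseudo-attribute,
    the data is returned unchanged.
    """
    enc = new_encoding.encode("ascii")
    return _XML_DECL.sub(
        lambda m: m.group(1) + m.group(2) + enc + m.group(2),
        data,
        count=1,
    )


# Example: the bytes were converted to UTF-8 elsewhere, but the
# declaration still claims ISO-8859-1.
doc = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<a>\xc3\xa9</a>'
fixed = set_declared_encoding(doc, "UTF-8")
```

The result would then be handed to the parser as usual. Only the declaration
changes, so the document content itself is not edited.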

The issue of overriding the encoding declaration has been discussed on
the list a few times previously and Daniel's position has always been
that passing in data that is not well-formed XML is not something that
libxml is going to handle (the XML specification requires as much). The
most recent thread on this is titled "Control over encoding declaration
(prolog and meta)" that started on 14 Jan 2004 (check the online
archives). Obviously you don't want to do the re-encoding idea mentioned
in that thread, since then you go from ISO-8859-1 -> UTF-8 -> ISO-8859-1
-> libxml which is less than efficient, but the justifications may be of
interest to you.



Steve Ball            |   XSLT Standard Library   | Training & Seminars
Zveno Pty Ltd         |     Web Tcl Complete      |   XML XSL Schemas
http://www.zveno.com/ |      TclXML TclDOM        | Tcl, Web Development
Steve Ball zveno com  +---------------------------+---------------------
Ph. +61 2 6242 4099   |   Mobile (0413) 594 462   | Fax +61 2 6242 4099



