Re: [xml] How do I get the encoding of an XML document?
- From: Daniel Veillard <veillard redhat com>
- To: Jean Jordaan <jean jordaan gmail com>
- Cc: xml gnome org
- Subject: Re: [xml] How do I get the encoding of an XML document?
- Date: Wed, 3 Jan 2007 07:32:44 -0500
On Wed, Jan 03, 2007 at 11:53:55AM +0200, Jean Jordaan wrote:
> Hi there
>
> I'd like to find the encoding of an XML document, as detected by
> libxml2, using the Python bindings. From lxml, I can get it like this:
>
>   >>> et
>   <etree._ElementTree object at 0xb7cc992c>
>   >>> et.docinfo.encoding
>   'windows-1252'
>
> According to the lxml API docs, lxml gets this information from libxml2 (see
> http://codespeak.net/lxml/api.html#parsers )
>
> How do I get at it without depending on lxml? The only way I've been
> able to find is using debugDumpDocumentHead, which just prints to
> stdout.
>
>   >>> dh = xml.debugDumpDocumentHead(xml)
>   DOCUMENT
>   version=1.0
>   encoding=windows-1252
>   standalone=true
Hum, it's a string attached to the xmlDoc. It's available directly in C,
but there is no specific API to extract it. As a result the autogenerated
bindings don't seem to have a way to extract the information. Could you
file a Bugzilla entry asking for that functionality? The simplest fix is
probably to provide a custom accessor function, specifically at the Python
binding level.
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
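Until such an accessor exists in the bindings, one possible workaround that avoids lxml is to read the declaration with a different parser. The sketch below is an assumption-laden illustration, not part of libxml2: it uses the expat parser from the Python standard library, and the helper name declared_encoding is hypothetical. It reports the encoding named in the XML declaration as expat reads it, which is not necessarily what libxml2 would detect (for example when there is no declaration and the encoding is guessed from a byte order mark).

```python
from typing import Optional
from xml.parsers import expat


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None if absent.

    Hypothetical helper: it relies on expat, not on libxml2.
    """
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # expat calls this handler when it parses the <?xml ...?> declaration.
        # `encoding` is None when the declaration has no encoding attribute.
        found["encoding"] = encoding

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # raises expat.ExpatError on malformed input
    return found.get("encoding")


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="windows-1252" standalone="yes"?><root/>'
    print(declared_encoding(sample))
```

The function parses the whole document, so it also fails on input that is not well-formed. A document with no XML declaration yields None.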