Re: [xml] How do I get the encoding of an XML document?



On Wed, Jan 03, 2007 at 11:53:55AM +0200, Jean Jordaan wrote:
Hi there

I'd like to find the encoding of an XML document, as detected by
libxml2, using the Python bindings. From lxml, I can get it like this:

>>> et
<etree._ElementTree object at 0xb7cc992c>
>>> et.docinfo.encoding
'windows-1252'

According to the lxml API docs, lxml gets this information from libxml2 (see
http://codespeak.net/lxml/api.html#parsers )

How do I get at it without depending on lxml? The only way I've been
able to find is using debugDumpDocumentHead, which just prints to
stdout.

>>> dh = xml.debugDumpDocumentHead(xml)
DOCUMENT
version=1.0
encoding=windows-1252
standalone=true

  Hum, it's a string attached to the xmlDoc. It's available directly in C,
but there is no specific API to extract it. As a result, the autogenerated
bindings don't seem to have a way to extract the information. Could you
file a Bugzilla request asking for that functionality? The simplest fix is
probably to provide a custom accessor function, specifically at the Python
binding level.
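
  Until such an accessor exists, one possible stopgap that avoids both lxml
and the stdout dump is to read the XML declaration with the expat parser
from the Python standard library. This is only a sketch and not a libxml2
API: it reports the encoding *declared* in the document, which may differ
from what libxml2 detects (for example from a byte order mark), and
declared_encoding is a hypothetical helper name.

```python
import xml.parsers.expat


def declared_encoding(data):
    """Return the encoding named in the XML declaration of `data`, or None.

    `data` is the raw document as bytes. This uses expat, not libxml2, so it
    only reflects the declaration, not libxml2's own detection.
    """
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # expat passes None for `encoding` when the declaration omits it.
        found["encoding"] = encoding

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    # Raises xml.parsers.expat.ExpatError if the document is not well-formed.
    parser.Parse(data, True)
    return found.get("encoding")


sample = b'<?xml version="1.0" encoding="windows-1252" standalone="yes"?><root/>'
print(declared_encoding(sample))
```

A document without an XML declaration yields None here, so the caller has to
fall back to the XML default (UTF-8 or UTF-16) in that case.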

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


