Re: [xml] Support for Python
- From: Daniel Veillard <veillard redhat com>
- To: Dave Kuhlman <dkuhlman cutter rexx com>
- Cc: xml gnome org
- Subject: Re: [xml] Support for Python
- Date: Fri, 25 Jan 2002 10:42:36 -0500
On Thu, Jan 24, 2002 at 04:57:39PM -0800, Dave Kuhlman wrote:
I've been thinking about ways that libxml (and libxslt) can be used
to provide XML support for the Python programming language.
Right, me too :-)
We should consider providing support in the following areas:
* Support for the DOM interface built on libxml (or gdome?).
I would split it into:
- Support for the tree interface built on libxml
- Support for the DOM2 api built on gdome2
* Support for the SAX interface built on libxml.
* Support for XSLT built on libxslt. (Possibly a discussion for the
Yep, let's keep XSLT separate for a bit.
Support for DOM
I've built DOM support for Python by hand, i.e. manually written
wrapper functions, types, etc. that expose libxml's DOM support.
(Available at http://www.rexx.com/~dkuhlman.) But it's weak.
I've also used SWIG to generate wrappers for the libxml DOM support.
Basically, I generated wrappers for the stuff in include/libxml/tree.h
and include/libxml/parser.h. It works "pretty" good. A bit of
I'm tempted to go through a similar autogeneration of stubs, like SWIG,
but would prefer to generate the Python glue directly from the XML formal
* Because I used SWIG's shadow classes, the doc, nodes, and
attributes look, from Python's point of view like instances of
classes. So, walking the DOM tree is very easy and natural.
Sounds like a good idea, I need to look at the generated code. Another way is to
make minimal wrappers and build more object-oriented classes on top of
the raw functions, defining the classes at the Python level.
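As a rough illustration of the "minimal wrappers plus Python-level classes" idea, here is a small sketch. The raw_* functions are hypothetical stand-ins for a thin C extension layer (they are not libxml calls), and the handles are plain dicts so that the example runs on its own.

```python
# Hypothetical low-level layer: plain functions over opaque handles.
# In a real binding these would live in a C extension module; here the
# handles are dicts so the sketch is runnable without any library.
def raw_new_node(name):
    return {"name": name, "children": []}

def raw_add_child(parent, child):
    parent["children"].append(child)

def raw_get_name(handle):
    return handle["name"]

def raw_get_children(handle):
    return list(handle["children"])


class Node:
    """Object-oriented view defined purely in Python over the raw functions."""

    def __init__(self, handle):
        self._h = handle  # opaque handle owned by the low-level layer

    @property
    def name(self):
        return raw_get_name(self._h)

    @property
    def children(self):
        # Wrap each raw handle only when the caller asks for it.
        return [Node(h) for h in raw_get_children(self._h)]

    def append(self, name):
        child = raw_new_node(name)
        raw_add_child(self._h, child)
        return Node(child)


root = Node(raw_new_node("doc"))
root.append("title")
root.append("body")
names = [c.name for c in root.children]
print(root.name, names)
```

The point of the split is that the low-level layer stays small and mechanical, while naming, properties and convenience methods can evolve in Python without regenerating any glue.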
* One benefit of doing this -- The Python objects (xmlDoc, xmlNode,
xmlAttr) are proxies for the "real", underlying (libxml) C objects
and the linkages between objects are in the underlying C objects.
Therefore, this implementation does not suffer from the problems
caused by circular references in Python objects. (Note that I
Okay, a point to check in any solution.
* Moreover, the Python objects are created and destroyed on the fly
and only on request. For example,
node = node.children
node = node.next
This code creates two nodes. Furthermore, when the value of
variable 'node' is over-written (and if there is no other
reference to that value), the Python object is destroyed. (For
Okay, I expect this from any implementation.
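The proxy behaviour described above can be sketched as follows. This is only an assumption about how such proxies might look: _CNode is a hypothetical stand-in for the underlying C node (it holds the real tree links), and NodeProxy is an illustrative name, not part of any existing binding.

```python
class _CNode:
    """Stand-in for the underlying C struct; it holds the real tree links."""

    def __init__(self, name):
        self.name = name
        self.children = None  # first child, or None
        self.next = None      # next sibling, or None


class NodeProxy:
    """Thin Python proxy; it stores no links to other proxies."""

    created = 0  # number of proxies built so far, for illustration only

    def __init__(self, cnode):
        self._c = cnode
        NodeProxy.created += 1

    @property
    def name(self):
        return self._c.name

    @property
    def children(self):
        c = self._c.children
        return NodeProxy(c) if c is not None else None

    @property
    def next(self):
        n = self._c.next
        return NodeProxy(n) if n is not None else None


# Build a tiny underlying tree: root -> (a, b)
root_c, a_c, b_c = _CNode("root"), _CNode("a"), _CNode("b")
root_c.children = a_c
a_c.next = b_c

node = NodeProxy(root_c)
before = NodeProxy.created
node = node.children  # new proxy for "a"; the "root" proxy is now unreferenced
node = node.next      # new proxy for "b"; the "a" proxy is now unreferenced
made = NodeProxy.created - before
print(made, node.name)
```

Because each proxy refers only to its underlying node and never to another proxy, dropping the last Python reference is enough for the proxy to be reclaimed, and no reference cycles are formed between Python objects.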
* One qualification is that the interface is at the level of the
libxml, so it's a bit low level. For example, a long running
application would have to call a 'free' method, e.g. xmlFreeDoc,
which is not something a Python programmer would expect to have to
Hum, keeping reference counting for xmlDocPtr is nearly impossible,
I doubt there is a workaround. Well, this needs more thinking; the idea
of having to call a doc.free() at the end of the processing doesn't sound
* Another qualification is that this implementation needs some
fix-up, because there are some kinds of nodes in the tree that can
cause segmentation faults.
A Python wrapper class sounds better for dealing with a unified abstraction
of all the kinds of nodes.
* And, the generated code is a bit large. I'm not sure that this is
a concern in a world where disk space is so cheap. It's possible
I don't really care. Only developers will have the generated stubs somewhere;
only the size of the object shared library _libxmlmodule.so would be really
gdome -- Whoa. I thought the DOM support was in libxml. I'll have to
No, it's DOM-like but not DOM.
look into gdome. Can someone enlighten me on the relationship between
gdome and libxml DOM support? Does gdome support a newer version of
the DOM spec? Should we build DOM support for Python on top of libxml
or on top of gdome?
Yes, but first implement tree support at the libxml level.
Summary -- I'll continue to work on the SWIG wrappers for the libxml
DOM interface. I'll try to fix a few problems that I've found and will
look into generating support for encodings, catalogs, and entities.
I'm not 100% sure that I want to go the SWIG way. I will look first at
the way the GTK python wrappers have been done and work from there.
Support for SAX
* Ease of use -- I've used it quite a bit and it seems quite easy to
use and usable. It's a trivial task to create a Python handler
class with methods like 'startElement', 'endElement', 'characters',
etc and then do the parse to catch those events.
SAX support should be close to trivial. I expect something similar
to your current interface but also allowing a compatible use with
the xmllib and sgmlop interface where the callbacks are just
It should be trivial to handle both, and that would allow very easy
migration of existing code.
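For readers unfamiliar with the handler-class style mentioned above, here is a minimal sketch. It uses the standard library's xml.sax module rather than a libxml binding, so it only illustrates the shape of the 'startElement' / 'endElement' / 'characters' callbacks; the class name and the sample document are made up for the example.

```python
import xml.sax


class RecordingHandler(xml.sax.ContentHandler):
    """Collects element names (with nesting depth) and non-blank text."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.elements = []
        self.text = []

    def startElement(self, name, attrs):
        self.elements.append((self.depth, name))
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1

    def characters(self, content):
        # A parser may deliver text in several chunks; keep non-blank ones.
        if content.strip():
            self.text.append(content.strip())


handler = RecordingHandler()
xml.sax.parseString(b"<doc><title>Hello</title><p>libxml</p></doc>", handler)
print(handler.elements)
print(handler.text)
```

A libxml-backed driver that called the same three methods would let a handler like this one be reused unchanged, which is the migration benefit being discussed.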
Creating a parser driver for PyXML built on libxml seems like a very
good idea. There are several benefits to be gained from doing so:
Since I don't know PyXML, I will abstain from commenting on this ATM.
Seems this should be trivially implementable with just glue Python code
on top of the SAX interface.
Isn't the Python2 internationalization layer based on UTF16 strings?
libxml/libxslt uses UTF8. Is there any gain in trying to follow the
Python2 conventions? Is there any risk in staying with UTF8 strings
seen as usual Python strings?
Converting to/from UTF16 all the time would be a killer.
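To make the conversion question concrete, here is a small sketch of what crossing the boundary looks like. It is written for present-day Python 3 (where str is Unicode text and bytes holds encoded data), not for the Python 2 of this thread, and the helper names are hypothetical.

```python
def to_c_string(text):
    """Encode Python text as UTF-8 bytes, the form a UTF-8 C library expects."""
    return text.encode("utf-8")


def from_c_string(data):
    """Decode UTF-8 bytes coming back from the C side into Python text."""
    return data.decode("utf-8")


sample = "caf\u00e9 libxml"
utf8 = to_c_string(sample)
utf16 = sample.encode("utf-16-le")  # 2 bytes per character for this sample
round_trip = from_c_string(utf8)

print(len(sample), len(utf8), len(utf16), round_trip == sample)
```

If strings are handed to Python as UTF-8 byte strings, no transcoding happens at the boundary; exposing them as Unicode objects means one encode or decode per crossing, which is the cost being weighed here.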
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/