[sorry if this message appears twice on the list - I wasn't subscribed when I sent the first one, so the message apparently didn't get through]
Hi libxml2 developers,
I am implementing a simple XSLT processor in Python that operates on data straight out of a Subversion repository. That is, it needs to handle svn:// URIs.
For that purpose, libxml2.setEntityLoader almost works. The problem is that the catalog itself is not loaded through the external entity loader, so the handler I supply for svn:// URIs is bypassed. If I use a catalog on the local filesystem, locations are resolved relative to that catalog's location (as per the specification). If I use a local catalog that rewrites public/system IDs into svn:// URIs, those URIs go directly into the xmlIO layer, again bypassing the entity loader, and xmlIO cannot handle svn:// URIs (only http:// and ftp://).
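To illustrate the part that does work, here is a minimal sketch of an entity loader for svn:// URIs. It rests on several assumptions: fetch_svn is a hypothetical helper that shells out to `svn cat`, the loader hands back a file-like object with a read() method, and returning None is assumed to let the bindings fall back to the default loader. The registration call is left as a comment so the sketch loads without libxml2 installed.

```python
import io
import subprocess


def fetch_svn(url):
    """Hypothetical helper: return the contents of `url` as bytes via `svn cat`."""
    return subprocess.check_output(["svn", "cat", url])


def make_entity_loader(fetch=fetch_svn):
    """Build a loader with the (URL, ID, ctxt) signature used by setEntityLoader."""
    def entity_loader(URL, ID, ctxt):
        if not URL or not URL.startswith("svn://"):
            # Assumption: a result without read() makes the bindings
            # fall back to the default entity loader.
            return None
        # Assumption: a bytes stream is acceptable; some Python versions
        # of the bindings may expect a text stream instead.
        return io.BytesIO(fetch(URL))
    return entity_loader


svn_entity_loader = make_entity_loader()

# With the bindings available, registration would be:
#   import libxml2
#   libxml2.setEntityLoader(svn_entity_loader)
```

Passing the fetch function in as a parameter keeps the loader testable without a live repository. A loader like this is only consulted for entities, which is why the catalog case described above still fails.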
It is possible to make xmlIO handle any protocol by means of xmlRegisterInputCallbacks(). However, that function is currently only available in the C API. So the natural solution seems to be to implement Python bindings for xmlRegisterInputCallbacks. The attached patch implements such bindings.
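For readers unfamiliar with the C mechanism: xmlRegisterInputCallbacks() takes a set of match/open/read/close functions, and xmlIO asks the registered handlers, most recently registered first, whether they accept a given URI. The sketch below only models that dispatch in plain Python. It does not call libxml2, and InputCallbackRegistry and its methods are hypothetical names, not the API added by the patch.

```python
import io


class InputCallbackRegistry:
    """Toy model of xmlIO's input-callback table (not the real bindings)."""

    def __init__(self):
        self._handlers = []

    def register(self, match, open_):
        # Models the match/open pair; read/close are covered here by
        # the file-like object that open_ returns.
        self._handlers.append((match, open_))

    def open(self, uri):
        # The most recently registered handler is consulted first.
        for match, open_ in reversed(self._handlers):
            if match(uri):
                return open_(uri)
        raise IOError("no input handler for %r" % uri)


registry = InputCallbackRegistry()
# Stand-in handlers returning canned data, for illustration only.
registry.register(lambda uri: uri.startswith("file://"),
                  lambda uri: io.BytesIO(b"<local/>"))
registry.register(lambda uri: uri.startswith("svn://"),
                  lambda uri: io.BytesIO(b"<from-svn/>"))
```

Because the lookup happens at the xmlIO level, a handler registered this way would also be used for the catalog file itself and for URIs produced by catalog rewriting, which the entity loader never sees.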
If there are no objections to the patch, I could augment it with test cases for these bindings.
Also, the attached patch fixes a few problems with setEntityLoader:
1. Setting the entity loader does not increment the refcount on the Python object passed in. This works only as long as the object is not deleted. For example, the following code results in a segmentation fault in the Python interpreter when attempting to process any document:
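[[[
import libxml2

def register_entity_loader():
    # The nested function is the only reference to the loader; it is
    # released as soon as register_entity_loader() returns.
    def entity_loader(URL, ID, ctxt):
        ...
    libxml2.setEntityLoader(entity_loader)

register_entity_loader()
]]]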
2. setEntityLoader() does not verify whether the passed object is callable. If it is not, the current implementation attempts to call it anyway and, failing that, silently moves on to the default entity loader. The attached patch makes setEntityLoader raise a ValueError exception if a non-callable object is passed.
3. In debug mode, pythonExternalEntityLoader() outputs the result object to stderr, while the messages before and after the object (description + newline) go to stdout. The attached patch makes them all go to stdout.
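The behaviour intended by the first two fixes can be sketched in plain Python rather than in the C code the patch touches. Here set_entity_loader and registered are hypothetical stand-ins for the binding and its internal state. The strong reference plays the role of the incremented refcount.

```python
# Hypothetical stand-in for the binding's internal state.
registered = {"loader": None}


def set_entity_loader(loader):
    """Model of setEntityLoader with the two checks described above."""
    if not callable(loader):
        raise ValueError("entity loader must be callable")
    # Holding a strong reference keeps the object alive after the
    # caller's scope is gone (the analogue of incrementing the refcount).
    registered["loader"] = loader


def register_entity_loader():
    def entity_loader(URL, ID, ctxt):
        return None
    set_entity_loader(entity_loader)


register_entity_loader()
# The nested function is still reachable and callable here.
result = registered["loader"]("svn://host/repo/doc.xml", None, None)
```

With the unpatched bindings, a similar effect can presumably be had from the calling side by keeping the loader in a module-level variable for as long as it is registered.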
Regards,
Alexey.
Attachment:
register-input-callbacks.diff
Description: Text Data