[xml] entity resolver callback context not too useful for python



Just a general comment and a description about how I'm working around 
the problem.

I have a multi-threaded Python webserver that serves up .xml and .xslt 
files and I let the client perform the transform from xml to html.

Sometimes, older browsers connect, so I need to do the transform on the 
server.

The transform needs to take place within the context of the "logged in 
user" because the transform may do lots of document() calls to get extra 
data from the server, and the results of those document() calls depend on 
the logged in user.

Here, "logged in" means something established by the current http request, 
like a cookie.

Anyway, client-side document() requests come back to the server and 
they contain the necessary session cookie to establish the appropriate 
login context.

But when I perform the transform on the server, I need to associate entity 
resolution with the "spawning .xml file".

Suppose I use xmlParseDoc to load the .xml file and the stylesheet, and 
then I applyStylesheet.

At this point, I have two Python xmlDoc objects.

Somehow, when the entity resolver gets called and hands me a parsing 
context, I have to find the original xmlDoc object representing the .xml or 
.xslt file.

But xmlDoc doesn't seem to have a parsing context:

http://xmlsoft.org/html/libxml-tree.html#xmlDoc

Or at least, I can't see any way to get at it from Python.


And, while a parsing context does seem to have a current document (I 
guess, the one it's building), that's not very useful.

What appears to happen when processing document() during the 
transform:

1. libxml2 creates a new parsing context, 

2. libxml2 calls the entity resolver with that new context

3. libxml2 parses the returned data

But since this is a new parsing context, I've never seen it before, so even 
if I could find my original xml/xslt document's parsing context, it 
wouldn't be helpful because it's not related to the freshly created 
context made during entity resolution.

It's a bit of a chicken-and-egg problem, because loading the original xsl 
file might also do entity lookups (xsl:include), so even before I get a 
document handle, I could be doing lookups.


My hack-around solution was to use xmlReadDoc, and pass in a 
mangled URI with my own custom scheme, whose netloc can be 
dereferenced back to the originating 'http request context'.
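
For illustration, here is a minimal sketch of that URI-mangling idea. It 
is not my actual code: the scheme name "reqctx", the registry, and all 
function names are made up for this example, and it only uses the Python 
standard library. The assumption is that relative references made during 
the transform are resolved against the mangled base URI, so the token in 
the netloc shows up again in the URL handed to the entity loader.

```python
import threading
import uuid
from urllib.parse import urlsplit, urlunsplit

# Hypothetical custom scheme; any name that cannot clash with a real one.
SCHEME = "reqctx"

# token -> per-request context (e.g. the logged in user's session).
# Guarded by a lock because the webserver is multi-threaded.
_contexts = {}
_lock = threading.Lock()


def register_context(request_context):
    """Store a request context and return an opaque token for it."""
    token = uuid.uuid4().hex
    with _lock:
        _contexts[token] = request_context
    return token


def release_context(token):
    """Forget a request context once the transform is finished."""
    with _lock:
        _contexts.pop(token, None)


def mangle(token, path):
    """Build a base URI whose netloc carries the token."""
    return urlunsplit((SCHEME, token, path, "", ""))


def lookup(url):
    """Return (request_context, real_path) for a mangled URL.

    URLs that do not use the custom scheme, or whose token is no
    longer registered, yield None as the context.
    """
    parts = urlsplit(url)
    if parts.scheme != SCHEME:
        return None, url
    with _lock:
        context = _contexts.get(parts.netloc)
    return context, parts.path
```

The mangled URI is what gets passed as the document URL when parsing, and 
the entity loader callback calls lookup() on whatever URL it receives to 
find out on whose behalf it is fetching. The token has to be released 
when the request ends, otherwise the registry grows without bound.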

It'd be handy if something like xmlReadDoc could be given some opaque 
value that would be passed to the entity loader. Further, that opaque 
value would have to be set on all documents loaded during the parsing 
process.

I think that'd take a lot of work to implement...

Anyway, just some thoughts.


-- 
Brad Clements,                bkc murkworks com    (315)268-1000
http://www.murkworks.com                          
AOL-IM or SKYPE: BKClements




