[xslt] Global state and modular softwares

From: Carlo Contavalli <ccontavalli inscatolati net>
To: xslt gnome org
Subject: [xslt] Global state and modular softwares
Date: Tue, 26 Aug 2008 18:28:22 +0200
Hello,
  following up on a chat with Daniel on #xml (irc.gnome.org): for the next
version of mod-xslt (www.mod-xslt2.com, an apache module for xslt
transformations) I'd like to have:
  1 - a fix for a couple of long-standing bugs/bad interactions
  2 - in-memory caching of parsed xml files / xslt stylesheets

1 - these are mainly caused by global variables and things like
  xmlGlobalState. Basically, mod-xslt needs to modify xmlGenericError and
  xsltGenericError (and their contexts, to provide helpful messages in
  logs or error pages) and *GenericDebugFunc, uses xsltRegisterExtModule
  to add a couple of extensions (to access some of the internal variables
  of mod-xslt & the underlying web server), enables exslt, ...

  However, with apache, in the same process there can be multiple
  libxml/libxslt users: php, mod_perl, ... that change the global
  defaults or specify their own error / debug handlers, leading
  to unpredictable results in mod-xslt.

  In the past, I patched libxml to expose functions to save and restore
  the xmlGlobalState by replacing the global pointer. I'd like
  to find a solution that doesn't require patching and addresses the
  problem in libxslt as well. Any ideas? Any chance of seeing libxml/libxslt
  stop using global variables and/or expose an API to allow saving/restoring
  them? (Or do you have a better solution with the current API?)

2 - to perform caching, I'd need to:
    - keep an in-memory representation of the xslt stylesheet
    - and original xml document
    and make sure that I can re-use both (possibly from different
    threads of execution) to re-generate the output as necessary.
    Is this possible? Do I need to serialize access to the xslt/xml
    documents? Is the access done by transformations read-only?

    Other interesting problems:
      - with apache 1.3 or with the prefork mpm... I'd love to have
        the parsed documents in a cache shared by multiple processes.
        I could change xmlMalloc to use shm or mmapped memory areas...
        but... I would run into problem 1 again, and it doesn't seem easy.
      - to determine when to reload an xslt stylesheet / document,
        I'd need to either have a TTL or check if the underlying files
        have changed. However, if I have an xsl stylesheet including
        another stylesheet, it's hard to detect when dependencies have
        changed (there are also DTDs in between, and things opened
        with open/document).
        So: I can either ignore the problem, use a TTL based approach,
        use some sort of heuristic, or track dependencies correctly.
        I haven't decided which way to go yet, and I don't know if there's
        an easy way to ask libxslt 'what are all the dependencies of
        this file?' (probably some custom hook on opens / url parsing?
        is it worth it?)
      - well, when a stylesheet is applied, it would be great to know
        if there's any 'non-pure' (by the definition of pure function)
        transformation going on (eg, 'can I cache the result of the
        transformation? or do I have to perform the transformation every
        time the page is accessed?'), but this is even harder :)
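
    For the reload question above, the TTL and the 'have the underlying
    files changed?' check can be combined. A minimal sketch, assuming the
    list of dependency paths (included stylesheets, DTDs, ...) has already
    been collected somehow; how to collect it is exactly the open question,
    so that part is not shown. All dep_* names and limits are hypothetical,
    and only POSIX stat() is used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

#define DEP_MAX 32         /* hypothetical limits, chosen for the sketch */
#define DEP_PATH_MAX 256

/* One tracked file, with the mtime and size seen when it was loaded. */
typedef struct {
    char path[DEP_PATH_MAX];
    time_t mtime;
    off_t size;
} dep_entry;

/* All files one cached stylesheet/document depends on. */
typedef struct {
    dep_entry deps[DEP_MAX];
    size_t count;
    time_t loaded_at;
    double ttl_seconds;    /* <= 0 disables the TTL check */
} dep_set;

void dep_set_init(dep_set *set, double ttl_seconds, time_t now)
{
    assert(set != NULL);
    memset(set, 0, sizeof(*set));
    set->loaded_at = now;
    set->ttl_seconds = ttl_seconds;
}

/* Record a dependency; returns 0 on success, -1 if it cannot be tracked. */
int dep_set_add(dep_set *set, const char *path)
{
    struct stat st;
    size_t len = strlen(path);

    if (set->count >= DEP_MAX || len >= DEP_PATH_MAX || stat(path, &st) != 0)
        return -1;
    dep_entry *e = &set->deps[set->count++];
    memcpy(e->path, path, len + 1);
    e->mtime = st.st_mtime;
    e->size = st.st_size;
    return 0;
}

/* Returns 1 if the cached entry should be reloaded, 0 otherwise. */
int dep_set_is_stale(const dep_set *set, time_t now)
{
    if (set->ttl_seconds > 0 && difftime(now, set->loaded_at) > set->ttl_seconds)
        return 1;
    for (size_t i = 0; i < set->count; i++) {
        struct stat st;
        const dep_entry *e = &set->deps[i];

        if (stat(e->path, &st) != 0)
            return 1;      /* file vanished or became unreadable */
        if (st.st_mtime != e->mtime || st.st_size != e->size)
            return 1;      /* file was modified since it was loaded */
    }
    return 0;
}

/* Debug helper: list the tracked files. */
void dep_set_print(const dep_set *set, FILE *out)
{
    for (size_t i = 0; i < set->count; i++)
        fprintf(out, "%s\n", set->deps[i].path);
}
```

    Comparing the size as well as the mtime catches edits that land within
    the same second as the original load. The price is one stat() per
    dependency on every lookup.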

Any suggestion is welcome,

Thanks,
Cheers,
Carlo

-- 
  GPG Fingerprint: 2383 7B14 4D08 53A4 2C1A CA29 9E98 5431 1A68 6975