RE: [xslt] Changes of internal structures



Hi Frans, 

> -----Original Message-----
> From: Frans Englich [mailto:frans englich telia com] 

[...]

> > This leads to the possibility 
> > of using the stylesheet's tree as an input tree; i.e., allows
> > document("")
> > to work correctly.
> 
> Couldn't it be useful to in the case of where the stylesheet 
> is needed as an 
> input document(that is, 'document("")'), to then simply load 
> the stylesheet 
> as a regular input document. My point is that the stylesheets 
> which processes 
> themselves are very rare(imho), and therefore the 
> implementation can do 
> compilation approaches that gains the vast majority of cases, 
> at the cost of 
> loading the xsl file in the rare cases it's needed. Also, 
> document("") isn't 
> as bad as actually compiling a stylesheet since it doesn't process 
> xsl:include/import.
> 
> Would this break any stability constraints? Or is there 
> anything else which 
> would make it a bad idea?

This is actually the current behaviour of Libxslt. It will
process the stylesheet as a normal document; i.e., parse it again:

xsltDocumentFunction() (in libxslt/functions.c):
  Here the base-URI of the stylesheet's doc is queried:

  base = xmlNodeGetBase(tctxt->inst->doc, tctxt->inst);

  Then the absolute URI is built. Since the given URI is the
  empty string, the absolute URI will be the base URI of the
  stylesheet:

  URI = xmlBuildURI(obj->stringval, base);

  Then the stylesheet document is parsed:

  xsltDocumentFunctionLoadDocument( ctxt, URI );
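
As a side note on the second step: under the generic URI rules
(RFC 3986, section 5.2.2), an empty relative reference resolves to the
base URI without its fragment. The following is only a minimal sketch
of that rule in plain C, not the actual code of libxml2 or Libxslt;
the name resolve_empty_reference() is made up for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxml2 or Libxslt): resolve the
 * empty relative reference "" against a base URI as described in
 * RFC 3986, section 5.2.2. Scheme, authority, path and query are taken
 * from the base; the fragment is dropped because the reference has none.
 * Returns a malloc'ed string (the caller frees it), or NULL if there is
 * no base URI or the allocation fails.
 */
char *resolve_empty_reference(const char *base)
{
    const char *hash;
    size_t len;
    char *result;

    if (base == NULL)
        return NULL;    /* no base URI: nothing to resolve against */

    /* Everything up to an optional '#' is kept. */
    hash = strchr(base, '#');
    len = (hash != NULL) ? (size_t)(hash - base) : strlen(base);

    result = malloc(len + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, base, len);
    result[len] = '\0';
    return result;
}
```

The NULL case corresponds to a stylesheet tree that has no base URI at
all; there is then no URI from which a document could be loaded.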

My earlier concern was about using document('') with stylesheet
trees that were constructed via the API or parsed from an in-memory
representation. Such trees normally have no base URI, so
document('') won't work for them.

Your idea follows the definition of the XSLT 2.0 spec:

(I posted something similar at lxml-dev; the text below is copied
from that mail at
http://codespeak.net/pipermail/lxml-dev/2006-April/001089.html)

-----
I learned that the spec of XSLT 2.0 clarifies the semantics
of the document() function (which, as I was told, was introduced
in an abandoned draft of XSLT 1.1 and never made it into the
recommendation):
http://www.w3.org/TR/xslt20/#stylesheet-stripping

"One effect of these rules is that unless XML entities or xml:base are
used, and provided that the base URI of the stylesheet module is known,
document("") refers to the document node of the containing stylesheet
module (the definitive rules are in [RFC3986]). The XML resource
containing the stylesheet module is processed exactly as if it were any
other XML document, for example there is no special recognition of
xsl:text elements, and no special treatment of comments and processing
instructions."
(http://www.w3.org/TR/xslt20/#document)
-----

The important part:
"resource containing the stylesheet module is processed
 exactly as if it were any other XML document"

One can tweak Libxslt's mechanism by setting a fake base URI on
the constructed stylesheet tree and by providing custom I/O handling
to get a grip on our custom "resource containing the stylesheet"
and return the document node of any tree we like.

But we need also to satisfy the rule: "processed exactly as if it
were any other XML document".

It is possible to satisfy this rule if the stylesheet tree was
built from an in-memory string: one can keep the string, parse it
again and hand over the result to Libxslt.
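
On the application side, the "keep the string and parse it again" idea
could look roughly like the sketch below. It is plain C with made-up
names (mem_register(), mem_lookup(), the "mem://" URIs in the usage
note); it only models the lookup a custom I/O handler would perform
and does not call any libxml2 or Libxslt API.

```c
#include <stddef.h>
#include <string.h>

#define MEM_MAX_ENTRIES 16

/* One kept stylesheet source, addressed by its fake base URI. */
struct mem_entry {
    const char *uri;     /* fake base URI set on the stylesheet tree */
    const char *source;  /* original XML text, kept for re-parsing */
};

static struct mem_entry mem_table[MEM_MAX_ENTRIES];
static size_t mem_count = 0;

/*
 * Remember `source` under `uri`. The strings are not copied, so they
 * must outlive the table. Returns 0 on success, -1 if an argument is
 * NULL, the table is full, or the URI is already registered.
 */
int mem_register(const char *uri, const char *source)
{
    size_t i;

    if (uri == NULL || source == NULL || mem_count >= MEM_MAX_ENTRIES)
        return -1;
    for (i = 0; i < mem_count; i++) {
        if (strcmp(mem_table[i].uri, uri) == 0)
            return -1;
    }
    mem_table[mem_count].uri = uri;
    mem_table[mem_count].source = source;
    mem_count++;
    return 0;
}

/*
 * What a custom I/O handler would do when asked to open `uri`: return
 * the kept source text so that it can be parsed like any other XML
 * document, or NULL if the URI is unknown to this table.
 */
const char *mem_lookup(const char *uri)
{
    size_t i;

    if (uri == NULL)
        return NULL;
    for (i = 0; i < mem_count; i++) {
        if (strcmp(mem_table[i].uri, uri) == 0)
            return mem_table[i].source;
    }
    return NULL;
}
```

For example, after mem_register("mem://sheet-1", text), a later request
for "mem://sheet-1" yields the same text again, which can then be parsed
as an ordinary document.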

But it is not easily done for stylesheet trees built via
an API (e.g. DOM). One could serialize the tree before validation -
a rather inefficient workaround if document('') is rarely used.

So, for me, the conclusion is that it's better handled on the user's
side; the spec clearly rules out scenarios where there's no
base URI available.

Summary of scenarios:

1) If the user has an environment where stylesheets are constructed
   via an API (i.e., no XML document source available):

 a) If document('') must be supported:
    He has to provide a base URI and serialize the tree; either
    simply to file, or tweak I/O to use an in-memory representation.

 b) If document('') won't be supported:
    Return an error.
    The reasons might be that there's agreement that document('') is not
    allowed to be used, or the fact that - as you already mentioned -
    it is a rather rare case.
 
2) If the user has an in-memory representation:

  a) same as 1a)
  b) same as 1b)

3) If the stylesheet was parsed from an XML document then everything's
   fine.
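
The scenarios above can be condensed into a small decision function.
This is only a sketch with hypothetical names; it assumes that 2b) is
meant to mirror 1b), i.e. that an error is returned there as well.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum sheet_origin {
    SHEET_FROM_API,       /* scenario 1: built via an API such as DOM */
    SHEET_FROM_MEMORY,    /* scenario 2: in-memory representation */
    SHEET_FROM_DOCUMENT   /* scenario 3: parsed from an XML document */
};

enum self_ref_policy {
    POLICY_NOTHING_TO_DO,               /* 3):  works as it is */
    POLICY_PROVIDE_BASE_URI_AND_SOURCE, /* 1a), 2a) */
    POLICY_REPORT_ERROR                 /* 1b), 2b) */
};

/*
 * Decide what the user's side has to do for document('').
 * `support_self_ref` says whether document('') must be supported.
 */
enum self_ref_policy choose_policy(enum sheet_origin origin,
                                   bool support_self_ref)
{
    if (origin == SHEET_FROM_DOCUMENT)
        return POLICY_NOTHING_TO_DO;
    return support_self_ref ? POLICY_PROVIDE_BASE_URI_AND_SOURCE
                            : POLICY_REPORT_ERROR;
}
```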

Regards,

Kasimier

