[xslt] Processing of extension-elements/functions and user-defined data elements

From: "Buchcik, Kasimier" <k buchcik 4commerce de>
To: <xslt gnome org>
Subject: [xslt] Processing of extension-elements/functions and user-defined data elements
Date: Wed, 3 May 2006 17:56:01 +0200

Hi,

I think there is a flaw in the current mechanism of processing
extension element/functions and top-level (user-defined data)
elements.

(Sorry, this has probably gotten a bit too verbose)

Let me try to explain:
1) I don't think the [xsl:]extension-element-prefixes mechanism is
  intended for, or useful for, the registration of extension modules.
  [xsl:]extension-element-prefixes is used just to distinguish
  extension elements from literal result elements in the stylesheet.
  Specifying [xsl:]extension-element-prefixes does not say that
  there will be actually any extension element in the stylesheet.

  Example A (in this stylesheet there's actually no extension element):
  <xsl:stylesheet ...>
    <xsl:template>
      <foo xmlns:ext="urn:test:foo"
           xsl:extension-element-prefixes="ext">
        
      </foo>
    ... 

  Example B (we have an extension element here):
  <xsl:stylesheet ...>
    <xsl:template>
      <foo xmlns:ext="urn:test:foo"
           xsl:extension-element-prefixes="ext">
        <ext:my-extension-element .../>
      </foo>
    ... 

  So there's no reason why in example A any extension element module
  should be initialized. I think this needs to happen on the first
  occurrence of an extension element (as in example B) - not earlier.
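
  A minimal sketch of that idea, as a self-contained model rather than
  libxslt code. ExtModule, declareExtNamespace() and compileElement()
  are made-up names for illustration only; the point is just that
  declaring the prefix initializes nothing, and the first extension
  element actually met triggers the one-time initialization:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one extension module; not a libxslt type. */
typedef struct {
    const char *uri;   /* namespace URI of the module */
    int declared;      /* listed in extension-element-prefixes? */
    int initialized;   /* has the init callback run yet? */
    int initCount;     /* how many times init has run */
} ExtModule;

/* Seeing the prefix declaration only marks the namespace. */
static void declareExtNamespace(ExtModule *m)
{
    assert(m != NULL);
    m->declared = 1;
}

/*
 * Called for every non-XSLT element met during compilation.
 * Returns 1 if the element is an extension element of module m,
 * 0 if it has to be treated as a literal result element.
 */
static int compileElement(ExtModule *m, const char *elemNsUri)
{
    assert(m != NULL && elemNsUri != NULL);
    if (!m->declared || strcmp(m->uri, elemNsUri) != 0)
        return 0;
    if (!m->initialized) {
        /* First occurrence of an extension element: initialize now. */
        m->initialized = 1;
        m->initCount++;
    }
    return 1;
}
```

  With this model, example A never initializes the module, while
  example B initializes it exactly once, at ext:my-extension-element.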

2) Top-level elements are not "extension elements":
  http://www.w3.org/TR/xslt#extension-element
  Thus "extension-element-prefixes" does not apply to them.

Let me analyse an example (below) using EXSLT's definitions of
extension functions via "func:function":

1) The top-level data element func:function is correctly
  identified by Libxslt (since it was registered via
  xsltRegisterExtModuleTopLevel()).
  
2) The parsing of the content of the func:function incorrectly
  fails if there was no extension element namespace
  defined via "extension-element-prefixes":
  a) the xsl:param is rejected as invalid, since the
   grammar-checks in preproc.c allow xsl:param to
   occur as a child of an extension element, but
   the namespace of func:function was not defined
   to be an extension element namespace.
  b) if we remove the xsl:param, then the func:result
   element is incorrectly treated as a literal result
   element, which leads to an addition of
   nodes to the result-tree - which is forbidden by
   the definition of func:function. Thus Libxslt raises
   an error here.

Stylesheet:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:func="http://exslt.org/functions"
  xmlns:foo="urn:test:foo"
  exclude-result-prefixes="func foo">

  <func:function name="foo:func-1">
    <xsl:param name="param-1" select="'hello'"/>
    <func:result select="$param-1"/>
  </func:function>

  <xsl:template match="/">
    <foo>
      <xsl:value-of select="foo:func-1()"/>
    </foo>
  </xsl:template>

</xsl:stylesheet>

This example is valid according to Saxon 6.5.3 and Xalan-J,
which both generate the following result:

<?xml version='1.0' ?>
<foo>hello</foo>

To solve this issue, we should change the following things:

1) The *backtracking* grammar checks are not suitable for
  defining the content of the element func:function.
  The definition of the function:
  <func:function
    name = QName>
    <!-- Content: (xsl:param* | template) -->
  </func:function>
  So we somehow need to move the process of content-checking
  to the compilation function of func:function. Backtracking
  grammar checks won't ever cover the specific user-defined
  content model of an extension element or top-level data
  element.

2) Complete the implementation of the EXSLT spec:

  "The EXSLT - Functions namespace (http://exslt.org/functions) 
   is designated as an extension namespace within the subtree
   rooted at a func:function element. The effect of this is
   as if the func:function element had a
   xsl:extension-element-prefixes attribute defined on it,
   with one of the values within it being the prefix used
   for the EXSLT - Functions namespace".

  This means that we need to implicitly define the
  extension element namespace for the content of func:function.
  This way the func:result element will be correctly recognized
  as an extension element (without the need to define its
  namespace via "extension-element-prefixes" on the xsl:stylesheet).
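
  To illustrate the rule quoted above, here is a small self-contained
  sketch. It is not the libxslt tree API: Node, its fields and
  isExtensionElement() are hypothetical, and each element carries at
  most one declared extension namespace to keep things short. An
  element counts as an extension element if its namespace was declared
  on itself or an ancestor, or if it is in the EXSLT - Functions
  namespace and sits below a func:function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EXSLT_FUNC_NS "http://exslt.org/functions"

/* Hypothetical, simplified stylesheet node. */
typedef struct Node {
    const char *nsUri;          /* element namespace, NULL if none */
    const char *localName;
    struct Node *parent;
    const char *declaredExtNs;  /* namespace designated via
                                   extension-element-prefixes here,
                                   or NULL */
} Node;

static int sameStr(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* Returns 1 if elem has to be compiled as an extension element. */
static int isExtensionElement(const Node *elem)
{
    const Node *cur;

    assert(elem != NULL);
    if (elem->nsUri == NULL)
        return 0;
    for (cur = elem; cur != NULL; cur = cur->parent) {
        /* Explicit designation on the element or an ancestor. */
        if (sameStr(cur->declaredExtNs, elem->nsUri))
            return 1;
        /* Implicit designation below a func:function element. */
        if (cur != elem &&
            sameStr(elem->nsUri, EXSLT_FUNC_NS) &&
            sameStr(cur->nsUri, EXSLT_FUNC_NS) &&
            sameStr(cur->localName, "function"))
            return 1;
    }
    return 0;
}
```

  In this model a func:result inside func:function is recognized
  without any declaration on xsl:stylesheet, while a func:result
  elsewhere still needs "extension-element-prefixes".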


Another variant of this issue reveals that Libxslt currently
depends on the use of "extension-element-prefixes" when it comes to
overriding (via xsl:import) equally-named functions.

Example:

"func-3.xsl"
------------
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:func="http://exslt.org/functions"
  xmlns:foo="urn:test:foo">

  <xsl:import href="func-3-imp.xsl"/>

  <func:function name="foo:same-func"/>  

</xsl:stylesheet>

"func-3-imp.xsl"
---------------
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:func="http://exslt.org/functions"
  xmlns:foo="urn:test:foo">

  <func:function name="foo:same-func"/>

</xsl:stylesheet>

Result of xsltproc:

xsltproc func-3.xsl func-3-imp.xsl
compilation error: file func-2.xsl line 9 element function
Failed to register function {urn:test:foo}same-func

This should not happen, since func:function should be
"import precedence" aware, so the imported function should
simply be overridden. Note that we don't get an error if we give
the functions different names.

Here in exsltFuncFunctionComp(), the function xsltStyleGetExtData()
is called to retrieve a stored table with the function names.
If we use "extension-element-prefixes", then a distinct
table is created for every imported stylesheet, thus the
names do not clash.
If we *don't* use "extension-element-prefixes", then only one
table is created (on the imported stylesheet) and an attempt is
made to put both functions into that one table.

This effect is based on:
1) the current parsing order
2) a lookup mechanism of extension module information, which
  will query the import-tree in descending order for already
  registered module information.
  The relevant code for this lookup is in xsltStyleGetExtData():
  -----
  tmp = style;
  while (tmp != NULL) {
    if (tmp->extInfos != NULL) {
      data = (xsltExtDataPtr) xmlHashLookup(tmp->extInfos, URI);
      if (data != NULL)
        break;
    }
    tmp = xsltNextImport(tmp);
  }
  -----

Thus we get the following explanation of the effect:

1) "extension-element-prefixes" of the xsl:stylesheet is parsed
  in descending order of the import-tree.
  Thus: 
  a) Subsequent stylesheets are not created yet, so
    the import-tree lookup will not retrieve any already registered
    module information
  b) the table mentioned above (which is the module information)
    will be created for every imported stylesheet.

2) Top-level data elements are parsed in ascending-order of the
  import tree.
  Thus: 
  a) subsequent stylesheets which have already been processed,
    will be queried for existing tables.
    Since the imported stylesheet has created such a table,
    the importing stylesheet will use it and will try to add
    its function - resulting in a name-clash.
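
The two parsing orders described above can be replayed with a small
self-contained model. This is only a sketch under simplifying
assumptions: Sheet, FuncTable, getExtData(), registerFunc() and
compileBoth() are made-up names, the import tree is reduced to one
main stylesheet with a single import, and getExtData() merely mimics
the "look down the import chain, otherwise create here" behaviour
described for xsltStyleGetExtData():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FUNCS 4

/* Hypothetical per-module data: a table of function names. */
typedef struct {
    const char *names[MAX_FUNCS];
    int count;
} FuncTable;

typedef struct Sheet {
    FuncTable *data;           /* module data stored here, or NULL */
    struct Sheet *nextImport;  /* next stylesheet down the chain */
    FuncTable storage;         /* backing store for data */
} Sheet;

/* Look from style down the chain; create on style if none found. */
static FuncTable *getExtData(Sheet *style)
{
    Sheet *tmp;

    assert(style != NULL);
    for (tmp = style; tmp != NULL; tmp = tmp->nextImport)
        if (tmp->data != NULL)
            return tmp->data;
    style->storage.count = 0;
    style->data = &style->storage;
    return style->data;
}

/* Returns 0 on success, -1 on a name clash (or a full table). */
static int registerFunc(Sheet *style, const char *name)
{
    FuncTable *t = getExtData(style);
    int i;

    for (i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return -1;
    if (t->count == MAX_FUNCS)
        return -1;
    t->names[t->count++] = name;
    return 0;
}

/* Replays func-3.xsl importing func-3-imp.xsl. */
static int compileBoth(int usePrefixes)
{
    Sheet mainSheet = { NULL, NULL, { { NULL }, 0 } };
    Sheet imported = { NULL, NULL, { { NULL }, 0 } };

    /* Prefixes are handled top-down: the import does not exist yet. */
    if (usePrefixes)
        getExtData(&mainSheet);
    mainSheet.nextImport = &imported;
    if (usePrefixes)
        getExtData(&imported);

    /* Top-level elements are handled bottom-up: import first. */
    if (registerFunc(&imported, "{urn:test:foo}same-func") != 0)
        return -1;
    return registerFunc(&mainSheet, "{urn:test:foo}same-func");
}
```

In this model compileBoth(1) succeeds, because each stylesheet ends up
with its own table, while compileBoth(0) fails with a name clash,
because the importing stylesheet finds and reuses the table created on
the imported one.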


I now realize why my extension functions were initialized
for every stylesheet as described in 
http://mail.gnome.org/archives/xslt/2006-May/msg00003.html :

1) I used "extension-element-prefixes"
2) the parsing process did not see that the module was
  already initialized, since the module-information-lookup
  is performed down the import-tree, but my already registered
  data was registered *up* the import-tree.

Summary:

1) func:function needs an *overriding* mechanism;
  the module information must be stored per stylesheet for this

2) My extension element or extension function does only need
  *one* place to store its module information;
  so initialization is needed only *once*.

The scenarios 1) and 2) have different semantics, but are
currently processed with the same mechanism, i.e., with
xsltStyleGetExtData().
I think we should provide two different mechanisms here.

I currently don't know what semantics are expected
for xsltStyleGetExtData() at transformation time.
  
Some results of poking at the code:

I removed the call to xsltStyleGetExtData() in
xsltRegisterExtPrefix(). This healed the
libxslt:test() function initialization (it was
called only once now). But on the other hand this
produced name-clashes for func:function (read above for
the reasons). xsltStyleGetExtData() is clearly not
able to create per-stylesheet information if the
modules are registered with a depth-first processing
of the import-tree. I don't see a way of fixing
the one without breaking the other.

I noticed that the import-tree lookup in
xsltStyleGetExtData() will only work as expected if
it is started with the main stylesheet, since it won't
see previous-sibling-imports:

main A
  |
  -- import B
  |
  -- import C - a lookup here won't see A and B
     |
     -- import D
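
The tree above can be walked with a small model of the traversal. The
Sheet struct is a stand-in, and nextImport() is only modeled on my
reading of xsltNextImport() (children first, then the next sibling,
then the next sibling of an ancestor), so please treat that as an
assumption. buildExampleTree() and canSee() are hypothetical helpers,
and the sibling order simply follows the drawing:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a stylesheet in the import tree. */
typedef struct Sheet {
    const char *name;
    struct Sheet *parent;   /* importing stylesheet */
    struct Sheet *imports;  /* first imported stylesheet */
    struct Sheet *next;     /* next sibling import */
} Sheet;

/* Assumed traversal order: children, then siblings, then the
   siblings of ancestors. Never moves to a previous sibling. */
static Sheet *nextImport(Sheet *cur)
{
    if (cur == NULL)
        return NULL;
    if (cur->imports != NULL)
        return cur->imports;
    if (cur->next != NULL)
        return cur->next;
    for (cur = cur->parent; cur != NULL; cur = cur->parent)
        if (cur->next != NULL)
            return cur->next;
    return NULL;
}

/* Builds the tree from the drawing: s[0]=A, s[1]=B, s[2]=C, s[3]=D. */
static void buildExampleTree(Sheet s[4])
{
    static const char *names[4] = { "A", "B", "C", "D" };
    int i;

    for (i = 0; i < 4; i++) {
        s[i].name = names[i];
        s[i].parent = s[i].imports = s[i].next = NULL;
    }
    s[0].imports = &s[1];
    s[1].parent = &s[0];
    s[1].next = &s[2];
    s[2].parent = &s[0];
    s[2].imports = &s[3];
    s[3].parent = &s[2];
}

/* Returns 1 if a lookup started at start ever reaches target. */
static int canSee(Sheet *start, const Sheet *target)
{
    Sheet *tmp;

    assert(start != NULL && target != NULL);
    for (tmp = start; tmp != NULL; tmp = nextImport(tmp))
        if (tmp == target)
            return 1;
    return 0;
}
```

Started at A, the walk reaches B, C and D. Started at C, it reaches
only C and D, so module data stored on A or B stays invisible.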


I tend to think that we should solve this by introducing
a function which explicitly registers the module data
*per* stylesheet - to be used for user-defined data
elements: xsltStyleGetUserDefinedData(). In addition, we should
eliminate the module-initialization in xsltRegisterExtPrefix().
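
A rough sketch of what per-stylesheet storage would buy us for
func:function. This is not a proposed implementation and not libxslt
API: Sheet, registerFuncPerSheet() and lookupFunc() are hypothetical,
and the import tree is again reduced to a simple chain ordered by
decreasing import precedence. Each stylesheet keeps its own table, so
equal names clash only within one stylesheet, and the override happens
at lookup time:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FUNCS 4

/* Hypothetical stylesheet with its own function table. */
typedef struct Sheet {
    const char *names[MAX_FUNCS];
    const char *impls[MAX_FUNCS];  /* stand-in for the compiled body */
    int count;
    struct Sheet *nextImport;      /* next lower import precedence */
} Sheet;

/* Registers on this stylesheet only; imports are never consulted.
   Returns 0 on success, -1 on a clash within the same stylesheet. */
static int registerFuncPerSheet(Sheet *style, const char *name,
                                const char *impl)
{
    int i;

    assert(style != NULL && name != NULL);
    for (i = 0; i < style->count; i++)
        if (strcmp(style->names[i], name) == 0)
            return -1;
    if (style->count == MAX_FUNCS)
        return -1;
    style->names[style->count] = name;
    style->impls[style->count] = impl;
    style->count++;
    return 0;
}

/* The first hit wins, i.e. the highest import precedence. */
static const char *lookupFunc(const Sheet *mainSheet, const char *name)
{
    const Sheet *s;
    int i;

    assert(name != NULL);
    for (s = mainSheet; s != NULL; s = s->nextImport)
        for (i = 0; i < s->count; i++)
            if (strcmp(s->names[i], name) == 0)
                return s->impls[i];
    return NULL;
}
```

With this, func-3.xsl and func-3-imp.xsl could both define
foo:same-func, and a call would resolve to the definition in the
importing stylesheet.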


Regards,

Kasimier