Re: [xslt] Replacing the dict used by the transform context



On Sun, Aug 06, 2006 at 10:51:54PM +0200, Stefan Behnel wrote:
> Hi,
> 
> I'm the maintainer of lxml, a pythonic Python binding to the libxml2/libxslt
> libraries.
> 
> In lxml, we use a global per-thread dictionary for all parsers. Recently, we
> had reports about crashes in the XSLT code when moving subtrees of results to
> other trees. The reason for this is that libxslt creates a new sub-dictionary
> for each transformation context that is inherited by the result document. This
> means that the dict disappears when the document is freed. The cleanup
> procedure for this specific document and all of its elements therefore depends
> on the source of the original document, and it's difficult to deal with this
> in generic cleanup code (as required by a wrapper like lxml).
> 
> We would like to use the same approach as for parsers here, i.e. replace the
> dict after creating the context. However, in the xsltNewTransformContext
> function in transform.c it says right before the creation of the sub-dictionary:
> 
>    /*
>     * setup of the dictionnary must be done early as some of the
>     * processing later like key handling may need it.
>     */
> 
> Now, I checked if this is actually true and found that the dict is not
> accessed anywhere in that function or in any of the functions it calls. So

  The question is how did you checked ? I think this comment was added when
dealing with extension functions used in keys. Now whether this was about doing
it in that function (which is early on transformation wise) or early in the
function (which is what you seems to have concluded) is unclear to me.

> this comment is at least misleading.
> 
> I do not consider it common usage to replace the dict, so I would not request
> an explicit API for this. However, for the case we need, it is sufficient to
> free the dict that was created and replace it be the 'right' one.

  In general replacing the dict with a random new one may simply break the
processor, there is a number of places where string comparison have been 
replaced with pointer comparison because we knew that a string found in the
parent dict would be the same as the string if queried in the subdict. In
your approach in lxml it will still work because it is the same dictionnary.
However the big problem of your approach as I explained to Martijn when he
started lxml is that as a result the dictionnary can never be freed and 
cannot be shrinked either, so you have a ever balooning resource which will
exhaust memory after a while no matter what.

> I would like to propose making it a feature that the dictionary is *not*
> changed in the xsltNewTransformContext, but can be safely replaced in user
> code right after returning the new context. The attached patch does this. It
> does not change the behaviour of the current libxslt at all, it mainly moves
> the dict creation to the end of the function and adds a comment that it is
> supposed to stay there to prevent any updates before returning the context. I
> ran "make tests" with it and the tests passed nicely.

   The main question here is about extension functions, their initialization
passes the transformation context, which itself includes a pointer to the
dictionnary. So the dictionnary is available whithin the course of the
execution of that function to extension functions. I guess that's where
your hope of never touching the dictionnary at that point breaks, of course
you need a very specific stylesheet and environment but people do very crazy
things with extension functions. And I cannot garantee:
   - that your change would not break an existing deployed use of libxslt
     the dictionnary may be accessed by extension function initialization.
   - that if not changing it you may never face a case where some stylesheet
     actually access that dictionnary during the initialization phase.

 However when xsltNewTransformContext returns you should be able to check
if that dictionnary had been used by comparing xmlDictSize() of style->dict
and xmlDictSize() of cur->dict. If they are the same, then no string has
been allocated from that subdict, the only risk would be a dandling pointer
to it in some extension function structures.
 But maybe that discussion about extension function is moot anyway because in
lxml context you completely control those extensions and in that case access
to the dictionary at initialization is impossible, and I guess you would be
safe running with the current existing code.

   Yes this is rather complex,

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]