Re: [xslt] Replacing the dict used by the transform context



On Mon, Aug 07, 2006 at 11:28:33AM +0200, Stefan Behnel wrote:
> Salut Daniel,

 re,

> >> Now, I checked if this is actually true and found that the dict is not
> >> accessed anywhere in that function or in any of the functions it calls. So
> > 
> >   The question is how did you check ?
> 
> Well, I looked through the function and then looked through all functions it
> called to see if any of them contains the word "dict". And they don't. See
> below on why that wasn't enough.

  
> 
> > I think this comment was added when
> > dealing with extension functions used in keys. Now whether this was about doing
> > it in that function (which is early on transformation wise) or early in the
> > function (which is what you seems to have concluded) is unclear to me.
> 
> I assume you mean: "early in the function" vs. "early in the transformation
> process".
> 
> At least, it's not in the function, but still, the comment is just completely
> ambiguous (or maybe just outdated like - uhm - some others). A misleading
> comment is worse than none.

  I assume the comment was unambiguous and spot on to the person who added
it at the time. The dictionnary myst be set up early because key may use them !

> >   In general replacing the dict with a random new one may simply break the
> > processor, there is a number of places where string comparison have been
> > replaced with pointer comparison because we knew that a string found in the
> > parent dict would be the same as the string if queried in the subdict.
> 
> That's good to know. So, what does that mean? Currently, libxslt generated
> documents inherit the dictionary of the stylesheet. You say that it is
> required that the transform dictionary can access the entries in the
> stylesheet dictionary?

those are libxslt general rules. You may break them without trouble or not
depending on the context of use. But they are strict guidelines when it come
to libxslt own code:
 - transformation dictionnary must be derived from the stylesheet dict
 - stylesheet dict must be read only at transform time
 - the generated document should probably reuse the transformation dict

I think in general libxslt tries to cope when this is not the case, but
it's very hard to garantee non-pathological behaviour is those expectations
are not met, for example it's nearly garanteed that processing will be slower,
potentially quite slower !

> 
> > In
> > your approach in lxml it will still work because it is the same dictionnary.
> 
> Not necessarily. We use per-thread dictionaries, so according to your above
> quote, sharing an XSLT between threads will not work if we use the
> thread-local dict for the transformation, right?

  I don't know when and how you took that decision, I cannot garantee you
will have a perfectly working solution in the general case that way.

> I tried that with a simple XSLT and it worked for me, so maybe you could try
> to explain in a little more detail in what case this will not work.

  Basically there is many places now in libxslt where string comparison
are replaced with pointer compare knowing that string will be shared between
various dictionnaries involved. 

> Is it also
> related to extension functions (which I did not try)?

  extension functions may also access the dictionnaries.

> 
> > However the big problem of your approach as I explained to Martijn when he
> > started lxml is that as a result the dictionnary can never be freed and 
> > cannot be shrinked either, so you have a ever balooning resource which will
> > exhaust memory after a while no matter what.
> 
> Sure, if the parsed XML is sufficiently diverse. However, as long as you stick
> to a small set of XML languages (which is the most common use case, I'd say),
> you should not run into too much trouble, right?

  No because you can have generated names see xsl:element for example

> >    The main question here is about extension functions, their initialization
> > passes the transformation context, which itself includes a pointer to the
> > dictionnary. So the dictionnary is available within the course of the
> > execution of that function to extension functions.
> 
> Ok, after a little thinking and code reading, I managed to figure out what
> your last paragraph was supposed to mean. You were referring to the module
> extension initialisation in "xsltInitCtxtExt", right? Ok, we don't have that
> problem in lxml.

  Right, okay.

> > I guess that's where
> > your hope of never touching the dictionnary at that point breaks, of course
> > you need a very specific stylesheet and environment but people do very crazy
> > things with extension functions.
> 
> Sure, replacing the dictionary is only meant for people "who know what they
> are doing". That's why I'm trying to fix certain semantics for this case, to
> actually allow people to know what impact this has.

  okay

> 
> > And I cannot garantee:
> >    - that your change would not break an existing deployed use of libxslt
> >      the dictionnary may be accessed by extension function initialization.
> 
> Ok, I understand that now, so my patch is not worth including.

  okay

> >    - that if not changing it you may never face a case where some stylesheet
> >      actually access that dictionnary during the initialization phase.
> 
> All I'd like to have is some official documentation saying:
> 
> * You should not normally replace the transform dictionary.
> * If you do, you must take care on your own that the extension module setup
>   you use does not alter the dictionary. Apart from that, it's safe and will
>   continue to be safe in later versions.

  I would like to offer a garantee, but sorry, anything which is not part of
the function API/ABI is too hard to document and garantee over long term.
The library is open to hacking from clients, but I can only offer limited 
garantees, sorry.

> > However when xsltNewTransformContext returns you should be able to check
> > if that dictionnary had been used by comparing xmlDictSize() of style->dict
> > and xmlDictSize() of cur->dict. If they are the same, then no string has
> > been allocated from that subdict, the only risk would be a dandling pointer
> > to it in some extension function structures.
> 
> Well, but if it's not the same size, we end up having the same problem as
> before.

  Abort the tranformation with an error instead of silently breaking up.
First is quality grade software, second is something nobody should rely on ;-)

> Sure, one could raise an exception telling the user that what she did
> back then, when registering that long forgotten extension function setup, was
> a no-no. But then, that's exposing implementation details (much better than a
> crash, but well)...
> 
> Anyway, as I said, we don't have that problem in lxml.

  okay

> I think so, too. That's why I would like to have an official blessing that
> this will continue to work in later versions.

  Sorry, it's too hard to garantee, I don't expect to break it, but there is
way too many complex things which can happen during that function that I cannot
give a blanket statement like you would like, and the extension problem is
a good example IMHO showing really why I could not offer such a statement in the
general case.

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]