Re: [xslt] Replacing the dict used by the transform context



On Mon, Aug 07, 2006 at 02:46:14PM +0200, Stefan Behnel wrote:
> > those are libxslt general rules. You may break them without trouble or not
> > depending on the context of use. But they are strict guidelines when it comes
> > to libxslt's own code:
> >  - transformation dictionary must be derived from the stylesheet dict
> >  - stylesheet dict must be read only at transform time
> >  - the generated document should probably reuse the transformation dict
> > 
> > I think in general libxslt tries to cope when this is not the case, but
> > it's very hard to guarantee non-pathological behaviour if those expectations
> > are not met, for example it's nearly guaranteed that processing will be slower,
> > potentially much slower!
> 
> "does not crash" is definitely more important than speed here.
> 
> What is that "should probably" doing in the third rule? I mean, that happens
> to be the source of the whole problem. So I'm wondering, if that's only a
> "should probably", would it be hard to separate the two?

  Sorry, I would have to go through the whole code to be sure of the impact;
there is no way I can do that from memory.

> >   I don't know when and how you took that decision, I cannot guarantee you
> > will have a perfectly working solution in the general case that way.
> 
> Well, it wouldn't work if we used the same dict for all threads, as dictionary
> access is not serialised in libxml2.

  One more reason to use a subdict.

> So, what is the right thing to do? We already tell users not to share
> documents between threads (or at least not to modify them if they do). So the
> only remaining problem seems to be XSLT. From your rules above, I think we
> will end up disabling running stylesheets in other threads than the one they
> were parsed in, so no more xsltStylesheet sharing between threads. That's the
> only solution I see to comply with your first rule. The second rule is worked
> around by making the stylesheet dict also the transform dict, and so is the
> third rule.

  Direct, immediate example of the problem if you break #1:
  Compilation of the stylesheet uses the style dictionary. As a result, the
compiled version uses pointers into that dict for the strings in the compiled
match patterns. At transformation time the parsed document will use strings
from the transformation dict (at least that is what xsltproc does), so when
looking up which template applies you have to do a string comparison
instead of a pointer comparison. This of course is way slower. We have
fallbacks for when the string does not come from the style dictionary, but
you are more likely to hit bugs there because that code is not exercised
very much.
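
  A minimal sketch of the point above, in plain C. ToyDict, toy_dict_lookup
and toy_name_equal are hypothetical stand-ins and not libxml2's xmlDict API or
implementation; the only behaviour borrowed from libxml2 is that a
sub-dictionary (as created by xmlDictCreateSub) consults its read-only parent
before adding a string of its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy interning dictionary: an illustration only, NOT libxml2's xmlDict. */
typedef struct ToyDict {
    const struct ToyDict *parent; /* read-only parent dict, or NULL */
    char **strings;               /* strings owned by this dict */
    size_t count;
    size_t capacity;
} ToyDict;

static ToyDict *toy_dict_new(const ToyDict *parent) {
    ToyDict *d = calloc(1, sizeof(*d));
    assert(d != NULL); /* sketch-level error handling */
    d->parent = parent;
    return d;
}

/* Search a single dict (no parent); returns the interned pointer or NULL. */
static const char *toy_dict_find(const ToyDict *d, const char *s) {
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->strings[i], s) == 0)
            return d->strings[i];
    return NULL;
}

/* Intern s: try this dict, then the parent chain; add locally if unknown. */
static const char *toy_dict_lookup(ToyDict *d, const char *s) {
    const char *hit = toy_dict_find(d, s);
    if (hit != NULL)
        return hit;
    for (const ToyDict *p = d->parent; p != NULL; p = p->parent) {
        hit = toy_dict_find(p, s);
        if (hit != NULL)
            return hit; /* parent is only read, never modified */
    }
    if (d->count == d->capacity) {
        size_t cap = d->capacity ? d->capacity * 2 : 8;
        char **grown = realloc(d->strings, cap * sizeof(*grown));
        assert(grown != NULL);
        d->strings = grown;
        d->capacity = cap;
    }
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    assert(copy != NULL);
    memcpy(copy, s, len);
    d->strings[d->count++] = copy;
    return copy;
}

static void toy_dict_free(ToyDict *d) {
    for (size_t i = 0; i < d->count; i++)
        free(d->strings[i]);
    free(d->strings);
    free(d);
}

/* Name test: pointer comparison first, string comparison as the fallback.
 * *used_strcmp reports which path was taken. */
static int toy_name_equal(const char *a, const char *b, int *used_strcmp) {
    if (a == b) {
        *used_strcmp = 0;
        return 1;
    }
    *used_strcmp = 1;
    return strcmp(a, b) == 0;
}
```

  With the stylesheet dict as parent, toy_dict_lookup on the sub-dict hands
back the stylesheet's own pointer for a name such as "para", so the pointer
test succeeds. A dict created with toy_dict_new(NULL) returns a different
pointer for the same name, and the match only succeeds through the strcmp
fallback.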

> >   No, because you can have generated names; see xsl:element for example
> 
> How is that a problem? If element names are generated they are either a) not
> stored in the dictionary (no problem) or b) stored in the dictionary and
> reused the next time the same name is generated.

  The generation can lead to any name. 

> As long as the number of
> languages is still limited, I can't see a reasonable way to make the
> dictionary explode.

  Generate a name based on the number of the document being processed and
always reuse the transformation context, and you're sure to explode. Probably
not the most convincing example, but sufficient as a formal one.
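
  The same formal example as a small C sketch. NameDict and fake_transform are
hypothetical: fake_transform merely stands in for a transformation whose
stylesheet builds an element name from the document number, and the "item-N"
naming is an assumption made for illustration. Nothing here calls libxslt.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy name dictionary: an illustration only, NOT libxml2's xmlDict. */
typedef struct NameDict {
    char **names;
    size_t count;
    size_t capacity;
} NameDict;

/* Add s unless it is already stored. */
static void name_dict_intern(NameDict *d, const char *s) {
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->names[i], s) == 0)
            return;
    if (d->count == d->capacity) {
        size_t cap = d->capacity ? d->capacity * 2 : 8;
        char **grown = realloc(d->names, cap * sizeof(*grown));
        assert(grown != NULL); /* sketch-level error handling */
        d->names = grown;
        d->capacity = cap;
    }
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    assert(copy != NULL);
    memcpy(copy, s, len);
    d->names[d->count++] = copy;
}

static void name_dict_clear(NameDict *d) {
    for (size_t i = 0; i < d->count; i++)
        free(d->names[i]);
    free(d->names);
    d->names = NULL;
    d->count = 0;
    d->capacity = 0;
}

/* Hypothetical stand-in for one transformation: one fixed name plus one
 * name generated from the document number. */
static void fake_transform(NameDict *d, unsigned doc_number) {
    char name[32];
    snprintf(name, sizeof(name), "item-%u", doc_number);
    name_dict_intern(d, "root");
    name_dict_intern(d, name);
}

/* One dict kept alive across all documents: entries accumulate. */
static size_t entries_with_reused_dict(unsigned ndocs) {
    NameDict d = {NULL, 0, 0};
    for (unsigned i = 0; i < ndocs; i++)
        fake_transform(&d, i);
    size_t n = d.count;
    name_dict_clear(&d);
    return n;
}

/* A fresh dict per document: the peak size stays constant. */
static size_t peak_entries_with_fresh_dict(unsigned ndocs) {
    size_t peak = 0;
    for (unsigned i = 0; i < ndocs; i++) {
        NameDict d = {NULL, 0, 0};
        fake_transform(&d, i);
        if (d.count > peak)
            peak = d.count;
        name_dict_clear(&d);
    }
    return peak;
}
```

  In this toy model the reused dict gains one entry per document processed,
while a dict that lives only for one transformation never holds more than the
names of that single run.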

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

