Re: [xslt] Replacing the dict used by the transform context



Salut Daniel,

Daniel Veillard wrote:
> On Mon, Aug 07, 2006 at 11:28:33AM +0200, Stefan Behnel wrote:
>>> In general replacing the dict with a random new one may simply break the
>>> processor: there are a number of places where string comparisons have been
>>> replaced with pointer comparisons because we knew that a string found in the
>>> parent dict would be the same as the string if queried in the subdict.
>>
>> That's good to know. So, what does that mean? Currently, libxslt generated
>> documents inherit the dictionary of the stylesheet. You say that it is
>> required that the transform dictionary can access the entries in the
>> stylesheet dictionary?
> 
> Those are libxslt's general rules. You may break them without trouble or not
> depending on the context of use. But they are strict guidelines when it comes
> to libxslt's own code:
>  - the transformation dictionary must be derived from the stylesheet dict
>  - the stylesheet dict must be read-only at transform time
>  - the generated document should probably reuse the transformation dict
> 
> I think in general libxslt tries to cope when this is not the case, but
> it's very hard to guarantee non-pathological behaviour if those expectations
> are not met. For example, it's nearly guaranteed that processing will be slower,
> potentially much slower!

"does not crash" is definitely more important than speed here.

What is that "should probably" doing in the third rule? I mean, that happens
to be the source of the whole problem. So I'm wondering, if that's only a
"should probably", would it be hard to separate the two?


>> We use per-thread dictionaries, so according to your above
>> quote, sharing an XSLT between threads will not work if we use the
>> thread-local dict for the transformation, right?
> 
>   I don't know when and how you took that decision; I cannot guarantee you
> will have a perfectly working solution in the general case that way.

Well, it wouldn't work if we used the same dict for all threads, as dictionary
access is not serialised in libxml2.

So, what is the right thing to do? We already tell users not to share
documents between threads (or at least not to modify them if they do). So the
only remaining problem seems to be XSLT. From your rules above, I think we
will end up disallowing stylesheets from running in any thread other than the
one they were parsed in, so no more xsltStylesheet sharing between threads.
That's the only solution I see to comply with your first rule. The second and
third rules are worked around by making the stylesheet dict also the transform
dict.
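
To check that I understand the first rule, here is a toy model of it in
Python. It is not libxml2's implementation: ToyDict, Name and lookup() are
names I made up, and object identity ("is") stands in for the pointer
comparison you describe. A dictionary derived from the stylesheet dict hands
back the same entry, while an unrelated (say, thread-local) one does not.

```python
class Name:
    """One interned string; object identity stands in for a C pointer."""
    def __init__(self, text):
        self.text = text


class ToyDict:
    """Hypothetical interning table with an optional read-only parent."""
    def __init__(self, parent=None):
        self.parent = parent
        self.entries = {}

    def lookup(self, text):
        # Look in this table first, then walk up the parent chain.
        d = self
        while d is not None:
            if text in d.entries:
                return d.entries[text]
            d = d.parent
        # Not found anywhere: intern locally, never in the parent.
        entry = Name(text)
        self.entries[text] = entry
        return entry


stylesheet_dict = ToyDict()
template_name = stylesheet_dict.lookup("para")   # interned at parse time

derived = ToyDict(parent=stylesheet_dict)        # rule 1 respected
unrelated = ToyDict()                            # e.g. a thread-local dict

same_pointer = derived.lookup("para") is template_name
broken_pointer = unrelated.lookup("para") is template_name
print(same_pointer, broken_pointer)              # prints: True False
```

In this model, new names looked up through the derived table land in the
derived table only, so the parent stays untouched at transform time, which is
how I read the second rule.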


>>> However the big problem of your approach, as I explained to Martijn when he
>>> started lxml, is that as a result the dictionary can never be freed and
>>> cannot be shrunk either, so you have an ever-ballooning resource which will
>>> exhaust memory after a while no matter what.
>>
>> Sure, if the parsed XML is sufficiently diverse. However, as long as you stick
>> to a small set of XML languages (which is the most common use case, I'd say),
>> you should not run into too much trouble, right?
> 
>   No, because you can have generated names; see xsl:element for example.

How is that a problem? If element names are generated they are either a) not
stored in the dictionary (no problem) or b) stored in the dictionary and
reused the next time the same name is generated. As long as the number of
languages is still limited, I can't see a reasonable way to make the
dictionary explode.
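
A small sketch of what I mean, again as a toy in Python rather than anything
libxml2 does internally (the intern() helper and the tag names are invented
for illustration). Names generated from a fixed vocabulary stop growing the
table after the first few runs. The only growth I can construct is when names
are computed from unbounded input, which is the second loop.

```python
def intern(table, name):
    """Return the stored copy of name, adding it on first sight."""
    return table.setdefault(name, name)


table = {}
vocabulary = ["html", "body", "p", "div"]   # a fixed XML language
sizes = []
for run in range(1000):
    for tag in vocabulary:
        intern(table, tag)
    # A generated name, as with xsl:element, drawn from a limited set.
    intern(table, "h%d" % (run % 6 + 1))
    sizes.append(len(table))

bounded_size = len(table)                    # 4 fixed + 6 generated names

# Outside my assumption: names derived from unbounded data.
unbounded = {}
for i in range(1000):
    intern(unbounded, "item%d" % i)
unbounded_size = len(unbounded)

print(bounded_size, unbounded_size)          # prints: 10 1000
```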



