Re: [xml] Re: A new set of (tree) APIs for the XML parser.

On Thu, Sep 25, 2003 at 10:04:00AM -0700, Aleksey Sanin wrote:
It's easy to check the latter: just run with and without
--nodict to see the impact with --timing. This is less efficient w.r.t.
memory consumption than I expected, but it's still a serious speed

OK, I'll do timing myself. Did not have a chance yet :)

  Interning of strings in the tree can gain around 10%.

  From 2.5.11 the xmlReader is twice as fast on various tests.
For pure SAX there isn't that much gain, because the parser does far
more work underneath, but this should still boost speed considerably at
the application level. For tree building it all depends on the documents,
their size, their content, and whether reuse of the parser context and
its dictionary is significant compared to the overall size of the
document being parsed.
I think most users will be happy.

  Hum, no, it's not a user option. Currently it's limited to
text nodes of less than 3 bytes or blank nodes (formatting spaces)
of less than 60 bytes, so as not to explode the dictionary.
  The dictionary size is important, or rather keeping lookup fast
is important, both for string lookup and for checking whether a string
belongs to the dictionary (Lookup and Owns operations, see dict.c if
interested).
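
To make the two operations and the size limits concrete, here is a minimal
sketch in Python. It is a toy model, not the code in dict.c: the names
StringDict, should_intern and store_text are hypothetical, and only the
3-byte and 60-byte thresholds come from the message above.

```python
MAX_TEXT_BYTES = 3    # "text nodes of less than 3 bytes"
MAX_BLANK_BYTES = 60  # "blank nodes ... of less than 60 bytes"


class StringDict:
    """Toy interning dictionary: one shared object per distinct string."""

    def __init__(self):
        self._canonical = {}  # value -> the single shared bytes object
        self._owned = set()   # ids of the objects stored in the dictionary

    def lookup(self, s: bytes) -> bytes:
        """Return the shared copy of s, adding s on first sight."""
        found = self._canonical.get(s)
        if found is None:
            self._canonical[s] = s
            self._owned.add(id(s))
            found = s
        return found

    def owns(self, s: bytes) -> bool:
        """True only if s is the very object held by the dictionary."""
        return id(s) in self._owned

    def __len__(self) -> int:
        return len(self._canonical)


def should_intern(content: bytes) -> bool:
    """Apply the size limits: short text or moderately short blanks only."""
    is_blank = content.strip(b" \t\r\n") == b""
    if is_blank:
        return len(content) < MAX_BLANK_BYTES
    return len(content) < MAX_TEXT_BYTES


def store_text(d: StringDict, content: bytes) -> bytes:
    """Intern content when the limits allow it, else keep a private copy."""
    return d.lookup(content) if should_intern(content) else content
```

With this model, two equal formatting-space nodes end up sharing one object
that the dictionary owns, while a 250-byte text node bypasses the dictionary
entirely, which is the behaviour the limits are meant to enforce.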

I wonder if you can make these limits available to the application
somehow. I have a use case where an XML document has large (200-300
characters) repeating text nodes. I think that it would be nice to
cache these strings too.

  I'm afraid of DoS attacks due to time spent in the dictionary lookup;
that's the reason I'm not comfortable with opening this up at this point.
Serious testing and maybe a bit of formal proof could make me change my
mind ;-)
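
The lookup-time worry can be illustrated with a toy chained hash table in
Python. Everything here is an assumption made for the demonstration: the
deliberately weak weak_hash function, the ChainedTable class and the bucket
count are hypothetical and say nothing about the hash actually used in
dict.c. The point is only that keys chosen to collide turn each lookup into
a linear scan.

```python
def weak_hash(key: bytes, buckets: int) -> int:
    # Deliberately simple hash (sum of byte values), assumed for the demo.
    return sum(key) % buckets


class ChainedTable:
    """Toy hash table with chaining that counts key comparisons."""

    def __init__(self, buckets: int = 64):
        self.buckets = [[] for _ in range(buckets)]
        self.comparisons = 0

    def lookup(self, key: bytes) -> bytes:
        chain = self.buckets[weak_hash(key, len(self.buckets))]
        for existing in chain:
            self.comparisons += 1
            if existing == key:
                return existing
        chain.append(key)  # not found: add it, as an interning lookup would
        return key


def cost(keys) -> int:
    """Total comparisons needed to insert all keys into a fresh table."""
    table = ChainedTable()
    for key in keys:
        table.lookup(key)
    return table.comparisons


# 128 distinct keys that spread evenly over the 64 buckets.
benign = [bytes([i, 0]) for i in range(128)]
# 128 distinct keys whose bytes always sum to 255, so all share one bucket.
hostile = [bytes([i, 255 - i]) for i in range(128)]
```

Here cost(benign) is 64 comparisons while cost(hostile) is 8128, for the
same number of keys. Larger admitted strings give an attacker more room to
construct such collisions, which is one way to read the caution above.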


Daniel Veillard      | Red Hat Network
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
