Re: [xml] "atoms" for name/attribute strings



On Sun, May 18, 2003 at 07:53:14PM -0400, Chris Ryland wrote:
Forgive me for throwing out wild ideas that might have large 
consequences for libxml2 (whose distilled wisdom I greatly respect from 
my brief acquaintance so far), but I had a wild idea that might greatly 
speed up tree processing after parsing.

If all name (and namespace) and attribute (xmlChar *) strings were 
"atomized" (like Lisp atoms or Python interned strings--hashed to unique 
strings) by the parser, then all input tree processing (after atomizing 
all "interesting" name/attribute strings) would be reduced to comparing 
string pointers, not string values.
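As an illustration of what "atomizing" means, here is a minimal C sketch of an atom (string-interning) table. The names `atom_intern`, `atom_hash` and `ATOM_TABLE_SIZE` are hypothetical and are not part of libxml2. The table has a fixed capacity, uses open addressing with linear probing, and never resizes. Every equal string maps to one stored copy, so two interned names can be compared with `==` instead of `strcmp()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-capacity atom table (no resizing, for brevity). */
#define ATOM_TABLE_SIZE 256  /* must be a power of two */

static const char *atom_slots[ATOM_TABLE_SIZE];

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t atom_hash(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Return the unique stored copy of s, creating it on first use.
 * Returns NULL if the table is full or allocation fails. */
static const char *atom_intern(const char *s)
{
    uint32_t i = atom_hash(s) & (ATOM_TABLE_SIZE - 1);
    for (int probes = 0; probes < ATOM_TABLE_SIZE; probes++) {
        if (atom_slots[i] == NULL) {
            size_t len = strlen(s) + 1;
            char *copy = malloc(len);
            if (copy == NULL)
                return NULL;
            memcpy(copy, s, len);
            atom_slots[i] = copy;
            return copy;
        }
        if (strcmp(atom_slots[i], s) == 0)
            return atom_slots[i];
        i = (i + 1) & (ATOM_TABLE_SIZE - 1);  /* linear probing */
    }
    return NULL;  /* table full */
}
```

A client would intern the names it cares about once, for example `const char *para = atom_intern("para");`. It could then test `node_name == para` for each node, assuming the parser had interned node names through the same table.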

  The main effect, if done correctly, would be a reduction in memory
consumption and in the number of malloc()/free() calls needed for parsing.
  The main drawback is that libxml2 trees are mutable, so some thought and
extra work are required when adding nodes to a tree during processing and
when freeing it. It's not trivial but it is doable; I have been thinking
about this for a while.
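This C sketch shows one piece of that extra work at free time. Once a tree is mutable, a node's name may be either a shared atom or a private malloc()'d string set later by the application. The free routine must release only the latter. The pool, `pool_owns()` and `struct node` are all hypothetical and are not libxml2's types. The linear ownership check is chosen for brevity; a real implementation would more likely check whether the pointer falls inside the atom store's own memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, deliberately tiny atom pool: linear search, fixed size. */
#define POOL_MAX 64
static char *pool[POOL_MAX];
static int pool_len = 0;

static const char *pool_intern(const char *s)
{
    for (int i = 0; i < pool_len; i++)
        if (strcmp(pool[i], s) == 0)
            return pool[i];
    if (pool_len == POOL_MAX)
        return NULL;
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return NULL;
    strcpy(copy, s);
    pool[pool_len++] = copy;
    return copy;
}

/* Does this pointer belong to the pool (i.e. must NOT be freed by a node)? */
static int pool_owns(const char *p)
{
    for (int i = 0; i < pool_len; i++)
        if (pool[i] == p)
            return 1;
    return 0;
}

/* Hypothetical node: name is either an atom or a private malloc'd copy. */
struct node {
    const char *name;
};

static int names_freed = 0;  /* counts private names released, for illustration */

/* Release a node's name only if the node, not the pool, owns it. */
static void node_release_name(struct node *n)
{
    if (n->name != NULL && !pool_owns(n->name)) {
        free((char *)n->name);
        names_freed++;
    }
    n->name = NULL;
}

static void node_free(struct node *n)
{
    node_release_name(n);
    free(n);
}
```

The same ownership question comes up when a node is moved between trees that use different atom stores. In that case, the name would have to be re-interned into the destination store.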

This could be a parser configuration setting (like entity expansion) 
that would only affect clients that requested it, and wouldn't 
necessarily affect a whole lot of the code. (In fact, it might be 
completely localizable in the form of an allocation function called for 
such names.)
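The "allocation function called for such names" could look roughly like this in C. A configuration flag selects, once at setup, which name allocator the parser calls. The default gives each name its own copy, while the opt-in mode returns a shared atom. Every identifier here (`struct parser`, `name_alloc_fn`, `parser_init`, the `intern_names` flag) is hypothetical and is not libxml2 API. As the reply points out, this hook alone would not be enough, because the freeing and tree-editing code must also be aware of the choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser context; none of these names are libxml2 API. */
struct parser;
typedef const char *(*name_alloc_fn)(struct parser *p, const char *s, size_t len);

struct parser {
    name_alloc_fn alloc_name;  /* chosen once, from the configuration flag */
    char *atoms[32];           /* tiny atom store, used only when interning */
    int n_atoms;
};

/* Default behaviour: every name gets its own private copy. */
static const char *alloc_copy(struct parser *p, const char *s, size_t len)
{
    (void)p;
    char *copy = malloc(len + 1);
    if (copy != NULL) {
        memcpy(copy, s, len);
        copy[len] = '\0';
    }
    return copy;
}

/* Opt-in behaviour: equal names share one stored copy. */
static const char *alloc_atom(struct parser *p, const char *s, size_t len)
{
    for (int i = 0; i < p->n_atoms; i++)
        if (strncmp(p->atoms[i], s, len) == 0 && p->atoms[i][len] == '\0')
            return p->atoms[i];
    if (p->n_atoms == 32)
        return NULL;  /* store full */
    char *copy = (char *)alloc_copy(p, s, len);
    if (copy != NULL)
        p->atoms[p->n_atoms++] = copy;
    return copy;
}

/* Pick the allocator according to the (hypothetical) intern_names option. */
static void parser_init(struct parser *p, int intern_names)
{
    *p = (struct parser){0};
    p->alloc_name = intern_names ? alloc_atom : alloc_copy;
}
```

The rest of the parser would then call `p->alloc_name(p, start, len)` wherever it currently copies a name, which keeps the choice in one place.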

  Hum, far more than that would be needed.

Is this a good idea, looked at from any of you old grizzled 
veterans' points of view?

  It's a good idea, but the devil is in the details if we want to avoid
breaking the large amount of existing code.

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


