Re: [xml] persisting parsed DOMs

From: Daniel Veillard <veillard redhat com>
To: tom dyson <tom torchbox com>
Cc: xml gnome org
Subject: Re: [xml] persisting parsed DOMs
Date: Tue, 25 Mar 2003 12:36:55 -0500

  Hi,

 looking at the title my immediate reaction was "uh oh, libxml3 discussion
is coming back ..."

On Tue, Mar 25, 2003 at 03:45:56PM +0000, tom dyson wrote:

> I've just joined the list, so apologies if this has been covered before (I
> can't find any references in the archives).


  Hum, it might be a good idea to read the threads from a couple of years
ago pointed to by http://xmlsoft.org/search.php?query=libxml3

> I've been working - with John Gray - on enabling XPath queries within the
> PostgreSQL database:
>
> http://www.throwingbeans.org/tech/postgresql_and_xml.html


  Looks cool but a bit scary w.r.t. performance :-)

> These functions - which are used in several high traffic websites - are
> wrappers around libxml2. Performance is good, but will necessarily decrease
> as the number of database records increases; to evaluate an XPath expression
> against the contents of a column in 500 records requires the creation by
> libxml of 500 DOMs from the XML fragments.


  Yeah, it's fundamentally inefficient to store just the serialized form
and reparse it each time...
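
To make that cost concrete, here is a minimal sketch in Python, using the
standard library's ElementTree as a stand-in for libxml2. The 500 rows, the
`query_column` helper and the `./title` path are made up for illustration;
this is not the actual PostgreSQL wrapper code.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for an XML column: 500 small serialized documents.
rows = [f"<doc id='{i}'><title>row {i}</title></doc>" for i in range(500)]

parse_count = 0  # how many trees have been built so far


def query_column(rows, path):
    """Evaluate `path` against every row, re-parsing each stored string."""
    global parse_count
    results = []
    for text in rows:
        root = ET.fromstring(text)  # a fresh tree is built for every row
        parse_count += 1
        results.extend(node.text for node in root.findall(path))
    return results


titles = query_column(rows, "./title")
query_column(rows, "./title")  # a second query pays the full parsing cost again
print(len(titles), parse_count)
```

Every query builds one tree per row and throws it away, so the parsing work
grows with both the number of rows and the number of queries.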

> Since PostgreSQL supports user-defined column types -
>
> http://www.postgresql.org/docs/view.php?version=7.3&file=sql-createtype.html
>
> - it may be beneficial to create a native 'XML' type column for storing
> parsed DOM representations of XML strings. Which leads me to my question:


  Hum, libxml2 has no preparsed data serialization. I'm not too fond
of "binary XML" or similar ideas, because that needs to be parsed too
and the gain isn't that large compared to pure XML parsing. But I may
find intermediate ways to speed up parsing if it is done repeatedly.

> Does libxml2 support the persistence of parsed DOM representations? If so,


  No, that was looked at in the libxml3 discussions, but it's a major
change since it would break the API and ABI.

> would this offer a significant performance advantage? If not, can you
> suggest any other routes to performance improvement (reusing compiled XPath
> expressions, perhaps)?


   Keeping precompiled XPath expressions should be doable, but parsing
an XPath expression is relatively quick too.
   My view is that providing an XML string parsing function that can
record information to speed up future parsing of that same string
might be the right way for your specific case, but I haven't given it much
thought yet. So far most of the libxml3 discussions were more about
having a huge DOM and keeping only part of it in memory, which is rather
different from what you need, though preparsing metadata may help a lot
in both cases.
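
For illustration only, the simplest in-process approximation of "do less
work the second time the same string is seen" is to memoize the parse. This
is a different and much cruder technique than recording parsing hints inside
the parser: it keeps whole trees in memory and only helps within one
process. The sketch again uses Python's standard library ElementTree rather
than libxml2, and `parse_cached` and `query` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache


@lru_cache(maxsize=1024)
def parse_cached(text):
    """Parse `text` once; later calls with an equal string reuse the tree."""
    return ET.fromstring(text)


def query(text, path):
    # The cached tree is shared between callers, so treat it as read-only.
    return [node.text for node in parse_cached(text).findall(path)]


doc = "<doc><title>hello</title></doc>"
first = query(doc, "./title")
second = query(doc, "./title")  # served from the cache, no second parse
info = parse_cached.cache_info()
print(first, info.hits, info.misses)
```

The cache is keyed on the full string, so it only pays off when the same
documents are queried repeatedly and the working set fits in memory.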

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

