Re: [xml] Namespace Handling



On Tue, Sep 09, 2003 at 11:14:29PM +0100, Balint Joo wrote:

Dear All,
      I have recently been writing some C++ wrapper classes around
libxml2. The idea is to create what a friend of mine called a 'cherry
picking reader' API. Basically we have XMLReader objects on
which we can perform XPath Queries using libxml2's XPath processor.
This is so that we can read XML based parameter files without
explicit data binding -- which doesn't seem to exist for C++, at
least not for free.

  okay

In order to make all of this fairly general, I have to deal with
namespaces that may be declared in the document. In particular,
I need to register the namespaces in the document in the XPath
processor's context. Currently I do this with a full tree traversal of the
document (at the start) -- with a simple recursive function like the one below:


   [...]

There are two things wrong with this:

   i) It may well be plain wrong. It registers every single (local and global)
      namespace as if it were global, although it seems to work well at the
      moment.

  I find it hard to accept that as right. You can define whatever semantics
you want for the namespace bindings, since none comes from the spec, but
prefix "a" may be associated with 10 different namespaces in the
document; picking the first one in document order doesn't sound right at
all...
  There is also no notion of a "local" vs. "global" namespace: the
Namespaces 1.0 spec scopes a declaration to the subtree of the element
carrying it. The 1.1 spec also allows unmapping a prefix in a subtree.
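
  To make the scoping point concrete, here is a minimal sketch in plain C.
The struct elem and struct binding types and the resolve_prefix function are
hypothetical stand-ins invented for illustration, not libxml2's API. A prefix
is resolved by walking from the element toward the root, and the nearest
declaration wins, so the same prefix can map to different URIs in sibling
subtrees. The sketch does not model the 1.1 unmapping of a prefix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature tree for illustration; these are NOT libxml2 types. */
struct binding {
    const char *prefix;           /* e.g. "a" for xmlns:a="..." */
    const char *uri;
};

struct elem {
    const struct elem *parent;    /* NULL for the root element */
    const struct binding *decls;  /* declarations carried by this element */
    size_t ndecls;
};

/*
 * Resolve `prefix` as seen from element `e`: the nearest declaration on
 * `e` or one of its ancestors wins. Returns NULL if the prefix is unbound.
 * The cost is bounded by the depth of `e`, not the size of the document.
 */
static const char *
resolve_prefix(const struct elem *e, const char *prefix)
{
    for (; e != NULL; e = e->parent) {
        size_t i;

        for (i = 0; i < e->ndecls; i++) {
            if (strcmp(e->decls[i].prefix, prefix) == 0)
                return e->decls[i].uri;
        }
    }
    return NULL;
}
```

  With a root binding "a" to one URI and one child rebinding "a" to another,
resolving "a" from the two children gives two different answers, which is
why a single document-wide table of prefixes cannot be right in general.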

   ii) This involves a full traversal of the whole XML tree which in our case 
may be quite large (one of our current test files is 14M) and eventually we'll 
need to process longer files than this. This seems to me a great waste of time. 

  Yes this sounds like a terrible waste.

  A profile of the application code suggests that just one
full traversal, snarfing namespaces as we go, took some 39% of our
application's run time, and this was before we executed our first XPath query.
It is especially wasteful in the situation where no namespaces are declared
in the XML document.

  yes, really a bad idea IMHO.

      Your helpful suggestions would be greatly appreciated.

  Simply walk the current node and its ancestors: the cost will be linear
in the depth of the tree, not its size!
  There is already a call doing this, exported from the tree module:

/**
 * xmlGetNsList:
 * @doc:  the document
 * @node:  the current node
 *
 * Search all the namespaces applying to a given element.
 * Returns a NULL-terminated array of all the #xmlNsPtr found
 *         that needs to be freed by the caller, or NULL if no
 *         namespace is defined
 */
xmlNsPtr *
xmlGetNsList(xmlDocPtr doc ATTRIBUTE_UNUSED, xmlNodePtr node);

Daniel


-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


