Re: [xml] serialize nodes returned by successive XPath evaluation; preserving namespaces

From: Daniel Veillard <veillard redhat com>
To: Matt Magoffin <gnome org msqr us>
Cc: xml gnome org
Subject: Re: [xml] serialize nodes returned by successive XPath evaluation; preserving namespaces
Date: Thu, 2 Apr 2009 17:23:51 +0200

On Mon, Mar 30, 2009 at 01:56:22PM +1300, Matt Magoffin wrote:

I'm trying to find the correct way to serialize the nodes returned in a
node list via an XPath evaluation and preserving the namespaces of the
source document. My problem originates in the XML support in PostgreSQL
(http://archives.postgresql.org//pgsql-bugs/2008-06/msg00124.php) which
shows a small test case... but in effect if I have a document like

<a:foo xmlns:a="a:urn">
  <a:bar x="y">bar1</a:bar>
  <a:bar x="y">bar2</a:bar>
</a:foo>

and I evaluate the XPath /a:foo/a:bar[1] (with the "a:urn" namespace
mapping registered) to get a single node

<a:bar x="y">bar1</a:bar>

I want to then be able to evaluate another XPath on that node like
/a:bar/@x and get a matching attribute @x.

This second XPath evaluation is what is not working... but it _does_ work
if no namespaces are present in the source document.

In the context of how PostgreSQL is using libxml, after the first XPath
evaluation it is serializing the results by calling xmlNodeDump() on each
node returned in the node list returned by the XPath evaluation. And
xmlNodeDump() is returning the string literal

<a:bar x="y">bar1</a:bar>

which does not have the "a:urn" namespace declaration as one might expect
(at least, for a document), e.g.

<a:bar xmlns:a="a:urn" x="y">bar1</a:bar>

Is there a way for xmlNodeDump(), or some other function, to serialize a
node such as this one in this latter way rather than the former?


  Hum, no. Still, I don't really understand the need to serialize, but I
assume it's not an option to reevaluate the XPath (as a relative one,
i.e. ./a:bar/@x) on the node(s) selected from the first query.

  That could possibly be added to libxml2, but it won't be available by
default until people update.
  It's very weird that the implementation was done this way. XPath
was designed to be namespace aware, so whoever plugged XPath into pgsql
completely missed the namespace issue: a simple node dump does not
preserve namespaces, and if you add them and reserialize you may
change the semantics compared to XPath on the original document.
  So I really wonder how firm the design based on serialization of the
intermediate result really is. Maybe that should be revisited. Maybe
that's impossible, but in that case you will have to play tricks
like using xmlGetNsList() on the node (or rather its parent), copying the
namespaces to the node level (verifying they don't clash with an existing
namespace on the node), and then doing the xmlNodeDump(). A bit messy...

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
