Re: [xml] Python bindings and Xpath and namespaces



Daniel Veillard <veillard redhat com> writes:

  you can get the namespace list of an element node directly
by walking the nsdef list, so it should be possible to rewrite
the equivalent function in Python.

Ok. I freely admit that I don't get namespaces generally and I get
them even less in the context (geddit?) of libxml2's xpath. This is
an overlong email (sorry Daniel) that describes my problem
generally. Maybe it will be useful to someone struggling to understand
this stuff in the same way I am.


When I run this Python on an RSS/RDF document:

  doc = libxml2.parseMemory(docstr, len(docstr))
  root = doc.getRootElement()
  nsls = root.nsDefs()

I get a single namespace node and I was expecting a list of 7
namespaces.
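
For what it's worth, here is a minimal sketch of what "walking the
nsdef list" might look like from Python. It assumes the object returned
by nsDefs() is only the head of a linked list, and that each xmlNs
wrapper exposes `next`, `name` (the prefix) and `get_content()` (the
URI). The helper names walk_ns_defs and ns_defs_of_root are made up for
illustration; I have not checked this against every version of the
bindings.

```python
def walk_ns_defs(first_ns):
    """Collect (prefix, uri) pairs by following .next from the first xmlNs.

    Assumes each namespace object has .name, .next and .get_content();
    the default namespace is expected to show up with a prefix of None.
    """
    pairs = []
    ns = first_ns
    while ns is not None:
        pairs.append((ns.name, ns.get_content()))
        ns = ns.next  # assumed to be None at the end of the list
    return pairs


def ns_defs_of_root(docstr):
    """Parse docstr and return the root element's namespace declarations."""
    import libxml2  # imported lazily; assumes the bindings are installed

    doc = libxml2.parseMemory(docstr, len(docstr))
    try:
        # nsDefs() may return None when the element declares nothing
        return walk_ns_defs(doc.getRootElement().nsDefs())
    finally:
        doc.freeDoc()
```

If the assumption holds, ns_defs_of_root(docstr) on the document below
should give one pair per xmlns declaration rather than a single node.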


Like another poster here, I wanted to get a list of the namespace URIs
declared in a document so that I could register them and use xpath
expressions against it.
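
The registration half of that might be sketched as follows, given a
list of (prefix, uri) tuples obtained by whatever means. The only
libxml2 call used is xpathRegisterNs(), the same one as in my code
further down; the function name register_ns_pairs and the fallback
prefix "rss" are hypothetical. XPath 1.0 has no default namespace, so a
declaration without a prefix needs some invented prefix to be usable in
an expression.

```python
def register_ns_pairs(xpctx, pairs, default_prefix="rss"):
    """Register every (prefix, uri) pair on an XPath context.

    xpctx is expected to offer xpathRegisterNs(prefix, uri).
    A missing prefix (the default namespace) is replaced by
    default_prefix, which is an arbitrary choice for this sketch.
    Returns a dict of the prefixes that were actually registered.
    """
    registered = {}
    for prefix, uri in pairs:
        if not prefix:
            prefix = default_prefix
        xpctx.xpathRegisterNs(prefix, uri)
        registered[prefix] = uri
    return registered
```

With that in place, elements in the default namespace would be
addressed through the invented prefix, for example "//rss:item".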

Eg: I have an RDF/RSS document which begins like so:

  <?xml version="1.0" encoding="ISO-8859-1"?>
  <rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns="http://purl.org/rss/1.0/"
   xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
   xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
   xmlns:admin="http://webns.net/mvcb/"
  >

Clearly, the libxml2 parser knows about these declarations. So I was
expecting it to make them available to me. But as I say: the value
returned by my code fragment above is actually just a single
namespace, the namespace of the root node.

I tried mucking about in xmllint's shell and this is what I found:

  $ xmllint --shell somerss.xml
  / > ls
  --n       27 RDF
  / > dir RDF
  / > cd RDF
  RDF is a 0 Node Set
  / > setns r=http://www.w3.org/1999/02/22-rdf-syntax-ns#
  / > dir r:RDF
  ELEMENT rdf:RDF
    namespace rdf href=http://www.w3.org/1999/02/22-rdf-syntax-...
    default namespace href=http://purl.org/rss/1.0/
    namespace slash href=http://purl.org/rss/1.0/modules/slash/
    namespace taxo href=http://purl.org/rss/1.0/modules/taxonomy...
    namespace dc href=http://purl.org/dc/elements/1.1/
    namespace syn href=http://purl.org/rss/1.0/modules/syndicat...
    namespace admin href=http://webns.net/mvcb/
  / > 

I think that means it's not just xpath but the rest of libxml2 that
needs namespaces to be declared before it makes sense (but maybe 'dir'
is using xpath under the hood).

But I'd be happy if I could replicate the above. The broad scheme is:

- get the root node
- get the namespace of the root node
- get the node again, but qualify it with the namespace

and this results in a node with the full ns declaration list.


So here's my new python code:

     doc = libxml2.parseMemory(docstr, len(docstr))
     root = doc.getRootElement()
     nsls = root.nsDefs()
[1]  print nsls

[2]  print nsls.get_content()

     xpctx = doc.xpathNewContext()
     xpctx.xpathRegisterNs("rdf", nsls.get_content())

     el_lst = xpctx.xpathEval("/rdf:RDF")

[3]  print len(el_lst)

     el = el_lst[0]
[4]  print el.nsDefs().get_content()


And here's the output:

1.   <xmlNs (rdf) object at 0x401dfa6c>
2.   http://www.w3.org/1999/02/22-rdf-syntax-ns
3.   1
4.   http://www.w3.org/1999/02/22-rdf-syntax-ns

So... the value of nsDefs() on the qualified root node is still a
single object and not a list, and it's still only the namespace of the
root node.

So... I'm confused.

How can I dynamically obtain the namespace declarations of a document
from python? Or can't I?


Nic Ferrier


