[xml] registerXPathFunction callback parameters, what to return



DV said I should email the list, so here goes.

I'm trying to create and register an XPath function, re_contains, that
works the same way as the standard contains function except that it
accepts a regular expression as its second argument.
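
To pin down the intended semantics, here is a minimal sketch in plain
Python, independent of libxml2. The names plain_contains and
plain_re_contains are made up for illustration and are not part of any
library; the sketch only shows the string-level behaviour, not how the
function is hooked into XPath.

```python
import re

def plain_contains(s, sub):
    """Mimic XPath contains(): true if sub occurs anywhere in s."""
    return sub in s

def plain_re_contains(s, pattern):
    """Same idea, but the second argument is a regular expression.

    re.search (not re.match) is used so the pattern may match anywhere
    in s, which is what makes this analogous to contains().
    """
    return re.search(pattern, s) is not None
```

With this, plain_re_contains("adf", "as?df") and
plain_re_contains("asdf", "as?df") are both true, while
plain_contains("adf", "asdf") is false.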

I have two problems, one with function arguments, another with return
values.

Here's the code:

#!/usr/bin/python

import libxml2
import sys
import re

def re_contains(context, s, p):
        print "s:", s, ",", len(s), ", p:", p
        for ss in s:
                print "ss: ", ss
                print dir(ss)
        if re.search(p, s):
                return 1
        return 0

def find_matches(pattern, files):
        matches = []
        for f in files:
                doc = libxml2.parseFile(f)
                ctxt = doc.xpathNewContext()
                libxml2.registerXPathFunction(ctxt._o, "re_contains", None,
                                              re_contains)
                res = ctxt.xpathEval(pattern)
                if res:
                        matches.append((f, res))
        return matches

if __name__ == '__main__':
        pattern = sys.argv[1]
        files = sys.argv[2:]
        matches = find_matches(pattern, files)
        for file, nodes in matches:
                print "---", file
                for node in nodes:
                        print node.serialize()
                        print "--"

The script works a bit like grep: its first argument is an XPath
expression, followed by a list of files. It prints out the matching
parts of each file.

When I invoke it with an XPath expression like
//foo/bar[re_contains(.,'as?df')], to search the contents of element
bar, the value assigned to s in re_contains is a PyCObject that looks
like a list with one item. The item is another PyCObject; taking dir()
of it returns an empty list.

$cat test.xml

<foo><bar>baz</bar></foo>

$./xpathgrep.py "//bar[re_contains(.,'ba')]" test.xml
s: [<PyCObject object at 0x401a74d0>] , 1 , p: ba
ss:  <PyCObject object at 0x401a74d0>
[]
/usr/lib/python2.3/site-packages/libxml2.py:511: RuntimeWarning:
tp_compare didn't return -1 or -2 for exception
  if type(o) == type([]) or type(o) == type(()):
Traceback (most recent call last):
 [... snip an exception from re]

Using the XPath function name() instead of . works out better:

$./xpathgrep.py "//foo[re_contains(name(),'ba')]" test.xml
s: foo , 3 , p: ba
ss:  f
 [... snip iterating f, o and o ]

So should I do something magic when the user has passed in .? Or is this
a bug?
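
For reference, XPath 1.0's string() conversion turns a node-set into
the string-value of its first node, so the behaviour wanted for . is
roughly the sketch below. It is plain Python and makes one loud
assumption: node_text is a hypothetical callback standing in for
whatever turns the opaque item into text, which is exactly the part
that is unclear here. xpath_string is likewise an illustrative name,
not a libxml2 function.

```python
def xpath_string(arg, node_text):
    """Reduce a callback argument to a string, XPath 1.0 style.

    arg       -- a string (as from name()) or a list of nodes (as from .)
    node_text -- hypothetical callable returning the text of one node
    """
    if isinstance(arg, (list, tuple)):
        if not arg:
            # An empty node-set converts to the empty string.
            return ""
        # A node-set converts to the string-value of its first node.
        return node_text(arg[0])
    # Anything else (e.g. the string from name()) is returned unchanged.
    return arg
```

re_contains could then call re.search on the result of xpath_string
rather than on s directly, so both the name() case and the . case go
through the same path.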

Using name() shows the second problem: what to return? True and False
aren't the answer, apparently, because it says "Unable to convert Python
Object to XPath". The same happens with 1 and 0. I see that contains
calls a function called valuePush to store the value, but I don't think
that's available in Python. Apparently the Python bindings call a
function called libxml_xmlXPathObjectPtrConvert to convert the return
value to something that can be used as an argument to valuePush, but I
can't see anything that would indicate it could deal with boolean values.

This is libxml2 2.6.11.

-- 
[ Juri Pakaste | juri iki fi | http://www.iki.fi/juri/ ]
