Re: [xml] manipulating tree causes Seg fault



Michael, let's try this then:
Preface:
I have an in-house XML processing library with utilities for manipulating XML documents during transformations. The library is built on top of libxml2 when running under Python, and on Xalan/Xerces when running under Jython. While writing unit tests for the various transformation components, I found that one specific test segfaults in Python but runs just fine in Jython.

The transformation works as follows:
Given an XML document (specifically the content.xml from an ODT), a regular expression, and a replacement string, find a specific text node and replace it with an element node (containing a text node).

Due to the nature of this document, there are two abc strings: one is "abc" and one is "zabcz". The "abc" is to be left alone, and the "zabcz" is to be changed to have a bold "b". To find this specific case, I'm using the regular expression (.a)(b)(c.)

After finding the specified text node, "zabcz" (let's call this text node to_be_replaced), I want to replace it with "za<text:span text:style-name='bold'>b</text:span>cz"
Essentially, making the b in the middle of zabcz bold.
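
To make the matching step concrete, here is a minimal sketch using Python's standard re module. The pattern is the one above; the replacement template and the helper name bold_middle are assumptions made for illustration, not code from my library. It only builds the replacement markup as a string and does not touch the tree.

```python
import re

# Pattern from the message: any character + "a", then "b", then "c" + any character.
PATTERN = re.compile(r"(.a)(b)(c.)")

# Hypothetical replacement template; \1, \2 and \3 refer to the three groups.
REPLACEMENT = r"\1<text:span text:style-name='bold'>\2</text:span>\3"


def bold_middle(text):
    """Return the text with the matched "b" wrapped in a bold span (as markup)."""
    return PATTERN.sub(REPLACEMENT, text)


unchanged = bold_middle("abc")    # too short for the pattern, so left alone
changed = bold_middle("zabcz")    # the middle "b" gets wrapped
```

Because the pattern requires one character on each side of "abc", the bare "abc" string never matches.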

I add this new element by creating a new XML document with a temporary root node: <temp xmlns:text="http://openoffice.org/2000/text">za<text:span text:style-name='bold'>b</text:span>cz</temp>
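
As a rough illustration of the temporary-root step, the sketch below wraps the replacement markup in a temp element that declares the text prefix, then parses it. It uses xml.dom.minidom from the standard library as a stand-in for the libxml2 bindings, and the helper name parse_fragment is hypothetical.

```python
from xml.dom import minidom

TEXT_NS = "http://openoffice.org/2000/text"


def parse_fragment(markup):
    """Parse a markup fragment inside a temporary root that declares text:."""
    wrapped = '<temp xmlns:text="%s">%s</temp>' % (TEXT_NS, markup)
    temp_doc = minidom.parseString(wrapped)
    return temp_doc, temp_doc.documentElement


temp_doc, temp = parse_fragment(
    "za<text:span text:style-name='bold'>b</text:span>cz")

# The temporary root holds three children: text, element, text.
children = list(temp.childNodes)
```

Without the namespace declaration on the temporary root, the fragment would not parse, since the text prefix would be unbound.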

I then add this new document to the original document as a child of the parent of to_be_replaced (let's call this parent foster_parent).
Next, I remove (unlink) the node to_be_replaced.
Next, I collect all the children of the temp node.
Finally, I unlink temp and add all of its former children back to foster_parent.

After that is all said and done, what should be left as the children of foster_parent is: za<text:span text:style-name='bold'>b</text:span>cz
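
For comparison, here is a sketch of the same replacement done with xml.dom.minidom from the standard library rather than the libxml2 bindings. It is a variation on the steps above, not my actual code: instead of linking the temporary document into the original, it imports a copy of each child of temp into the target document (the import step asked about in the quoted message below) and inserts the copies in place of the old text node. The helper name replace_text_node and the sample text:p document are assumptions.

```python
from xml.dom import minidom

TEXT_NS = "http://openoffice.org/2000/text"


def replace_text_node(to_be_replaced, markup):
    """Replace a text node with the nodes parsed from a markup string."""
    foster_parent = to_be_replaced.parentNode
    target_doc = to_be_replaced.ownerDocument

    # Parse the fragment in its own temporary document.
    temp_doc = minidom.parseString(
        '<temp xmlns:text="%s">%s</temp>' % (TEXT_NS, markup))

    # Import a deep copy of each child into the target document, then
    # insert it just before the node being replaced to keep the order.
    for child in list(temp_doc.documentElement.childNodes):
        imported = target_doc.importNode(child, True)
        foster_parent.insertBefore(imported, to_be_replaced)

    # Drop the old text node and release the temporary document.
    foster_parent.removeChild(to_be_replaced)
    temp_doc.unlink()


# Hypothetical stand-in for the paragraph in content.xml.
doc = minidom.parseString(
    '<text:p xmlns:text="%s">zabcz</text:p>' % TEXT_NS)
replace_text_node(
    doc.documentElement.firstChild,
    "za<text:span text:style-name='bold'>b</text:span>cz")
new_children = list(doc.documentElement.childNodes)
```

Every inserted node then belongs to the target document, so nothing is left pointing into the temporary document once it is released.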

I hope this is a little clearer.

Thanks again

Nick

Michael Ludwig wrote:
Nick Lang wrote:

I have an XML document.
my regex search = "(.a)(b)(c.)"
to be replaced with: [...]

everything is honky-doory.

So what i've done is created a function to collect all the children of
<a>, unlink them, and then link them to the parent of <a>.

What happens after I do this though, is quite a disaster. In the
processing framework I use, it causes python to seg fault (every time)

After I did that, I find python blowing up, and spamming the screen
with memory allocation errors.

I'm adding in this new tree via a regex route

I'm sorry, but I haven't managed to figure out what you're doing. I
might be missing something, but the details seem a bit scary. Regex
search? On the tree? Adding a tree via a "regex route"? Honky-doory?

I suggest you arrange your thoughts in less nebulous language.

One thing: Are you aware you need to import/adopt nodes when
transplanting them from one document to another?



