Re: [xml] manipulating tree causes Seg fault

From: Nick Lang <nick lang propylon com>
To: xml gnome org
Subject: Re: [xml] manipulating tree causes Seg fault
Date: Wed, 12 Aug 2009 09:41:22 -0500

Michael, let's try this then:
Preface:

I have an in-house developed XML processing library, with utilities designed to manipulate XML documents for transformations. This library is built on top of libxml2 when using Python and using Xalan/Xerces when using Jython. When writing unit tests to test the various transformation components, this one specific test seg faults in Python, and executes just fine using Jython.


The process in which this transformation occurs is as follows:

Given an XML document (specifically the content.xml from an ODT), a regular expression, and a replacement string. Find a specific text node. Replace with an element node (containing a text node).

Due to the nature of this document, there exist two abc strings: one is "abc" and one is "zabcz". The "abc" is to be left alone, and the "zabcz" is to be changed to have a bold "b". To find this specific case, I'm using the regular expression (.a)(b)(c.)
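To illustrate why the pattern picks out only the second string, here is a minimal sketch using Python's standard `re` module. The helper name `split_match` is made up for this example and is not part of the library described in this message.

```python
import re

# Pattern from the message: any character + "a", then "b", then "c" + any character.
PATTERN = re.compile(r"(.a)(b)(c.)")


def split_match(text):
    """Return the three captured groups for the first match, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


# "abc" has no character before "a" or after "c", so it cannot match.
print(split_match("abc"))
# "zabcz" matches, and group 2 is the part that should become bold.
print(split_match("zabcz"))
```

The leading and trailing `.` are what keep the bare "abc" untouched: the pattern needs five characters, so a three-character string can never match.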

After finding the specified text node, "zabcz" (let's call this text node: to_be_replaced), I want to replace it with "za<text:span text:style-name='bold'>b</text:span>cz"

Essentially, making the b in the middle of zabcz bold.

The process by which I'm adding this new element is by creating a new XML document with a temp root node:
<temp xmlns:text="http://openoffice.org/2000/text">za<text:span text:style-name='bold'>b</text:span>cz</temp>
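As a rough sketch of this step, the temporary document can be built and inspected like this. It uses Python's standard `xml.dom.minidom` as a stand-in, not the libxml2 bindings the library is actually built on, and `build_fragment` is a hypothetical helper name.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

TEXT_NS = "http://openoffice.org/2000/text"


def build_fragment(before, bold, after):
    """Parse a throwaway document whose <temp> root holds the replacement nodes."""
    xml = (
        '<temp xmlns:text="%s">%s'
        "<text:span text:style-name='bold'>%s</text:span>%s</temp>"
        % (TEXT_NS, escape(before), escape(bold), escape(after))
    )
    return minidom.parseString(xml)


fragment = build_fragment("za", "b", "cz")
temp = fragment.documentElement

# The root should hold three children: text, the span element, text.
for child in temp.childNodes:
    print(child.nodeType, child.nodeName)
```

The point of the temp root is only to make the mixed content well-formed for the parser and to declare the `text` prefix; the root itself is not meant to end up in the target document.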

I then add this new document to the original document as a child of the parent of "to_be_replaced". (Let's call this parent foster_parent.)

Next I remove (unlink) the node "to_be_replaced".
Next I collect all the children of the node temp.
I unlink temp, and then add all the children of temp back to foster_parent.

After that is all said and done, what should be left as the children of foster_parent is: za<text:span text:style-name='bold'>b</text:span>cz
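For comparison, here is a minimal sketch of the same replacement that never links the temp document into the original. Again it uses the standard `xml.dom.minidom` rather than the libxml2 bindings, so it only illustrates the order of operations; the `splice` function and the one-paragraph `ORIGINAL` document are assumptions made up for the example. Each child of temp is imported (copied) into the target document before it is inserted, so no node ends up owned by two documents.

```python
from xml.dom import minidom

TEXT_NS = "http://openoffice.org/2000/text"

# Hypothetical stand-in for the relevant part of content.xml.
ORIGINAL = '<text:p xmlns:text="%s">zabcz</text:p>' % TEXT_NS
FRAGMENT = (
    '<temp xmlns:text="%s">za'
    "<text:span text:style-name='bold'>b</text:span>cz</temp>" % TEXT_NS
)


def splice(to_be_replaced, fragment_doc):
    """Replace a text node with copies of the children of the fragment root."""
    foster_parent = to_be_replaced.parentNode
    target_doc = foster_parent.ownerDocument
    # Snapshot the child list so iteration is not affected by later changes.
    for child in list(fragment_doc.documentElement.childNodes):
        # importNode makes a deep copy owned by the target document.
        imported = target_doc.importNode(child, True)
        foster_parent.insertBefore(imported, to_be_replaced)
    # Only now drop the original text node.
    foster_parent.removeChild(to_be_replaced)
    return foster_parent


doc = minidom.parseString(ORIGINAL)
to_be_replaced = doc.documentElement.firstChild
foster_parent = splice(to_be_replaced, minidom.parseString(FRAGMENT))
print(foster_parent.toxml())
```

Inserting before to_be_replaced and removing it last keeps the new nodes in the original position even when foster_parent has other children.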


I hope this is a little more clear.

Thanks again

Nick

Michael Ludwig wrote:

Nick Lang schrieb:

I have an XML document.
my regex search = "(.a)(b)(c.)"
to be replaced with: [...]

everything is honky-doory.

So what i've done is created a function to collect all the children of
<a>, unlink them, and then link them to the parent of <a>.

What happens after I do this though, is quite a disaster. In the
processing framework I use, it causes python to seg fault (every time)

After I did that, I find python blowing up, and spamming the screen
with memory allocation errors.

I'm adding in this new tree via a regex route


I'm sorry, but I haven't managed to figure out what you're doing. I
might be missing something, but the details seem a bit scary. Regex
search? On the tree? Adding a tree via a "regex route"? Honky-doory?

I suggest you arrange your thoughts in less nebulous language.

One thing: Are you aware you need to import/adopt nodes when
transplanting them from one document to another?
