Re: [xml] Get element tail


Please find my answers below:

On Wed, Oct 23, 2013 at 12:12 PM, Bogdan Cristea wrote:


I am trying to mimic lxml from Python, which lets you get the text after the
end of an element but before the next element begins (i.e. before the next
sibling of the current element). I am able to do this with xmlTextReader by
obtaining a pointer from the current node (when the node type is ELEMENT) to
its next sibling.
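
For readers unfamiliar with the "tail" idea mentioned above, here is a minimal sketch of what it looks like on the Python side. It uses the standard library's xml.etree.ElementTree rather than lxml itself (an assumption made so the snippet has no third-party dependency; both expose a `.tail` attribute on elements). The input is the second heading quoted later in this thread, with its line break replaced by a space, and `element_tails` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Second <h1> sample from this thread, joined onto one logical line.
XML = ("<h1><strong>bold 1</strong> no text before "
       "<strong>bold 2</strong> text after <strong>bold 3</strong>.</h1>")


def element_tails(xml_text, tag):
    """Return the tail text of every <tag> element, in document order.

    The tail is the text between an element's end tag and the next
    tag of any kind; ElementTree stores None when there is no such text.
    """
    root = ET.fromstring(xml_text)
    return [el.tail or "" for el in root.iter(tag)]


tails = element_tails(XML, "strong")
print(tails)  # [' no text before ', ' text after ', '.']
```

Note that the tail belongs to the element that precedes the text, which is why " no text before " is reported for the first <strong>.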

How do you do that? What call do you use?

xmlNodePtr nextSibling = node->next;

Then I check the node type and if it is a text node I get its content.
<h1>Text before <strong>bold 1 <underline>undelined text</underline> after
bold 1</strong>in between <strong>bold 2</strong>text after <strong>bold
<h1><strong>bold 1</strong> no text before <strong>bold 2</strong> text
after <strong>bold 3</strong>.</h1>

The first <h1> element is parsed correctly, but the second one is not: the
text node " no text before " is not detected as the tail of the element
<strong>.
There is no such thing as "tail of the element <strong>" from the POV
of xmlTextReader.
The text node is merely the next node the cursor would point at after
calling xmlTextReaderRead.
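
To illustrate the point about a streaming cursor, here is a rough sketch in Python using the standard library's xml.dom.pulldom as a stand-in for a pull reader. This is not the xmlTextReader API, only an analogy, and `stream_tails` is a hypothetical helper. The stream itself has no notion of a tail: the text simply arrives as the next event after an end tag, and it is up to the caller to attach it to the element that just closed.

```python
from xml.dom import pulldom

# Second <h1> sample from this thread, joined onto one logical line.
XML = ("<h1><strong>bold 1</strong> no text before "
       "<strong>bold 2</strong> text after <strong>bold 3</strong>.</h1>")


def stream_tails(xml_text):
    """Yield (tag, tail) pairs built from a flat event stream."""
    closed_tag = None  # tag of the element that closed most recently
    chunks = []        # text events seen since that end tag
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.CHARACTERS:
            # Text only counts as a tail if it follows an end tag.
            if closed_tag is not None:
                chunks.append(node.data)
            continue
        # Any non-text event terminates the pending tail.
        if closed_tag is not None:
            yield closed_tag, "".join(chunks)
        closed_tag, chunks = None, []
        if event == pulldom.END_ELEMENT:
            closed_tag = node.tagName
    if closed_tag is not None:
        yield closed_tag, "".join(chunks)


for tag, tail in stream_tails(XML):
    print(tag, repr(tail))
```

Text that directly follows a start tag (such as "bold 1") is skipped here because no element has closed yet; in lxml terms that would be the element's text, not a tail.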

I am not sure how xmlTextReader is implemented, but my tests showed that I
can get the element tail and compare it with the value obtained from Python
lxml. In fact, after correcting the XML file in the above example to be a
valid DTBook, I can correctly parse the entire file and get an element tail.

