[xml] Get element tail



Hi

I am trying to reproduce a feature of lxml (the Python binding) that returns the tail of an element: the text after the end of the element, but before the next element begins (i.e. before the next sibling of the current element). I am able to do this with xmlTextReader by obtaining a pointer from the current node (when the node type is ELEMENT) to its next sibling. However, this approach does not work in all cases:

<h1>Text before <strong>bold 1 <underline>undelined text</underline> after bold 1</strong>in between <strong>bold 2</strong>text after <strong>bold 3</strong>.</h1>
<h1><strong>bold 1</strong> no text before <strong>bold 2</strong> text after <strong>bold 3</strong>.</h1>

The first <h1> element is parsed correctly, but the second one is not: the text node " no text before " is not detected as the tail of the first <strong> element. lxml, however, handles both cases correctly; in fact, I am using it to validate my XML parser. I am a little puzzled by this result, since lxml is a binding for libxml2. However, I am not sure whether lxml's implementation uses just the xmlTextReader parser or builds the entire DOM tree. Is there a way to get the tail of an element with xmlTextReader?
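
For reference, here is a minimal sketch of the behaviour I expect for the second <h1>. It deliberately uses only Python's standard library as a stand-in: xml.etree.ElementTree exposes the same .tail attribute as lxml, and xml.parsers.expat plays the role of an event-based reader. The helper names tree_tails and stream_tails are made up for this illustration. Whether the same bookkeeping carries over to xmlTextReader is an assumption I have not verified.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

# Second <h1> from the sample above (the case that fails for me).
SAMPLE = (
    "<h1><strong>bold 1</strong> no text before <strong>bold 2</strong>"
    " text after <strong>bold 3</strong>.</h1>"
)


def tree_tails(xml_text, tag):
    """Tails of all <tag> elements, read from a fully built tree."""
    root = ET.fromstring(xml_text)
    # .tail is None when nothing follows the element, so normalise to "".
    return [el.tail or "" for el in root.iter(tag)]


def stream_tails(xml_text, tag):
    """Tails of all <tag> elements, collected from parser events only.

    Idea: character data that arrives after an end-element event, and
    before the next start- or end-element event, is the tail of the
    element that was just closed.
    """
    closed = []     # [name, tail] entries, in the order elements close
    pending = None  # entry for the most recently closed element

    def on_start(name, attrs):
        nonlocal pending
        pending = None  # a new element begins, so the previous tail is complete

    def on_end(name):
        nonlocal pending
        pending = [name, ""]  # start collecting the tail of this element
        closed.append(pending)

    def on_chars(data):
        # expat may deliver one text node in several chunks, so append.
        if pending is not None:
            pending[1] += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_chars
    parser.Parse(xml_text, True)
    # Note: for nested elements with the same tag, closing order differs
    # from document order; the sample has no such nesting.
    return [tail for name, tail in closed if name == tag]


if __name__ == "__main__":
    print(tree_tails(SAMPLE, "strong"))
    print(stream_tails(SAMPLE, "strong"))
```

Both functions should report " no text before " as the tail of the first <strong>, which is the result I cannot reproduce with my next-sibling approach.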

thanks
Bogdan

