Re: [xml] Get text content of an element which surrounds another element

On Mon, Oct 21, 2013 at 11:50 AM, Bogdan Cristea <cristeab gmail com> wrote:
On 10/20/2013 06:53 PM, Nikolay Sivov wrote:
On 10/20/2013 19:13, Bogdan Cristea wrote:
First, I am puzzled by the way I can obtain the text contained in the h1 element. I am using the node from a previous xmlTextReaderRead() call. If I try to obtain the current node in the XML_READER_TYPE_TEXT case, the node pointer is NULL.
Well, that's not surprising. 'h1' in your example contains 3 children (text, element, text), in that order. So when you request node->children you get the first child, which is a text node; its content is in the 'content' field. I don't see a problem here.

Second, I don't know how to obtain the text after the <img> element which still belongs to the <h1> element. Is there a way to do so?
It looks wrong to use a node pointer returned from a previous reader iteration. As I understand it, the node could be reused, and the previous node is freed when you're done with it. So in your example you store the previous element pointer, which is 'img', and later try to use it to get some text (which can't be outside of the 'img' scope, by the way)? Are you sure xmlTextReaderCurrentNode() doesn't return anything for a text node?


Indeed, I get the <h1> element text when I look at the content of its children (<img> is between the text). However, I still need to wait for node type XML_READER_TYPE_TEXT in order to obtain the text. Currently I first detect an XML_READER_TYPE_ELEMENT, which is <h1>; after this I have XML_READER_TYPE_TEXT, where I get the text before and after the <img> element; then I get another XML_READER_TYPE_ELEMENT, which is <img> with its attributes. Inside XML_READER_TYPE_TEXT a call to xmlTextReaderCurrentNode() gives me a NULL pointer, so I find the way the libxml2 API works a little strange, although not completely forbidden.

Hah, so you're saying that the first XML_READER_TYPE_TEXT gives you a non-NULL pointer with xmlTextReaderCurrentNode()? And this returned node contains the concatenated text from the first and last children of 'h1'? That's strange.
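
For reference, the "text, element, text" layout discussed above can be sketched with a generic pull parser. This is only an illustration under stated assumptions: it uses Python's standard-library xml.dom.pulldom as a stand-in rather than libxml2's xmlTextReader, and the helper name direct_text and the sample markup are hypothetical (the original example document is not shown in this thread). The idea is to read the text from the node delivered with the current event, and to track depth so that only text directly inside the target element is collected, including the part after a child such as <img>.

```python
from xml.dom import pulldom


def direct_text(xml: str, tag: str) -> str:
    """Join the text nodes that are direct children of the first <tag> element.

    Hypothetical helper for illustration; not part of libxml2 or pulldom.
    """
    pieces = []
    # depth == 0: outside <tag>; 1: directly inside it; >1: inside a descendant
    depth = 0
    for event, node in pulldom.parseString(xml):
        if event == pulldom.START_ELEMENT:
            if depth > 0 or node.tagName == tag:
                depth += 1
        elif event == pulldom.END_ELEMENT:
            if depth > 0:
                depth -= 1
                if depth == 0:
                    break  # the target element is closed, stop reading
        elif event == pulldom.CHARACTERS and depth == 1:
            # Use the node that arrives with this event, never a pointer
            # saved from an earlier iteration.
            pieces.append(node.data)
    return "".join(pieces)


# Sample markup is an assumption, shaped like the h1/img case in the thread.
sample = '<h1>Title <img src="logo.png"/>subtitle</h1>'
print(direct_text(sample, "h1"))
```

Here the text before and after <img> arrives as two separate text events, and the helper joins them. A libxml2 version would follow the same loop shape with xmlTextReaderRead(), xmlTextReaderNodeType(), xmlTextReaderDepth() and xmlTextReaderConstValue(), but that mapping is not tested here.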

