Re: [xml] Recording node info for HTML



Hey,

I'm using libxml2-2.9.8.

When using libxml2 to parse XML, I can use

ctxt->record_info = 1;
xmlInitNodeInfoSeq(&ctxt->node_seq);
xmlParseDocument(ctxt);

to record positions for the parsed nodes.

However, for HTML the following

ctxt->record_info = 1;
xmlInitNodeInfoSeq(&ctxt->node_seq);
htmlParseDocument(ctxt);

leads to a segmentation fault for some (not necessarily well-formed) HTML files. A minimal example is an HTML file containing only "<label></label>", which crashes with the following backtrace:

#0  0x0000555555695199 in xmlSAX2EndElement (ctx=0x555555975a20, name=0x55555570141e "body") at external/libxml2/libxml2-2.9.8/SAX2.c:1815
#1  0x000055555561412b in htmlAutoCloseOnEnd (ctxt=0x555555975a20) at external/libxml2/libxml2-2.9.8/HTMLparser.c:1384
#2  0x000055555561cae2 in htmlParseContentInternal (ctxt=0x555555975a20) at external/libxml2/libxml2-2.9.8/HTMLparser.c:4674
#3  0x000055555561d0da in htmlParseDocument (ctxt=0x555555975a20) at external/libxml2/libxml2-2.9.8/HTMLparser.c:4817
#4  0x000055555556f81d in ParseHTML (content="<label></label>\n", nodes=0x7fffffffd7a0, error_message=0x7fffffffd8b0) at parser/xml_parser.cpp:431
#5  0x00005555555711e6 in main (argc=2, argv=0x7fffffffdb08) at parser/xml_parser.cpp:596

Does the API for parsing HTML files support recording positions of the nodes? If so, what am I doing wrong or what can be done to prevent the seg fault?

Thank you and best regards

Ben

