Re: [xml] [PATCH] less-than character and HTML parser module



On Thursday 16 April 2015 10:32:32 you wrote:
There you go; you find the updated patch attached. It now requires
HTML_PARSE_RECOVER option to be set for recovering from stand-alone
less-than characters.

That sounds fine *except* it doesn't raise an error.
The parser knows it's a broken construct that must be pointed out.

Ok, I see what I can do about that. ;)

It sounds a bit weird to handle that error case as one of the main content
cases, I would still be tempted to go into htmlParseStartTag, get the
error reported, but push corrective data instead in recover mode.

My initial idea for a solution was to enter htmlParseElement() as before, and
if htmlParseElement() encounters an error, to handle the chunk as text instead
(if the recover option is on). That would probably come closest to what most
browsers seem to do. The problem is that it would require the public API
function's prototype of

        void htmlParseElement(htmlParserCtxtPtr)

to be changed to

        int htmlParseElement(htmlParserCtxtPtr)

To avoid that API change, one could add another internal (static) version of
htmlParseElement() that provides a return value. However, there is already an
htmlParseElementInternal(), so adding yet another variant would get messy IMO.
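
To make the trade-off concrete, here is a minimal sketch of the static-variant
idea. It is a toy model only: ToyCtxt, toyInit(), toyParseElementStatus(),
toyParseElement() and toyParseContent() are hypothetical names made up for
illustration and are not libxml2 code. The assumption is that the internal
variant reports the error and returns a status, so the caller can fall back to
treating the '<' as text when recovery is enabled, while the public function
keeps its void prototype.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parser context; NOT the libxml2 structure. */
typedef struct {
    const char *cur;   /* current input position */
    int recover;       /* non-zero when a recover option is set */
    int errors;        /* number of errors reported so far */
    char text[128];    /* collected character data */
    size_t textLen;
} ToyCtxt;

void toyInit(ToyCtxt *ctxt, const char *input, int recover) {
    ctxt->cur = input;
    ctxt->recover = recover;
    ctxt->errors = 0;
    ctxt->text[0] = '\0';
    ctxt->textLen = 0;
}

static void toyAppendText(ToyCtxt *ctxt, char c) {
    /* Silently drop characters that do not fit; enough for a sketch. */
    if (ctxt->textLen + 1 < sizeof(ctxt->text)) {
        ctxt->text[ctxt->textLen++] = c;
        ctxt->text[ctxt->textLen] = '\0';
    }
}

/* Internal variant with a return value:
 * 0 if a tag was consumed, -1 if the '<' does not start a tag.
 * On failure the error is still counted and the input is not advanced. */
static int toyParseElementStatus(ToyCtxt *ctxt) {
    assert(ctxt != NULL && *ctxt->cur == '<');
    unsigned char next = (unsigned char)ctxt->cur[1];
    if (!isalpha(next) && next != '/') {
        ctxt->errors++;
        return -1;
    }
    /* Toy tag handling: skip everything up to and including '>'. */
    while (*ctxt->cur != '\0' && *ctxt->cur != '>')
        ctxt->cur++;
    if (*ctxt->cur == '>')
        ctxt->cur++;
    return 0;
}

/* Public entry point keeps its void prototype and drops the status. */
void toyParseElement(ToyCtxt *ctxt) {
    (void)toyParseElementStatus(ctxt);
}

/* Content loop: in recover mode a stand-alone '<' becomes text. */
void toyParseContent(ToyCtxt *ctxt) {
    while (*ctxt->cur != '\0') {
        if (*ctxt->cur == '<') {
            if (toyParseElementStatus(ctxt) == 0)
                continue;          /* tag consumed */
            if (!ctxt->recover)
                return;            /* error reported, stop parsing */
            /* recover: fall through and keep '<' as character data */
        }
        toyAppendText(ctxt, *ctxt->cur++);
    }
}
```

In this toy model, parsing "a < b <i>c</i>" with recover set collects the text
"a < b c" and counts exactly one error, so the broken construct is still
pointed out while the content survives.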

Best regards,
Christian Schoenebeck

