Re: [xml] [PATCH] less-than character and HTML parser module
- From: Christian Schoenebeck <schoenebeck crudebyte com>
- To: veillard redhat com
- Cc: xml gnome org
- Subject: Re: [xml] [PATCH] less-than character and HTML parser module
- Date: Thu, 16 Apr 2015 13:59:28 +0200
On Thursday 16 April 2015 10:32:32 you wrote:
>> There you go; you'll find the updated patch attached. It now requires the
>> HTML_PARSE_RECOVER option to be set for recovering from stand-alone
>> less-than characters.
>
> That sounds fine *except* it doesn't raise an error.
> The parser knows it's a broken construct that must be pointed out.

Ok, I see what I can do about that. ;)

> It sounds a bit weird to handle that error case as one of the main content
> cases. I would still be tempted to go into htmlParseStartTag(), get the
> error reported and, in recover mode, push corrective data instead.

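For illustration only, here is a minimal self-contained sketch of the approach suggested above: the tag-parsing path always reports the error, and only in recover mode pushes the stray '<' back as character data. It is a toy model, not libxml2 code. `toy_ctxt`, `toy_parse_tag`, `push_text` and the rule that a tag name must start with a letter are hypothetical simplifications, and the `recover` flag merely stands in for HTML_PARSE_RECOVER.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified parser state; NOT libxml2's htmlParserCtxt. */
typedef struct {
    const char *cur;   /* current input position */
    int recover;       /* stands in for the HTML_PARSE_RECOVER option */
    int errors;        /* number of errors reported so far */
    char text[256];    /* character data delivered so far */
    size_t text_len;
} toy_ctxt;

static void report_error(toy_ctxt *ctxt, const char *msg) {
    ctxt->errors++;
    fprintf(stderr, "toy parser error: %s\n", msg);
}

/* Append n bytes of character data (silently truncates in this toy). */
static void push_text(toy_ctxt *ctxt, const char *s, size_t n) {
    if (ctxt->text_len + n < sizeof ctxt->text) {
        memcpy(ctxt->text + ctxt->text_len, s, n);
        ctxt->text_len += n;
        ctxt->text[ctxt->text_len] = '\0';
    }
}

/* Toy tag parser, entered at '<'. A tag name must start with a letter
 * (an optional '/' marks an end tag). Otherwise the '<' is stray: the
 * error is always reported, and only in recover mode is the '<' pushed
 * back as character data. Returns 0 for a tag, -1 for a stray '<'. */
static int toy_parse_tag(toy_ctxt *ctxt) {
    const char *name = ctxt->cur + 1;
    assert(*ctxt->cur == '<');
    if (*name == '/')
        name++;
    if (!isalpha((unsigned char)*name)) {
        report_error(ctxt, "invalid element name after '<'");
        if (ctxt->recover)
            push_text(ctxt, "<", 1);  /* corrective data */
        ctxt->cur++;                  /* toy choice: otherwise just drop it */
        return -1;
    }
    /* Skip the whole tag; a real parser would build nodes here. */
    while (*ctxt->cur != '\0' && *ctxt->cur != '>')
        ctxt->cur++;
    if (*ctxt->cur == '>')
        ctxt->cur++;
    return 0;
}

/* Toy content loop: tags go through toy_parse_tag, everything else is text. */
static void toy_parse_content(toy_ctxt *ctxt) {
    while (*ctxt->cur != '\0') {
        if (*ctxt->cur == '<') {
            (void)toy_parse_tag(ctxt);
        } else {
            push_text(ctxt, ctxt->cur, 1);
            ctxt->cur++;
        }
    }
}
```

With `.cur = "<p>a < b</p>"` and `.recover = 1`, the collected text is `a < b` and one error is counted; with `.recover = 0` the error is still counted, but the stray `<` is dropped.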
My initial solution was to enter htmlParseElement() as before and, if
htmlParseElement() encounters an error, handle the chunk as text instead
(if the recover option is on). That would probably come closest to what
most browsers seem to do. The problem, however, is that it would require
changing the prototype of the public API function from

    void htmlParseElement(htmlParserCtxtPtr)

to

    int htmlParseElement(htmlParserCtxtPtr)

To avoid that API change, one could add another internal (static) version
of htmlParseElement() that returns a value. However, there is already
htmlParseElementInternal(), so adding yet another variant would get messy
IMO.
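A comparable toy sketch of this alternative and of the API concern: the public entry point keeps its void prototype, while a static variant returns a status so that the content loop can reinterpret a failed element parse as text in recover mode. All names here (`toy_ctxt`, `toy_parse_element`, `toy_parse_element_status`, `toy_parse_content`) are hypothetical and this is not libxml2 code; in libxml2 the equivalent would be one more internal variant next to htmlParseElementInternal(), which is exactly the cost being weighed.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical toy context; NOT libxml2's htmlParserCtxt. */
typedef struct {
    const char *cur;   /* current input position */
    int recover;       /* stands in for HTML_PARSE_RECOVER */
    int errors;        /* errors recorded so far */
    char text[256];    /* character data delivered so far */
    size_t text_len;
} toy_ctxt;

static void append_text(toy_ctxt *ctxt, const char *s, size_t n) {
    if (ctxt->text_len + n < sizeof ctxt->text) {
        memcpy(ctxt->text + ctxt->text_len, s, n);
        ctxt->text_len += n;
        ctxt->text[ctxt->text_len] = '\0';
    }
}

/* Advance past the next '>' (toy: tag contents are ignored). */
static void skip_tag(toy_ctxt *ctxt) {
    while (*ctxt->cur != '\0' && *ctxt->cur != '>')
        ctxt->cur++;
    if (*ctxt->cur == '>')
        ctxt->cur++;
}

/* Internal variant WITH a status: 0 if an element start tag was consumed,
 * -1 if the '<' does not start one. On failure the error is recorded but
 * no input is consumed, so the caller decides how to recover. */
static int toy_parse_element_status(toy_ctxt *ctxt) {
    assert(*ctxt->cur == '<');
    if (!isalpha((unsigned char)ctxt->cur[1])) {
        ctxt->errors++;
        return -1;
    }
    skip_tag(ctxt);
    return 0;
}

/* Public entry point keeps its original void prototype, so existing
 * callers are unaffected. Toy choice: on failure it skips the stray '<'
 * so that callers looping over it still make progress. */
void toy_parse_element(toy_ctxt *ctxt) {
    if (toy_parse_element_status(ctxt) != 0)
        ctxt->cur++;
}

/* Content loop uses the status-returning variant: in recover mode a
 * failed element parse is reinterpreted as text. */
static void toy_parse_content(toy_ctxt *ctxt) {
    while (*ctxt->cur != '\0') {
        if (ctxt->cur[0] == '<' && ctxt->cur[1] == '/') {
            skip_tag(ctxt);                      /* toy: end tags ignored */
        } else if (*ctxt->cur == '<') {
            if (toy_parse_element_status(ctxt) != 0) {
                if (ctxt->recover)
                    append_text(ctxt, "<", 1);   /* chunk handled as text */
                ctxt->cur++;
            }
        } else {
            append_text(ctxt, ctxt->cur, 1);
            ctxt->cur++;
        }
    }
}
```

Because the status-returning variant consumes no input on failure, the recovery policy (emit as text or drop) lives in the caller, rather than inside the start-tag parser as in the htmlParseStartTag() approach.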
Best regards,
Christian Schoenebeck