Re: [xml] HTML Parser problems with chunk parser if HTML keywords overlap chunk border
- From: "Cyrill Osterwalder" <Cyrill Osterwalder visonys com>
- To: <veillard redhat com>, <xml gnome org>
- Cc: Cyrill Osterwalder <Cyrill Osterwalder visonys com>
- Subject: Re: [xml] HTML Parser problems with chunk parser if HTML keywords overlap chunk border
- Date: Thu, 22 Jun 2006 08:22:36 +0200
> Why do you use PARSE_HTML_RECOVER? The parser is already
> doing recovery mode to some extent without it
> (I mean the HTML parser :-).
Actually, the problem also seems to exist without PARSE_HTML_RECOVER,
otherwise the test with testHTML.c of the libxml2 package would not show
it, right? I will have to look at this again. I had the impression that
recovery mode is the trigger in htmlParseScript() to actually produce
the problem. But my testHTML.c example can be easily reproduced and it
does not use PARSE_HTML_RECOVER. With the testHTML.c example it seems that
parsing fails if the CDATA end tag overlaps the chunk boundary. If
that's true even without PARSE_HTML_RECOVER, then it's just a matter of
luck whether chunked parsing of HTML with CDATA succeeds.
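
To make the boundary problem concrete, here is a minimal sketch in plain C.
It is not libxml2 code and does not claim to mirror htmlParseScript(); the
names end_tag_scanner, scanner_init and scanner_feed are hypothetical. It
only illustrates why an end tag such as "</style" can be missed when each
chunk is searched on its own, and how carrying the partial match across
chunks avoids that.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Naive check: looks for the end tag inside a single chunk only.
 * An end tag split across two chunks is never found this way. */
static int naive_chunk_has(const char *chunk, const char *needle) {
    return strstr(chunk, needle) != NULL;
}

/* Hypothetical scanner state: the length of the end-tag prefix matched
 * so far. It survives between calls, i.e. across chunk boundaries. */
typedef struct {
    const char *needle; /* lowercase end tag start, e.g. "</style" */
    size_t matched;     /* number of needle characters matched so far */
} end_tag_scanner;

static void scanner_init(end_tag_scanner *s, const char *needle) {
    assert(s != NULL && needle != NULL && needle[0] != '\0');
    s->needle = needle;
    s->matched = 0;
}

/* Feed one chunk; returns 1 once the whole needle has been seen,
 * 0 otherwise. Matching is ASCII case-insensitive.
 * Assumption: the first needle character ('<') occurs nowhere else in
 * the needle, so restarting from scratch on a mismatch is sufficient. */
static int scanner_feed(end_tag_scanner *s, const char *chunk, size_t len) {
    size_t n = strlen(s->needle);
    if (s->matched == n)
        return 1; /* already found in an earlier chunk */
    for (size_t i = 0; i < len; i++) {
        char c = (char)tolower((unsigned char)chunk[i]);
        if (c == s->needle[s->matched]) {
            if (++s->matched == n)
                return 1;
        } else {
            /* Mismatch: the current character may itself start a match. */
            s->matched = (c == s->needle[0]) ? 1 : 0;
        }
    }
    return 0;
}
```

With the input split as "<style>a{} </ST" and "YLE>x", naive_chunk_has()
finds nothing in either chunk, while scanner_feed() reports the end tag
during the second call because the four characters matched at the end of
the first chunk are remembered.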
> with '</ style' or '</foo>' and expect that to close the open tag, and
> 'style "</" style' and expect to not close it...
I see. I guess there's a reason why the slash in "</" should be quoted
in CDATA contents if not being the real end tag ;-) However, there's a
lot of HTML out "in the wild" containing unquoted "</" strings in CDATA
blocks.
> You can try, but it's all very messy IMHO, I will take
> patches if not obviously broken
I will further look at it and get back to the list if I'm able to
produce anything useful.
Thanks,
Cyrill