Re: [xml] HTMLparser comment parsing bug and patch

On Tue, Jul 29, 2003 at 10:47:37PM +0100, Nick Kew wrote:
No, it doesn't fix the problem.  Your patch now sets "incomment"
until it reaches the end of the comment being parsed - which means
it's gone past the sequence it's looking for.  So it has exactly
the same problem, made worse by the fact that it's doing more parsing.

Can you explain what the purpose of the "incomment" stuff is?
Under what circumstances does it want to look past a comment
for a token?

Please do not suggest a patch if you don't understand the code being
modified!

This code is used when doing progressive parsing. The progressive parser
needs to get to the end of a sequence. The modified function is there to
check that the current chunk contains a full sequence. Its parameter
indicates the sequence of characters to look for to detect the
end, which allows the chunk to be handed off to the parser. If the
sequence is embedded in a comment it must not be considered as being
present: it doesn't exist from a markup point of view.


For example, looking for "</" with the current chunk being
  "<a> start <!-- </a> --> not finished "
must return false. If you remove the associated code, as your patch
suggests, it will return true, which is wrong w.r.t. the function's semantics.
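
To make the intended semantics concrete, here is a minimal sketch of such a check. It is an illustration only, not the actual libxml2 code: the function name chunk_has_sequence and its signature are hypothetical, and it assumes comments are delimited strictly by "<!--" and "-->".

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (NOT the libxml2 implementation).
 * Returns 1 if `seq` occurs in the first `len` bytes of `buf` outside of
 * any <!-- ... --> comment, 0 otherwise. A chunk that ends inside an
 * unterminated comment also yields 0, since more data is needed.
 */
int chunk_has_sequence(const char *buf, size_t len, const char *seq)
{
    size_t seqlen = strlen(seq);
    int incomment = 0;   /* non-zero while scanning inside a comment */
    size_t i = 0;

    if (seqlen == 0)
        return 0;

    while (i < len) {
        if (incomment) {
            /* Inside a comment: only the closing "-->" is significant. */
            if (len - i >= 3 && memcmp(buf + i, "-->", 3) == 0) {
                incomment = 0;
                i += 3;
            } else {
                i++;
            }
            continue;
        }
        /* Outside a comment: a comment opener hides what follows it. */
        if (len - i >= 4 && memcmp(buf + i, "<!--", 4) == 0) {
            incomment = 1;
            i += 4;
            continue;
        }
        /* Outside a comment: the sequence counts as present. */
        if (len - i >= seqlen && memcmp(buf + i, seq, seqlen) == 0)
            return 1;
        i++;
    }
    return 0;
}
```

With the chunk from the example above, chunk_has_sequence(chunk, strlen(chunk), "</") returns 0, because the only "</" sits inside the comment. Once a later chunk brings a real "</a>" after the comment, the same call returns 1.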

  Your patch is wrong. It is possible that the incoming HTML is just too
broken, but it's impossible to tell without seeing it. Please provide it
as the bug report guidelines ask.


Daniel Veillard      | Red Hat Network
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
