Re: [xml] HTMLparser: SGML comments



On Mon, Nov 14, 2005 at 06:53:44PM +1100, Michael Day wrote:

Hi Daniel,

I don't really know SGML, so such patches are welcome. I just have one
problem with the code: it calls GROW only when the end of the buffer is
detected with a NUL. I would rather have it called more preemptively
in the loop to avoid a potential weakness in the case of multibyte chars.

I have changed the patch to call GROW in the loop each time before moving
on to the next character. (I don't know whether I should be calling SHRINK
as well, though?)

  Probably not needed ...

  Note also that I prefer patches to cut and paste of full routines; a
patch gives me the context of what was changed.

Here is a unified diff. Is that the right format?

  Yes, except it was truncated (i.e. missing the header defining the file
being patched). However, if I apply it the regression tests fail
on ./test/HTML/wired.html; it seems they have a comment like

   <!--TRADES->

which the HTML parser used to accept without complaining, but your code barfs
on it with:

 ./test/HTML/wired.html:517: HTML parser error : Comment not terminated
 <!--TRADES->
 <br>
 <font face= "Verdana, Arial, Geneva,

 ^

As a result, the start elements that follow seem not to be recognized as such.
I think your code should be modified to allow any -> to close the comment,
or maybe even just '>'. At least this should be looked at; I don't think
I can commit this without further analysis of that case.
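
To make the suggestion concrete, here is a minimal sketch of the two behaviours.
It is not the HTMLparser.c code and does not use the parser's input macros;
comment_end() and its lenient flag are hypothetical names invented for this
illustration. It scans the text that follows "<!--" and reports where the
comment would end under a strict rule ("-->" only) or a lenient one ("->" is
also accepted):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libxml2: given the text that follows
 * "<!--", return the number of bytes up to and including the '>' that
 * closes the comment, or 0 if no terminator is found.
 *
 * lenient == 0: only "-->" closes the comment.
 * lenient != 0: "->" also closes it, so "<!--TRADES->" is accepted.
 */
static size_t comment_end(const char *s, int lenient)
{
    size_t i;
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    for (i = 0; i < len; i++) {
        if (s[i] != '>')
            continue;
        /* Strict terminator: '>' preceded by two dashes. */
        if (i >= 2 && s[i - 1] == '-' && s[i - 2] == '-')
            return i + 1;
        /* Lenient terminator: '>' preceded by a single dash. */
        if (lenient && i >= 1 && s[i - 1] == '-')
            return i + 1;
    }
    return 0;
}
```

With the input from wired.html, comment_end("TRADES->\n<br>", 0) finds no
terminator, which matches the "Comment not terminated" error above, while
comment_end("TRADES->\n<br>", 1) stops right after the "->" so the <br> that
follows would be parsed as a start tag again. Whether accepting a bare '>' is
also safe is a separate question that this sketch does not try to answer.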

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


