Re: [xml] Incorrect processing of embedded script|style tags in RECOVERY mode.



On Thu, Aug 02, 2007 at 07:02:26AM +0400, Andrey A. Chujko wrote:
Hello All,

In recovery mode, a 'script' or 'style' section is parsed incorrectly
if it contains an embedded tag of the same name.
Say an HTML document contains the following script section:
================================Cut here===================================
<script language=javascript>
...
document.write('<script language=vbscript\>blah</script\>');
...
</script>
================================Cut here===================================
Its content is escaped incorrectly.


After this document is processed with the HTML SAX parser in RECOVERY
mode, the original section looks corrupted:
================================Cut here===================================
<script language=javascript>
...
document.write('<script language=vbscript\>blah</script>
================================Cut here===================================

Because the parent tag and the embedded one have the same name, the
parser ends parsing of the parent section prematurely, as soon as it
meets the end tag of the embedded section
(see HTMLparser.c, htmlParseScript function, line 2689).

  Well, I'm sure that HTML breaks in a number of places, not just in
libxml2. This looks to me like a case of data broken beyond recovery.

A possible patch is attached.

  Could you try to explain your patch in English, i.e. what kind of
workaround you suggest? This may help the discussion.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


