Re: [xml] Incorrect processing of embedded script|style tags in RECOVERY mode.

On Thu, Aug 02, 2007 at 04:05:42PM +0400, Andrey C. (aka mohmad) wrote:

Daniel Veillard wrote:
In recovery mode, a parent 'script' or 'style' section is parsed
incorrectly if it contains an embedded tag of the same name.
Say an HTML document contains the following script section:
<script language=javascript>
document.write('<script language=vbscript\>blah</script\>');
Its content is escaped incorrectly.

After this document is processed with the HTML SAX parser in RECOVERY mode,
the original section looks corrupted:
<script language=javascript>
document.write('<script language=vbscript\>blah</script>

Because both the parent tag and the embedded one have the same name, the
parser stops parsing the parent section prematurely, as soon as it meets
the end of the embedded one
(see HTMLparser.c, htmlParseScript function, line 2689).

 Well I'm sure that HTML breaks in a number of places, not just in 
looks to me a case of broken beyond recovery data.

Possible patch is attached.

 Could you try to explain your patch in English, i.e. what kind of 
you suggest, this may help discuss it,

In RECOVERY mode, while processing a script|style tag, the patch counts the
number of embedded tags that have the same name as the parent.
Processing of the script|style tag stops only if the counter is not greater
than zero; otherwise it is assumed that the end of an embedded script|style
tag has been reached, and it is treated as CDATA.

Pseudo code:
  mtags = 0;
  tagname = {script|style};

  if (cur == '<') {
     if (NXT(1) == '/') {
        if (recovery && curtagname == tagname)
           if (mtags-- <= 0)
              break; // the end of the tag being processed
     } else if (recovery && curtagname == tagname) {
        ++mtags; // an embedded tag with the same name
     }
  }

  // treat parsed content as CDATA
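
The counting idea above can be tried outside the parser with a small standalone C sketch. It is not the attached patch and does not use any libxml2 API: the helpers find_section_end and starts_with_tag are hypothetical names made up for this illustration, and it works on a plain NUL-terminated string. With recovery off, the sketch simply stops at the first closing tag of that name, which is an assumption of this example rather than something the pseudo code states.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive check that `s` starts with `name` and that the name
 * ends there, so "scripts" does not match "script". */
static int starts_with_tag(const char *s, const char *name)
{
    size_t i, n = strlen(name);

    for (i = 0; i < n; i++) {
        if (s[i] == '\0' ||
            tolower((unsigned char)s[i]) != tolower((unsigned char)name[i]))
            return 0;
    }
    return !isalnum((unsigned char)s[n]);
}

/* Hypothetical helper: returns the offset of the "</name" that closes the
 * section, or -1 if none is found. With `recovery` set, embedded open tags
 * of the same name are counted, as in the pseudo code. */
static long find_section_end(const char *content, const char *name,
                             int recovery)
{
    const char *cur;
    int mtags = 0; /* number of embedded same-name tags still open */

    assert(content != NULL && name != NULL);

    for (cur = content; *cur != '\0'; cur++) {
        if (*cur != '<')
            continue;
        if (cur[1] == '/') {
            if (starts_with_tag(cur + 2, name)) {
                /* a close tag ends the section only at nesting depth 0 */
                if (!recovery || mtags-- <= 0)
                    return (long)(cur - content);
            }
        } else if (recovery && starts_with_tag(cur + 1, name)) {
            ++mtags; /* an embedded tag with the same name */
        }
    }
    return -1; /* everything up to the end is treated as CDATA */
}
```

For the content "a<script>b</script>c</script>" the sketch returns the offset of the second closing tag when recovery is set, and of the first one otherwise. Note that in this sketch an opening tag of the same name that is never closed inside the content keeps the counter above zero, so the real closing tag is skipped and -1 is returned.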

  Seems it would trivially break if


is embedded in the content, sorry, it looks like it will break more documents
than it might fix, which is the hard dilemma for any attempt to fix broken


Red Hat Virtualization group
Daniel Veillard      | virtualization library
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
