[xml] Possible malformation "bug" in libxml2



A possible "bug" exists in libxml2 2.6.2.  Using Apache 2.0.48 and
mod_proxy_html, a call to the libxml2 HTML parser will interpret the
following code and spit out an "unsightly" result: for the nested <html>
and <body> tags, the tag name is dropped but the attributes are emitted as
escaped text.  Yes, the HTML example is bad.  But if libxml2 would simply
drop the entire tag, then the resulting output would be "nicer" and
probably "more correct".

Input HTML (yes, it's ugly, but try it out in a browser and you'll see it
renders without a hitch -- imagine a poorly written page-includer):

---------input.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html dir="TLD">
<body leftmargin="0">
<p>Some data
<p>
    <html dir="TLD">
    <body leftmargin="0">
    <p>Some included data, you'll see an unsightly after xml2 parses the above
    &quot;illegal&quot; tags.  Wouldn't it be better to simply drop the tag
    completely?
    </body></html>
</body></html>
---------output.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html dir="TLD"><body leftmargin="0"><p>Some data
</p><p>
     dir=&quot;TLD&quot;&gt;
    </p><body><p> leftmargin=&quot;0&quot;&gt;
    </p><p>Some included data, you'll see an unsightly after xml2 parses the above
    &quot;illegal&quot; tags.  Wouldn't it be better to simply drop the tag
    completely?
    </p></body></html>
---------done
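
To make the "drop the tag completely" idea concrete, here is a minimal
sketch of it as a pre-filter that could run before the markup reaches the
parser.  This is not libxml2 code and not something mod_proxy_html does; it
is plain Python using only the standard html.parser module, and the names
DropNestedRootTags and drop_nested_root_tags are hypothetical, made up for
this illustration.  It assumes the only problem tags are repeated <html>
and <body>.

```python
from html.parser import HTMLParser


class DropNestedRootTags(HTMLParser):
    """Re-emit HTML, dropping repeated <html>/<body> tags (hypothetical helper)."""

    ONCE_ONLY = ("html", "body")

    def __init__(self):
        # Keep entities such as &quot; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.depth = {tag: 0 for tag in self.ONCE_ONLY}

    def handle_starttag(self, tag, attrs):
        if tag in self.ONCE_ONLY:
            self.depth[tag] += 1
            if self.depth[tag] > 1:
                return  # nested duplicate: drop the whole tag, attributes included
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags are passed through as written.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.ONCE_ONLY:
            if self.depth[tag] == 0:
                return  # stray close with no matching open
            self.depth[tag] -= 1
            if self.depth[tag] > 0:
                return  # this close belongs to a dropped nested open
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def drop_nested_root_tags(html):
    """Return html with nested <html>/<body> open and close tags removed."""
    parser = DropNestedRootTags()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Feeding input.html through drop_nested_root_tags() leaves a single <html>
and <body> pair, so the attributes of the nested tags never reach the
parser as text.  The same bookkeeping (count the opens, drop the surplus
opens and their closes) is what I would hope the parser could do itself.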

Thanks in advance for any help.



