[xml] Error on parsing HTML with libxml



Hi,

I ran into an HTML parser problem during PHP development. PHP's
DOMDocument class uses libxml2 to parse HTML and XML documents. I
found that HTML documents are parsed incorrectly when they contain
inline JavaScript that uses HTML tags inside JavaScript string
variables.

Here is a small code example that shows the problem:

https://3v4l.org/O0iEf

As you can see there, the last tag, </td>, is lost in the output.
I get exactly the same error with xmllint:

xmllint --html --htmlout /tmp/page.html

where page.html contains the HTML part of the example code above. The
output is

page.html:11: HTML parser error : Unexpected end tag : td
        printwin.document.writeln('</td>');

and in the output, the string is empty:

printwin.document.writeln('');
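
For reference, here is a minimal sketch of a page.html that should trigger
the same behaviour. The markup and the function name printTable are assumed
for illustration only; this is not the exact content behind the link above,
so the reported line number may differ.

```shell
# Write a small test page (hypothetical content, assumed for illustration).
cat > /tmp/page.html <<'EOF'
<!DOCTYPE html>
<html>
<head>
<title>Print test</title>
<script type="text/javascript">
function printTable() {
    var printwin = window.open('', 'print');
    printwin.document.writeln('<td>');
    printwin.document.writeln('Hello');
    printwin.document.writeln('</td>');
}
</script>
</head>
<body>
<p>Test</p>
</body>
</html>
EOF

# Run the same command as above if xmllint is installed.
# Parser errors are expected here, so a non-zero exit status is ignored.
if command -v xmllint >/dev/null 2>&1; then
    xmllint --html --htmlout /tmp/page.html || true
fi
```

Comparing the script element in the output with the input file shows
whether the closing tag inside the string literal survives.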

So I think the PHP error comes from an error in libxml2. I
use libxml2 version 2.9.1.

Is it possible to fix this, or is it already fixed in a newer version?

Best regards
André


