[xml] Error on parsing HTML with libxml
- From: André Rothe <andre rothe zks uni-leipzig de>
- To: xml gnome org
- Subject: [xml] Error on parsing HTML with libxml
- Date: Fri, 17 Aug 2018 14:42:58 +0200
Hi,
I ran into an HTML parser problem during PHP development. PHP's
DOMDocument class uses libxml2 to parse HTML and XML documents. I
found that parsing goes wrong for HTML documents that contain inline
JavaScript code in which HTML tags appear inside JavaScript string
literals.
Here is a small code example that shows the problem:
https://3v4l.org/O0iEf
As you can see there, the closing </td> tag in the last string is
missing from the output. I get exactly the same error with xmllint:
xmllint --html --htmlout /tmp/page.html
where page.html contains the HTML part of the example code above. The
output is
page.html:11: HTML parser error : Unexpected end tag : td
printwin.document.writeln('</td>');
and in the resulting output the string is empty:
printwin.document.writeln('');
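For anyone who wants to try this without opening the link, the steps above can be sketched roughly as follows. This is only a hypothetical reconstruction: the content of page.html here is an assumption modelled on the error message (the real file is the HTML part of the linked example, so line numbers and markup will differ), and the printPage function name and the scratch file location are made up for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for page.html; the real file is the HTML part of
# the linked example and may look different.
page="${TMPDIR:-/tmp}/page.html"

# Quoted heredoc delimiter: the shell does not expand anything inside.
cat > "$page" <<'EOF'
<html>
<head><title>test</title></head>
<body>
<script type="text/javascript">
function printPage() {
    var printwin = window.open('', 'print');
    printwin.document.writeln('<table><tr><td>');
    printwin.document.writeln('cell');
    printwin.document.writeln('</td>');
}
</script>
</body>
</html>
EOF

# Run the same xmllint call as above and show only the writeln lines,
# so the original strings can be compared with the serialized ones.
if command -v xmllint >/dev/null 2>&1; then
    xmllint --html --htmlout "$page" 2>&1 | grep -n "writeln" || true
else
    echo "xmllint not found; skipping the parse step" >&2
fi
```

Comparing the writeln lines in the output with those in the file should show whether the installed libxml2 version keeps the string intact.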
So I think the PHP error is caused by this behavior in libxml2. I am
using libxml2 version 2.9.1.
Is it possible to fix this, or has it already been fixed in a newer version?
Best regards
André