Re: [xml] Error on parsing HTML with libxml

I have looked into the libxml2 code and found the function
htmlParseScript() in HTMLparser.c.

Its comments describe the problem with the "<" character inside
scripts, but the parser also offers a recover mode that ignores the
embedded tags.

I ran

xmllint --html --htmlout --recover mypage.html

and the output now keeps the last </td> tag. The PHP equivalent does
not work (the DOMDocument class has a "recover" flag, but the output is
always the same). So I will look into the DOMDocument code (if it is


On 18.08.2018 00:33, Eric S Eberhard wrote:
I could be way off base -- don't you have to encode those portions in the JS? Otherwise I can see the parser
getting confused. The JS looks like data, and it can't have < or > in it.
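
A minimal sketch of that encoding idea, assuming the markup sits in
JavaScript string literals inside <script> elements. The HTML 4
specification suggests writing "</" as "<\/" in script content, and
JavaScript reads "\/" in a string literal as a plain "/". The helper
name escape_end_tags_in_scripts is hypothetical and is not part of
libxml2 or PHP. It rewrites every "</" in a script body, so a script
that uses "</" outside of string literals would need more care.

```python
import re

# Hypothetical helper for illustration only, not a libxml2 or PHP API.
# Captures the opening <script ...> tag, the script body, and the closing tag.
SCRIPT_RE = re.compile(
    r"(<script\b[^>]*>)(.*?)(</script\s*>)",
    re.IGNORECASE | re.DOTALL,
)


def escape_end_tags_in_scripts(html: str) -> str:
    """Rewrite "</" as "<\\/" inside every <script> body of the given HTML."""

    def fix(match: re.Match) -> str:
        open_tag, body, close_tag = match.groups()
        # Only the script body is changed; the surrounding tags stay as they are.
        return open_tag + body.replace("</", "<\\/") + close_tag

    return SCRIPT_RE.sub(fix, html)


if __name__ == "__main__":
    page = '<td><script>var s = "<b>x</b></td>";</script></td>'
    print(escape_end_tags_in_scripts(page))
```

The rewritten page can then be passed to xmllint or DOMDocument as
before. Whether that avoids the error on a given libxml2 version would
still need to be checked.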


Eric S Eberhard
VICS (Vertical Integrated Computer Systems)
Voice: 928 567 3529
Cell    : 928 301 7537  (not reliable except for text or if not home)
2933 W Middle Verde Rd
Camp Verde, AZ  86322

-----Original Message-----
From: xml [mailto:xml-bounces gnome org] On Behalf Of André Rothe
Sent: Friday, August 17, 2018 5:43 AM
To: xml gnome org
Subject: [xml] Error on parsing HTML with libxml


I ran into an HTML parser problem during PHP development. The DOMDocument class uses libxml2
to parse HTML and XML documents. I found that it has a problem with HTML documents that contain
inline JavaScript code with HTML tags inside JavaScript string variables.

Here is a small code example that shows the problem:

As you can see there, the last <td> tag is lost in the output.
I get exactly the same error with xmllint:

xmllint --html --htmlout /tmp/page.html

where page.html contains the HTML part of the example code above. The output is

page.html:11: HTML parser error : Unexpected end tag : td

and in the output, the string is empty:


So I think the PHP error comes from this error in libxml2. I use libxml2 version 2.9.1.

Is it possible to fix this, or is it already fixed in a newer version?

Best regards
