Re: [xml] Error on parsing HTML with libxml

From: André Rothe <andre rothe zks uni-leipzig de>
To: xml gnome org, "Liam R. E. Quin" <liam fromoldbooks org>
Subject: Re: [xml] Error on parsing HTML with libxml
Date: Mon, 20 Aug 2018 09:48:25 +0200

I can't change the source of the HTML page, because the page is
generated by another system to which I have no access. I only receive the
pages from there, and our Apache module runs a post-processing step just
before the pages are sent to the user's browser. That step needs a
parser to change something within the page.

So I think libxml should not parse the content of inline scripts, so
that it can handle such pages.

There is also a comment on

https://stackoverflow.com/questions/51892455/php-5-4-16-domdocument-removes-parts-of-javascript

which describes your idea with CDATA, but it didn't work.

~André

On 18.08.2018 04:13, Liam R. E. Quin wrote:

On Fri, 2018-08-17 at 14:42 +0200, André Rothe wrote:


https://3v4l.org/O0iEf


Try changing
    ...writeln('</td>');
to
    ...writeln('<' + '/td>');
and see if that helps; or use a CDATA section,
<script><![CDATA[
  //..
]]></script> to escape the </td> markup from the HTML parser.
Although it may depend on what the missing //... lines look like,
assuming this is not the complete source.
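
A minimal sketch of why the string-splitting suggestion above can help,
assuming the parser reacts to an end-tag-like "</" sequence inside the
script text. The sample script strings and the containsEndTagLike helper
are hypothetical, made up for illustration; they are not part of libxml
or of the original page.

```javascript
// At run time both expressions yield exactly the same string.
const literal = '</td>';
const split = '<' + '/td>';

// Illustrative script sources, as raw text an HTML parser would scan.
const scriptBefore = "writeln('</td>');";
const scriptAfter = "writeln('<' + '/td>');";

// Hypothetical check: does the raw text contain "</" followed by a letter,
// i.e. something that looks like an HTML end tag?
function containsEndTagLike(source) {
  return /<\/[A-Za-z]/.test(source);
}

console.log(split === literal);                // true
console.log(containsEndTagLike(scriptBefore)); // true
console.log(containsEndTagLike(scriptAfter));  // false
```

The browser still writes the same markup, but the raw script text no
longer contains anything that looks like a closing tag.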

Better yet, don't use document.write at all, and switch to more modern
practices :)

I'm not sure there's actually a bug here; if you feed the parser tag
soup, expect a mess. Keep PHP, JavaScript, HTML, CSS in separate files
and life will probably be simpler.

