Re: [xml] HTMLparser: whitespace in <body> tags



On Sun, 26 Sep 2004, Benj Carson wrote:

> Right, this makes sense to me.  Unfortunately, choosing a specific DTD or
> reformatting my input isn't really a viable option for me in this case.
> I'm actually writing an HTML-to-PDF converter in PHP (using PHP's
> DomDocument extension, which in turn uses libxml).  I'd like the converter
> to be as forgiving as possible in terms of its input, and to behave as much
> like a normal web browser as it can.  For the most part, libxml really
> shines here since it is quite tolerant of all the malformed HTML that
> people like to write, and it saves me having to write a (slow, and likely
> poor-quality) HTML parser in PHP.

Have you considered using Tidy?

http://www.php.net/tidy
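
A rough sketch of what that could look like, assuming the tidy extension is
installed: run the input through Tidy first, then hand the repaired markup to
DomDocument. The helper name loadForgivingHtml() and the two config options
are illustrative choices, not something taken from this thread, and I have
not checked whether this changes the whitespace handling being discussed.

```php
<?php
// Hypothetical helper (name and options are assumptions, not from the thread):
// repair markup with Tidy when available, then parse it with DOMDocument.
function loadForgivingHtml($html)
{
    // Fall back to the raw input if the tidy extension is not loaded.
    if (function_exists('tidy_repair_string')) {
        $config = array(
            'output-xhtml' => true, // ask Tidy for well-formed XHTML
            'wrap'         => 0,    // do not re-wrap long lines
        );
        $repaired = tidy_repair_string($html, $config, 'utf8');
        if (is_string($repaired) && $repaired !== '') {
            $html = $repaired;
        }
    }

    $doc = new DOMDocument();
    // Silence libxml warnings about any markup that is still malformed.
    @$doc->loadHTML($html);
    return $doc;
}

// Example: an unclosed <b> inside a paragraph.
$doc = loadForgivingHtml('<body>  <p>Hello <b>world</p></body>');
echo $doc->getElementsByTagName('p')->length, "\n";
```

Keeping the Tidy step optional means the converter still works, with
libxml's own error recovery, on hosts where the extension is missing.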

-adam

-- 
adam trachtenberg com
author of o'reilly's "upgrading to php 5" and "php cookbook"
avoid the holiday rush, buy your copies today!


