Martin (gzlist) wrote:
> On 09/10/2009, Sam Liddicott <sam liddicott com> wrote:
>> The bowels of the "html sanitizer" (I'm sanitizing html email and
>> marking up links to open in new windows, etc) could not cope with
>> that knowledge!
>>
>> So although libxml2's html parsing of "tag soup" is a boon, unless I
>> want to present the output as text/xml (which I do not) I'm in trouble.
>
> There are well-documented ways of varying content-type so you can tell
> sane browsers that what you're sending is actually xml, while
> maintaining compatibility with less capable user agents.

That's the great sort of stuff I needed - thanks very much.

> Also, using one of the XHTML 1.0 doctypes will make the serialiser use
> the appendix C compatibility hacks. The output doesn't even need to be
> valid.

Sam

> Stylesheet (applied to any input):
>
> <xsl:stylesheet
>     xmlns="http://www.w3.org/1999/xhtml"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     version="1.0">
>   <xsl:output
>       omit-xml-declaration="yes"
>       doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
>       doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
>   <xsl:template match="/">
>     <p><tt/><br/></p>
>   </xsl:template>
> </xsl:stylesheet>
>
> Outputs (amusingly bogus):
>
> <!DOCTYPE p PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> <p xmlns="http://www.w3.org/1999/xhtml"><tt></tt><br /></p>
>
> Using the same <xsl:output/> in your transform should give you a result
> that html parsers old and new don't get too confused over.
>
> Martin

_______________________________________________
xslt mailing list, project page http://xmlsoft.org/XSLT/
xslt gnome org
http://mail.gnome.org/mailman/listinfo/xslt
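
The thread does not say which way of "varying content-type" Martin has in mind. One common approach is to look at the request's Accept header and send application/xhtml+xml only to clients that list it explicitly, falling back to text/html for everyone else. A minimal sketch of that idea follows, assuming the server code can see the raw Accept header; the function names pick_content_type and response_headers are hypothetical and do not come from libxslt or from the thread.

```python
def pick_content_type(accept_header: str) -> str:
    """Choose a media type for XHTML output from a raw Accept header.

    Returns application/xhtml+xml only when the client lists it
    explicitly with a non-zero q value; otherwise text/html.
    Wildcards such as */* are deliberately not treated as XHTML support.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"


def response_headers(accept_header: str) -> dict:
    """Build response headers; Vary tells caches the body depends on Accept."""
    return {
        "Content-Type": pick_content_type(accept_header) + "; charset=utf-8",
        "Vary": "Accept",
    }
```

For example, response_headers("text/html, */*") selects text/html, so a client that never mentions application/xhtml+xml gets the same serialised document labelled as HTML. The utf-8 charset is an assumption of this sketch and should match whatever encoding the transform actually emits.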