[xml] xmllint output is valid html



Hi there,

I have a large HTML file that was kind of ugly, so I ran it through xmllint and saved the output:

# xmllint --html --format --htmlout sales_db.php.html > salesdb-xmllintout.html

Here is the head of the result; I am sure there is nothing special about it:

h24-79-244-68#  head -4 salesdb-xmllintout.html
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
 <head>


.........

I want to use xsltproc and an XSLT stylesheet to make a useful XML file out of the above file. When I run xsltproc with a very minimal XSL file, I get this error:


h24-79-244-68# xsltproc saleshtml2xml.xsl salesdb-xmllintout.html
http://www.w3.org/TR/REC-html40/loose.dtd:31: error: xmlParseEntityDecl: entity HTML.Version not terminated
 -- Typical usage:

......

This has to be simple, but I don't understand how the DOCTYPE that xmllint added could be wrong according to xsltproc. I am obviously missing some very fundamental concept here, perhaps regarding the output of xmllint (I also tried the -html option).

I also get the same messages if I run the output of xmllint back into xmllint with the -valid option.

When I turn off validation (-novalid) there is no problem. I guess I don't want to revalidate the HTML file against the public DTD; is that right?
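
For reference, here is a minimal sketch of the non-validating run that works for me. The file names match the ones above, but the file contents are made-up stand-ins (my real files are much larger), and the temp directory and the <sales>/<item> element names are only assumptions for illustration:

```shell
# Scratch directory so the stand-in files do not clobber anything real.
workdir=$(mktemp -d)

# Stand-in for the xmllint output: same prolog and DOCTYPE as shown above,
# with a tiny hypothetical body.
cat > "$workdir/salesdb-xmllintout.html" <<'EOF'
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
 <head><title>sales</title></head>
 <body><p>row 1</p></body>
</html>
EOF

# Stand-in for the "very minimal" stylesheet: copies each <p> into an <item>.
cat > "$workdir/saleshtml2xml.xsl" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" indent="yes"/>
 <xsl:template match="/">
  <sales>
   <xsl:for-each select="//p">
    <item><xsl:value-of select="."/></item>
   </xsl:for-each>
  </sales>
 </xsl:template>
</xsl:stylesheet>
EOF

# --novalid tells xsltproc to skip loading the DTD named in the DOCTYPE,
# so loose.dtd is never fetched or parsed.
if command -v xsltproc >/dev/null 2>&1; then
  xsltproc --novalid "$workdir/saleshtml2xml.xsl" \
           "$workdir/salesdb-xmllintout.html" > "$workdir/out.xml"
else
  echo "xsltproc is not installed" >&2
fi
```

The only difference from the failing command is the --novalid flag; the stylesheet and input file are passed in the same order as before.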

Thank you in advance,

darrell
ddupasedm netscape net


