Re: [xml] xmllint --html problem?



On Fri, Nov 09, 2001 at 02:54:50PM +0100, Elizabeth Mattijsen wrote:
My point was that I instructed xmllint to output UTF-8 encoding.  But when 
I check the resulting XML, it _doesn't_ have valid UTF-8 encoding.  Does 
that mean that the

   --encode UTF8

parameter of xmllint just sets the encoding attribute in the <?xml 
processor directive?

  Hum, actually xmllint --html doesn't seem to save with the
HTML serializer. The testHTML test tool which comes with the source
distribution behaves a bit better:

orchis:~/XML -> cat tst.html 
<html>
<body>
<p>très
</body>
</html>
orchis:~/XML -> ./xmllint --html  tst.html
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>très
</p></body></html>
orchis:~/XML -> 

  This seems to encode to UTF-8 directly.
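
  Since a terminal can hide what was really written, one way to check
whether saved output is valid UTF-8 is to test the raw bytes. A minimal
sketch in Python, not part of libxml; the helper name is_valid_utf8 is
made up for illustration:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same paragraph text in two encodings:
# in UTF-8 the "è" is the two bytes C3 A8, in ISO-8859-1 it is the single byte E8.
utf8_bytes = "<p>très".encode("utf-8")
latin1_bytes = "<p>très".encode("iso-8859-1")

print(is_valid_utf8(utf8_bytes))
print(is_valid_utf8(latin1_bytes))
```

  The same function can be applied to a saved result by reading the file
in binary mode, e.g. open(path, "rb").read(), where path is whatever
file the output was redirected to.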

orchis:~/XML -> ./testHTML   tst.html 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>tr&egrave;s
</p></body></html>
orchis:~/XML -> 

  This will use the HTML entities.

orchis:~/XML -> ./testHTML --encode ISO-8859-1   tst.html 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>très
</p></body></html>
orchis:~/XML -> 

  This generates ISO-8859-1 output, but for some reason (I need to check)
doesn't generate the meta tags :-\
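
  To make the difference between the three outputs above concrete, here
is a small sketch in Python of what the word "très" looks like at the
byte level in each case. It only illustrates the three forms and is not
how libxml implements its serializers; the helper to_html_entities is a
made-up name:

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> bytes:
    """Serialize text as pure ASCII.

    Non-ASCII characters become named HTML entities when one exists,
    otherwise numeric character references. ASCII characters, including
    markup characters, are passed through untouched.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])
        else:
            out.append("&#%d;" % cp)
    return "".join(out).encode("ascii")


text = "très"
as_utf8 = text.encode("utf-8")          # direct UTF-8 bytes
as_entities = to_html_entities(text)    # ASCII with HTML entities
as_latin1 = text.encode("iso-8859-1")   # direct ISO-8859-1 bytes

print(as_utf8)
print(as_entities)
print(as_latin1)
```

  The first and third forms look identical in a terminal set to the
matching encoding, which is why checking the raw bytes is more reliable
than reading the screen.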

  I know libxml can save HTML to any encoding because this was needed
for libxslt. But not all of the interfaces may be available at the
xmllint command line level.

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


