Re: [xml] xmllint --html problem?
- From: Daniel Veillard <veillard redhat com>
- To: Elizabeth Mattijsen <liz dijkmat nl>
- Cc: xml gnome org
- Subject: Re: [xml] xmllint --html problem?
- Date: Fri, 9 Nov 2001 09:13:03 -0500
On Fri, Nov 09, 2001 at 02:54:50PM +0100, Elizabeth Mattijsen wrote:
> My point was that I instructed xmllint to output UTF-8 encoding. But when
> I check the resulting XML, it _doesn't_ have valid UTF-8 encoding. Does
> that mean that the
>   --encode UTF8
> parameter of xmllint just sets the encoding attribute in the <?xml
> processor directive?
Hmm, actually xmllint --html doesn't seem to save with the
HTML serializer. The testHTML test tool which comes with the source
distribution behaves a bit better:
orchis:~/XML -> cat tst.html
<html>
<body>
<p>très
</body>
</html>
orchis:~/XML -> ./xmllint --html tst.html
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>très
</p></body></html>
orchis:~/XML ->
This seems to encode to UTF-8 directly.
orchis:~/XML -> ./testHTML tst.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>tr&egrave;s
</p></body></html>
orchis:~/XML ->
This uses the HTML entities.
orchis:~/XML -> ./testHTML --encode ISO-8859-1 tst.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>très
</p></body></html>
orchis:~/XML ->
This generates ISO-8859-1 output, but for some reason (I need to check)
doesn't generate the meta tags :-\
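To make the difference between the three outputs above concrete, here is a
rough sketch in Python of what happens to the text "très" in each case. This
is not libxml code and not how the serializer is implemented; the
serialize_text helper is a hypothetical name made up for illustration, and it
only deals with non-ASCII characters (it does not escape "<" or "&").

```python
from html.entities import codepoint2name

WORD = "tr\u00e8s"  # the text "très" from tst.html


def serialize_text(text, encoding=None):
    """Return the bytes a serializer might emit for `text`.

    Hypothetical helper, for illustration only.
    encoding=None  -> plain ASCII output, non-ASCII characters become
                      HTML entities (or numeric references if unnamed).
    encoding given -> characters are encoded directly; anything the
                      encoding cannot represent falls back to a reference.
    """
    target = encoding or "ascii"
    out = []
    for ch in text:
        try:
            out.append(ch.encode(target))
        except UnicodeEncodeError:
            name = codepoint2name.get(ord(ch))
            ref = f"&{name};" if name else f"&#{ord(ch)};"
            out.append(ref.encode("ascii"))
    return b"".join(out)


print(serialize_text(WORD))                # b'tr&egrave;s'
print(serialize_text(WORD, "UTF-8"))       # b'tr\xc3\xa8s'
print(serialize_text(WORD, "ISO-8859-1"))  # b'tr\xe8s'
```

The point is that the same character ends up as an entity, as two bytes, or
as one byte depending on the output encoding, which is why a file that merely
declares UTF-8 without being re-encoded is not valid UTF-8.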
I know libxml can save HTML to any encoding because this was needed
for libxslt. But not all of the interfaces may be available at the
xmllint command-line level.
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/