Re: [xml] xmllint --html problem?
- From: Elizabeth Mattijsen <liz dijkmat nl>
- To: veillard redhat com
- Cc: xml gnome org
- Subject: Re: [xml] xmllint --html problem?
- Date: Fri, 09 Nov 2001 14:54:50 +0100
Hi Daniel,
At 08:46 AM 11/9/01 -0500, Daniel Veillard wrote:
> > # xmllint --html --encode UTF8 71.html >71.xml 2>/dev/null
> This parses an HTML resource and saves an HTML resource.
> I assume there were errors (2>/dev/null), so I don't have much context.
Plenty of errors, but I don't really care about them. I'm just trying to get
as much out of the HTML with as little manual work as possible.
> > # xmllint --noout 71.xml
> > 71.xml:53: error: Input is not proper UTF-8, indicate encoding !
> > ophy of Education, The</a><br/>Edited by Michael A. Peters (New Zealand)Ã? &
My point was that I instructed xmllint to output UTF-8. But when I check
the resulting XML, it _doesn't_ contain valid UTF-8. Does that mean that the
--encode UTF8
parameter of xmllint just sets the encoding attribute in the <?xml ... ?>
declaration?
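
One way to confirm this independently of xmllint is to try decoding the bytes as UTF-8 and report where decoding first fails. The following is a minimal Python sketch, not part of xmllint: the function names `first_invalid_utf8` and `check_file` are hypothetical, and the sample bytes in the usage note only imitate the kind of input that triggers the error above, since the real contents of 71.html are not shown here.

```python
from pathlib import Path
from typing import Optional


def first_invalid_utf8(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded.
        return exc.start
    return None


def check_file(path: str) -> Optional[int]:
    """Read a file as raw bytes and run the same check on its contents."""
    return first_invalid_utf8(Path(path).read_bytes())
```

For example, `first_invalid_utf8(b"New Zealand)\xa0 &")` returns 12, because a bare 0xA0 byte (a non-breaking space in ISO-8859-1) is not a valid UTF-8 sequence, while the same text encoded properly, `"New Zealand)\u00a0 &".encode("utf-8")`, returns None. Whether 71.html actually uses ISO-8859-1 is an assumption; the check only tells you that the output is not UTF-8, not what it is.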
> xmllint does not magically convert HTML to XHTML. Use Tidy for this
> (see the W3C page for pointers).
Will have a look there.
The exact URL is http://www.w3.org/People/Raggett/tidy/ . Thanks
for the pointer!
Elizabeth Mattijsen