Re: [xml] xmllint --html problem?



Hi Daniel,

At 08:46 AM 11/9/01 -0500, Daniel Veillard wrote:
> # xmllint --html --encode UTF8 71.html >71.xml 2>/dev/null
> This parses an HTML resource and saves an HTML resource.
> I assume there were errors (2>/dev/null) so I don't have much context.
Plenty of errors, but I don't really care about them.  I'm just trying to get
as much out of the HTML as I can, with as little manual work as possible.

> # xmllint --noout  71.xml
> 71.xml:53: error: Input is not proper UTF-8, indicate encoding !
> ophy of Education, The</a><br/>Edited by Michael A. Peters (New Zealand)Ã? &amp
My point was that I instructed xmllint to output UTF-8.  But when I check
the resulting file, it is _not_ valid UTF-8.  Does that mean that the

  --encode UTF8

parameter of xmllint just sets the encoding attribute in the <?xml ... ?>
declaration, without actually converting the content?
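
For what it's worth, the invalid bytes can be located independently of
xmllint with something like the Python sketch below.  The function name
first_invalid_utf8 is only illustrative, and the check reports just the first
bad sequence; it says nothing about which encoding the source was really in.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, offending_bytes) for the first invalid UTF-8
    sequence in data, or None if all of data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start and err.end delimit the bytes the decoder rejected.
        return err.start, data[err.start:err.end]
    return None


def check_file(path: str):
    """Read a file as raw bytes and run the same check on its content."""
    with open(path, "rb") as fh:
        return first_invalid_utf8(fh.read())
```

Calling check_file("71.xml") would then give the byte offset to look at, or
None if the file really is valid UTF-8.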

> xmllint does not magically convert HTML to XHTML. Use Tidy for this
> (see the W3C page for pointers).
Will have a look there.

The exact URL is http://www.w3.org/People/Raggett/tidy/ . Thanks for the pointer!

Elizabeth Mattijsen



