Re: [xml] xmllint --html problem?



Hi Daniel,

At 08:46 AM 11/9/01 -0500, Daniel Veillard wrote:
> # xmllint --html --encode UTF8 71.html >71.xml 2>/dev/null
>  This parses an HTML resource and saves an HTML resource.
> I assume there were errors (2>/dev/null) so I don't have much context.

Plenty of errors, but I don't really care about them. I'm just trying to get as much out of the HTML as I can, with as little manual work as possible.


> # xmllint --noout  71.xml
> 71.xml:53: error: Input is not proper UTF-8, indicate encoding !
> ophy of Education, The</a><br/>Edited by Michael A. Peters (New Zealand)Ã? &amp

My point was that I instructed xmllint to output UTF-8 encoding. But when I check the resulting XML, it _doesn't_ have valid UTF-8 encoding. Does that mean that the

  --encode UTF8

parameter of xmllint just sets the encoding attribute in the <?xml ... ?> declaration?
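
For what it's worth, the validity of the output can also be checked without xmllint. A minimal sketch, assuming iconv is installed; the function name is_valid_utf8 is made up for illustration and is not part of xmllint or libxml2. It asks iconv to convert the file from UTF-8 to UTF-8, which fails on any byte sequence that is not valid UTF-8:

```shell
# Hypothetical helper (not part of xmllint): succeeds only if the
# file decodes cleanly as UTF-8. Assumes iconv is available.
is_valid_utf8() {
    # Converting UTF-8 to UTF-8 is a no-op for valid input and makes
    # iconv exit non-zero on the first invalid byte sequence.
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example use on the file from this thread:
#   is_valid_utf8 71.xml && echo "valid UTF-8" || echo "not valid UTF-8"
```

This only reports whether the bytes are well-formed UTF-8. It says nothing about what encoding the file is actually in.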


>  xmllint does not magically convert HTML to XHTML. Use Tidy for this
> (see the W3C page for pointers).

Will have a look there.

The exact URL is http://www.w3.org/People/Raggett/tidy/. Thanks for the pointer!


Elizabeth Mattijsen



