Re: [xml] xmllint --html problem?



On Fri, Nov 09, 2001 at 01:03:40PM +0100, Elizabeth Mattijsen wrote:
Does the following sequence of commands indicate a problem in the HTML 
parsing of libxml or not?

  No

# xmllint --version
xmllint: using libxml version 20409
# xmllint --html --encode UTF8 71.html >71.xml 2>/dev/null

 This parses an HTML resource and saves an HTML resource.
I assume there were errors (hidden by 2>/dev/null), so I don't have much context.

# xmllint --noout  71.xml
71.xml:53: error: Input is not proper UTF-8, indicate encoding !
ophy of Education, The</a><br/>Edited by Michael A. Peters (New Zealand)Ã? &amp
                                                                             
    ^
1.xml:53: error: Bytes: 0xC3 0x20 0x50 0x61
ophy of Education, The</a><br/>Edited by Michael A. Peters (New Zealand)Ã? &amp


File "71.html" available on request: it's about 53K which I thought would 
be too large to send to the list right away...

  You're asking the XML parser to parse an HTML resource; it fails,
which is not surprising.
  xmllint does not magically convert HTML to XHTML. Use Tidy for this
(see the W3C page for pointers).
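
  As a side note on the error quoted above, here is a minimal Python
sketch of why the reported bytes (0xC3 0x20 0x50 0x61) are rejected
as UTF-8. It is only an illustration, not something from libxml: the
helper name is_valid_utf8 is made up for this example, and the check
uses Python's own decoder rather than the parser's.

```python
# Bytes reported by xmllint at the failure point (taken from the error message).
reported = bytes([0xC3, 0x20, 0x50, 0x61])


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xC3 is the lead byte of a two-byte sequence, so the next byte must be
# a continuation byte in the range 0x80-0xBF. 0x20 (a space) is not one.
print(is_valid_utf8(reported))

# For comparison, 0xC3 0xA9 is a well-formed two-byte sequence (U+00E9).
print(is_valid_utf8(bytes([0xC3, 0xA9])))
```

  The first check fails and the second succeeds, which matches the
parser's complaint: the saved file contains a lead byte that is not
followed by a valid continuation byte.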

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


