Re: [xml] *mlDocDumpMemory[Enc] functions croak?



On Tue, May 03, 2005 at 10:37:07PM -0700, Abraham Nelson wrote:
> The code for the file page.html is taken from the site
> www.teleworking.gr
>
> What happens is that the file is parsed properly, but
> when trying to dump it an error occurs:
>
>    output conversion failed due to conv error
>    Bytes: 0xCE 0xCE 0xCE 0xCE
>    I/O error : encoder error
>
> I find this error odd, since I've specified the same
> output encoding as the one the tree uses.

paphio:~/XML -> xmllint --html http://www.teleworking.gr/
http://www.teleworking.gr/:97: HTML parser error : Unexpected end tag : style
        css += '</style>\n'
                        ^
http://www.teleworking.gr/:130: HTML parser error : Unexpected end tag : a
idth="80"><img border="0" src="images/home_space.gif" width="80" height="1"></a>                              
                                                 ^output conversion failed due to conv error
Bytes: 0xCE 0xCE 0xCE 0xCE
I/O error : encoder error
http://www.teleworking.gr/:19: element script: error : String is not UTF-8
  
The script content triggers a serious parsing failure. That looks like the
source of the error, since the output of the command I pasted stops in the
middle of the script itself.
The input HTML page is too broken w.r.t. the HTML spec to be parsed
correctly, let alone serialized correctly.
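
To see why the requested output encoding does not help here: libxml2 keeps
the tree in UTF-8 internally, and "String is not UTF-8" says the script node
holds bytes that are not valid UTF-8, so the conversion on output has nothing
sane to start from. The reported bytes can be checked in isolation. This is a
minimal sketch in plain Python, not libxml2 code; the helper name
first_invalid_utf8_offset is hypothetical and only there for illustration.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first offending byte.
        return exc.start
    return None


# The four bytes reported in the "output conversion failed" message.
reported = bytes([0xCE, 0xCE, 0xCE, 0xCE])
print(first_invalid_utf8_offset(reported))  # prints 0

# 0xCE is a valid lead byte only when a continuation byte (0x80-0xBF) follows.
print(first_invalid_utf8_offset(bytes([0xCE, 0x91])))  # prints None
```

In UTF-8, 0xCE opens a two-byte sequence and must be followed by a
continuation byte, so a run of 0xCE bytes cannot be valid UTF-8. That fits
raw single-byte text having ended up in the tree unconverted, though the
sketch alone does not prove how it got there.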

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


