Re: [xml] HTML parsing with libxml2




So, basically, how can I make libxml2 parse the document and ignore the character encoding (or fall back to a default encoding and continue on error)? Or how can I make it simply ignore any unknown characters? I really need to use libxml, and "out-of-range" characters are messing up the parsing :(

libxml is an XML parser; don't expect it to parse IE-ready HTML code ;-)

You can always clean the document yourself before passing it to libxml2. Or you can use libtidy or a similar tool to clean up the markup.

P.P>


