Re: [xml] HTML parsing with libxml2

From: Paweł Pałucha <pawel praterm com pl>
To: Macy Gasp <macygasp gmail com>
Cc: xml gnome org
Subject: Re: [xml] HTML parsing with libxml2
Date: Fri, 05 Aug 2005 15:01:24 +0200

So, basically, how can I make libxml2 parse the document and ignore the character encoding (or fall back to a default encoding and continue, on error)? Or how can I make it simply ignore any unknown characters? I really need to use libxml and "out-of-range" characters are messing the parsing :(


libxml is an XML parser; do not expect it to parse IE-ready HTML code ;-)

You can always clean the document on your own before passing it to libxml2. Or you can use libtidy or a similar tool to clean your code.
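As a rough illustration of the "clean it on your own" option, here is a minimal sketch in plain C. It assumes the document is supposed to be UTF-8 and simply overwrites every byte that is not part of a valid UTF-8 sequence with '?'. The names sanitize_utf8 and utf8_seq_len are made up for this example and are not part of libxml2; the choice of '?' as the replacement is also just an assumption. It does not touch control characters or broken markup, which a tool like libtidy would be better suited for.

```c
#include <stddef.h>

/* Hypothetical helper: length of the valid UTF-8 sequence starting at s
 * (with n bytes available), or 0 if the bytes there are not valid UTF-8. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned long cp;
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;                                   /* plain ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else
        return 0;                                   /* stray continuation or bad lead byte */

    if (len > n)
        return 0;                                   /* sequence truncated at end of buffer */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                               /* not a continuation byte */
        cp = (cp << 6) | (unsigned long)(s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates and values above U+10FFFF. */
    if ((len == 2 && cp < 0x80) ||
        (len == 3 && cp < 0x800) ||
        (len == 4 && cp < 0x10000) ||
        cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    return len;
}

/* Hypothetical helper: replace, in place, every byte that is not part of a
 * valid UTF-8 sequence with '?'. Returns the number of bytes replaced. */
size_t sanitize_utf8(char *buf, size_t n)
{
    unsigned char *p = (unsigned char *)buf;
    size_t i = 0, replaced = 0;

    while (i < n) {
        size_t len = utf8_seq_len(p + i, n - i);
        if (len == 0) {
            p[i] = '?';
            replaced++;
            i++;
        } else {
            i += len;
        }
    }
    return replaced;
}
```

After a pass like this, the buffer could be handed to the parser with the encoding stated explicitly, for instance through libxml2's htmlReadMemory() with "UTF-8" as the encoding argument. Whether that is enough depends on how broken the rest of the markup is.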


P.P>
