RE : [xml] Parsing invalid characters or entities ref



In fact, I was wondering whether it would be worth offering an option in the
APIs to replace invalid characters, where possible, with a given replacement
character and then continue parsing.
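Without any change to the parser, a caller could approximate this by cleaning the buffer before handing it to the reader. The sketch below is hypothetical: `replace_invalid_c0` is a made-up helper, not a libxml2 API or option. It assumes UTF-8 input and only handles the C0 control characters (such as 0x03); other invalid code points like U+FFFE are left alone.

```c
#include <stddef.h>

/* Hypothetical pre-parse cleanup, NOT a libxml2 option.
 * Assumes buf holds UTF-8 text. In UTF-8, bytes 0x00-0x1F only ever encode
 * the C0 control characters themselves (they never occur inside a multi-byte
 * sequence), so they can be rewritten in place one byte at a time.
 * Every C0 control except tab, LF and CR (the only ones XML 1.0 allows) is
 * replaced with the ASCII byte `replacement`, which should itself be a legal,
 * non-markup character such as '?' or ' '.
 * Returns the number of bytes replaced. */
size_t replace_invalid_c0(unsigned char *buf, size_t len, unsigned char replacement)
{
    size_t replaced = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = buf[i];
        if (b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D) {
            buf[i] = replacement;
            replaced++;
        }
    }
    return replaced;
}
```

The cleaned buffer could then be passed to a memory-based reader such as xmlReaderForMemory(). Note that this silently alters the data, which is exactly the kind of on-the-fly repair the reply below argues against.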

-----Original Message-----
From: Daniel Veillard [mailto:veillard redhat com]
Sent: Monday, November 3, 2003 22:27
To: GARNIER Pierre
Cc: 'xml gnome org'
Subject: Re: [xml] Parsing invalid characters or entities ref


On Mon, Nov 03, 2003 at 07:13:22PM +0100, GARNIER Pierre wrote:
Hi all,

When I parse a UTF-8 XML document with the xmlTextReader API and the
parser encounters the character whose code is 0x03, parsing stops. Is
there a way to make the parser ignore this character and continue
parsing?
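For reference, the XML 1.0 Char production only allows #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF], so 0x03 makes the document not well-formed. A minimal C sketch of that check (the function name `is_xml10_char` is hypothetical, not part of libxml2):

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if code point c matches the XML 1.0 Char production:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 * Anything else (e.g. 0x03, surrogates, U+FFFE) is not allowed in an
 * XML 1.0 document, and encountering it is a fatal error. */
bool is_xml10_char(uint32_t c)
{
    if (c == 0x9 || c == 0xA || c == 0xD)
        return true;
    if (c >= 0x20 && c <= 0xD7FF)
        return true;
    if (c >= 0xE000 && c <= 0xFFFD)
        return true;
    if (c >= 0x10000 && c <= 0x10FFFF)
        return true;
    return false;
}
```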

  Hum, no. Your document is not XML, and the spec instructs the parser to stop
immediately; I totally support that behaviour. We don't want XML parsing
to become as unreliable as HTML parsing. On a fatal error the input must be
dropped and the source must be fixed. There is an xmlRecoverFile()
interface which lets you fix a broken file by generating a tree that you
can then save, but it should really be limited to repairing broken files;
the notion of ignoring an error on the fly is really not suitable for XML
processing.

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


