Re: [xml] Is it possible to skip illegal UTF-8 characters when parsing?



On Fri, Aug 09, 2002 at 09:51:31AM +0200, Steinar Bang wrote:
Platform: Intel PIII, RedHat 7.2, gcc 2.96 (RPM version number 2.96-98),
        libxml2 2.4.2

Is it possible to make libxml2 skip an illegal UTF-8 character, and
continue parsing, instead of stopping the parsing at this point?

Just getting a "." instead of the actual character is OK.

  Well, no. The specification is very clear about it: it's a fatal error,
and from that point on the parser must not pass any more data to the
application.
   Your data is not XML :-\

The workaround was to replace every byte in the incoming data below 0x20,
other than 0x9, 0xA, and 0xD, with a space before passing it on to
the libxml2 parser, but the preferred solution would be to have
libxml2 handle it.

  It's probably the best way to do it.
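A minimal sketch of that pre-filtering step in C, assuming the whole document is already in memory as a byte buffer. The function name `replace_control_bytes` is hypothetical. Like the workaround described above, it only rewrites control bytes that XML 1.0 forbids. It does not repair malformed UTF-8 multi-byte sequences, and it leaves bytes 0x20 and above untouched.

```c
#include <stddef.h>

/* Hypothetical helper: replace, in place, every byte below 0x20 except
 * tab (0x09), line feed (0x0A) and carriage return (0x0D) with a space.
 * Bytes >= 0x20 (including UTF-8 lead/continuation bytes) are left alone.
 * Returns the number of bytes that were replaced. */
size_t replace_control_bytes(char *buf, size_t len)
{
    size_t replaced = 0;
    for (size_t i = 0; i < len; i++) {
        /* Compare as unsigned so high bytes are not treated as negative. */
        unsigned char c = (unsigned char)buf[i];
        if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D) {
            buf[i] = ' ';
            replaced++;
        }
    }
    return replaced;
}
```

The filtered buffer can then be handed to one of libxml2's in-memory entry points, for example `xmlParseMemory(buf, (int)len)`.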

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


