
Re: [xml] Is it possible to skip illegal UTF-8 characters when parsing?



On Fri, Aug 09, 2002 at 09:51:31AM +0200, Steinar Bang wrote:
> Platform: Intel PIII, RedHat 7.2, gcc 2.96 (RPM version number 2.96-98),
> 	  libxml2 2.4.2
> 
> Is it possible to make libxml2 skip an illegal UTF-8 character, and
> continue parsing, instead of stopping the parsing at this point?
> 
> Just getting a "." instead of the actual character is OK.

  Well, no. The specification is very clear about it: it's a fatal error,
and from that point on the parser should not provide any more data to the
application.
   Your data is not XML :-\

> The workaround was to change everything in the incoming data <0x20,
> and not one of 0x9, 0xA, or 0xD to a space, before passing it on to
> the libxml2 parser, but the preferred solution would be to have
> libxml2 handle it.

  It's probably the best way to do it.
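
  For illustration, the pre-filter described in the quoted workaround could
look roughly like the sketch below. The function name
sanitize_control_chars is hypothetical and not part of libxml2; it is only
one possible way to write that filter. It assumes the input is otherwise
valid UTF-8, where bytes below 0x80 never occur inside a multi-byte
sequence, so overwriting them in place does not damage other characters.
It does nothing about malformed multi-byte sequences.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libxml2): overwrite every byte that is
 * below 0x20 and is not TAB (0x9), LF (0xA), or CR (0xD) with a space.
 * Returns the number of bytes that were replaced. */
size_t sanitize_control_chars(unsigned char *buf, size_t len)
{
    size_t replaced = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D) {
            buf[i] = ' ';   /* same length, so byte offsets are unchanged */
            replaced++;
        }
    }
    return replaced;
}
```

  The cleaned buffer would then be handed to the parser as usual, for
example through xmlParseMemory(). Replacing one byte with one byte keeps
the buffer length and the positions of everything else unchanged.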

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


