[xml] Re: Is it possible to skip illegal UTF-8 characters when parsing?



Daniel Veillard <veillard redhat com>:

> Well, no, the specification is very clear about it,

Actually, no, it isn't.  The EBNF for character data in mixed content
doesn't explicitly forbid illegal characters. :-)

But that's probably an error/oversight in the document, because I
think it was meant to forbid it, since it is explicitly forbidden in
both comments and CDATA sections.

> it's a fatal error and from that point the parser should not provide
> any more data to the application.
> Your data is not XML :-\

Not all well-formedness errors should be treated equally harshly, IMO.
An illegal character is not on the same level as, e.g., unbalanced tags
or missing closing quotes.

I've just filed an enhancement request in Bugzilla, asking for an option
to skip illegal characters without stopping the parse.
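
Until something like that exists, one workaround is to filter the input
before the parser ever sees it.  Below is a rough sketch in Python, not
anything libxml2 provides: the function names are made up for
illustration, it assumes the document is meant to be UTF-8, and it
replaces both undecodable bytes and code points outside the XML 1.0
Char production with U+FFFD.

```python
import re
import xml.etree.ElementTree as ET

# Anything NOT matching the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_xml_bytes(raw: bytes) -> str:
    """Hypothetical helper: return text that contains only legal XML chars.

    Assumes the input is supposed to be UTF-8.
    """
    # Invalid UTF-8 byte sequences are turned into U+FFFD by the codec.
    text = raw.decode("utf-8", errors="replace")
    # Code points that decode fine but are illegal in XML 1.0
    # (e.g. most C0 control characters) get the same replacement.
    return _ILLEGAL_XML_CHARS.sub("\ufffd", text)


def parse_lenient(raw: bytes) -> ET.Element:
    """Hypothetical helper: sanitize first, then parse as usual."""
    return ET.fromstring(sanitize_xml_bytes(raw).encode("utf-8"))


if __name__ == "__main__":
    broken = b"<a>ok\xff\x01</a>"  # one bad byte, one illegal control char
    root = parse_lenient(broken)
    print(repr(root.text))
```

This changes the data, of course, so it only makes sense where losing
the offending characters is acceptable.  It also blindly rewrites
markup and content alike, which is fine for stray bytes in text but
will not rescue a document whose tags themselves are damaged.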

Side note: both MSXML and XMLSpy ate the offending document without
reporting any errors.  The illegal characters showed up as empty
rectangles in XMLSpy.




