[xml] Is it possible to skip illegal UTF-8 characters when parsing?
- From: Steinar Bang <sb dod no>
- To: xml gnome org
- Subject: [xml] Is it possible to skip illegal UTF-8 characters when parsing?
- Date: Fri, 09 Aug 2002 09:51:31 +0200
Platform: Intel PIII, RedHat 7.2, gcc 2.96 (RPM version number 2.96-98), libxml2 2.4.2
Is it possible to make libxml2 skip an illegal UTF-8 character, and
continue parsing, instead of stopping the parsing at this point?
Just getting a "." instead of the actual character is OK.
The character in question was a 0x5 character in character data. Is
it completely illegal at this point? The EBNF seems to indicate that
it isn't explicitly forbidden:
<http://www.w3.org/TR/2000/REC-xml-20001006#syntax>
(even though allowing it at this point would admittedly be
inconsistent, since 0x5 _is_ illegal inside comments and CDATA
sections).
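For reference, the XML 1.0 recommendation linked above defines the set of allowed characters in its Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. A minimal sketch of a check against that production might look like this. The function name is hypothetical, not a libxml2 API, and it assumes the input has already been decoded to a Unicode code point.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of libxml2): returns true if the code point
   cp matches the XML 1.0 Char production:
   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
static bool is_xml_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
}
```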
The workaround was to replace every byte in the incoming data that is
below 0x20 and is not 0x9, 0xA, or 0xD with a space before passing the
data on to the libxml2 parser, but the preferred solution would be to
have libxml2 handle it.
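The workaround described above could be sketched roughly as follows. This is an illustration, not the code actually used: the function name is hypothetical, and it assumes the document sits in an in-memory byte buffer in UTF-8 (or another ASCII-compatible encoding), so it can work byte by byte. Bytes belonging to UTF-8 multi-byte sequences are all 0x80 or above and are never touched.

```c
#include <stddef.h>

/* Hypothetical helper: replaces, in place, every byte below 0x20 except
   tab (0x9), line feed (0xA) and carriage return (0xD) with a space.
   Safe on UTF-8 input, since multi-byte sequences only use bytes >= 0x80. */
static void sanitize_control_chars(char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (c < 0x20 && c != 0x9 && c != 0xA && c != 0xD)
            buf[i] = ' ';
    }
}
```

The buffer would then be handed to the parser as usual, for example with xmlParseMemory(). Because the replacement is one byte for one byte, the buffer length does not change.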
Thanx!
- Steinar