Re: [xml] Re: Is it possible to skip illegal UTF-8 characters when parsing?
- From: "Christopher R. Maden" <crism maden org>
- To: xml gnome org
- Date: Tue, 13 Aug 2002 03:17:41 -0700
[I realize this is a couple of days old, but I didn't see a definitive
response.]
At 05:05 12/8/02, Steinar Bang wrote:
> Daniel Veillard <veillard redhat com>:
>> Well, no, the specification is very clear about it,
> Actually, no it isn't. The EBNF for character data in mixed content
> doesn't explicitly forbid it. :-)

There's a reason there's prose in the spec, and not just a big steaming
pile of EBNF. Section 4.3.3 says,

    In the absence of information provided by an external
    transport protocol (e.g. HTTP or MIME), it is an error
    for an entity including an encoding declaration to be
    presented to the XML processor in an encoding other than
    that named in the declaration, or for an entity which
    begins with neither a Byte Order Mark nor an encoding
    declaration to use an encoding other than UTF-8.

and

    It is a fatal error if an XML entity is determined (via
    default, encoding declaration, or higher-level protocol)
    to be in a certain encoding but contains octet sequences
    that are not legal in that encoding. It is also a fatal
    error if an XML entity contains no encoding declaration
    and its content is not legal UTF-8 or UTF-16.

The latter seems clear to me (the former just establishes that all data
is in UTF-8 unless a BOM is given or another encoding is explicitly
specified). Bogus UTF-8 bytes must immediately halt processing of the entity.
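
As a minimal sketch of what that rule looks like in practice, here is a
Python illustration (not libxml2 itself; it assumes Python's standard
library XML parser, and the helper names first_illegal_utf8_offset and
parse_strict are made up for this example). An entity with no encoding
declaration is treated as UTF-8, and a stray Latin-1 byte is rejected
rather than skipped:

```python
import xml.etree.ElementTree as ET


def first_illegal_utf8_offset(data: bytes):
    """Return the byte offset of the first illegal UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding: no replacement, no skipping
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def parse_strict(data: bytes):
    """Parse an XML entity; return (root, None) or (None, error message)."""
    try:
        return ET.fromstring(data), None
    except ET.ParseError as exc:
        # The parser stops here; no partial tree is handed back.
        return None, str(exc)


# No encoding declaration and no BOM, so both entities are taken as UTF-8.
good = "<doc>caf\u00e9</doc>".encode("utf-8")
bad = b"<doc>caf\xe9</doc>"  # 0xE9 is e-acute in Latin-1, illegal here

if __name__ == "__main__":
    for label, entity in (("good", good), ("bad", bad)):
        root, err = parse_strict(entity)
        print(label, "offset:", first_illegal_utf8_offset(entity),
              "parsed:", root is not None, "error:", err)
```

The first entity parses normally. For the second, the strict decode
reports the offending byte at offset 8 and the parser raises an error
instead of returning a tree. Declaring the real encoding (or converting
the data to UTF-8 beforehand) is the fix, not skipping the bytes.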
~Chris
--
Christopher R. Maden, Principal Consultant, crism consulting
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: http://crism.maden.org/consulting/ >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4 5DFC AC52 F825 AFEC 58DA