[xml] UTF-8 validation
- From: Norbert Lindenberg <norbert lindenberg yahoo-inc com>
- To: xml gnome org
- Subject: [xml] UTF-8 validation
- Date: Fri, 5 Oct 2007 16:10:56 -0700
Hi there,
Can you tell me whether libxml2 does complete validation of UTF-8
when input is provided in this character encoding? By complete
validation I mean:
- Verifying that each character is represented by a byte sequence
that matches one of the patterns described in section 3 of RFC 3629.
- Verifying that each character is represented by the shortest
possible byte sequence (ruling out, for example, the use of 0xC0 0x80
for U+0000).
- Verifying that supplementary characters are represented by a 4-byte
sequence, not by a pair of encoded surrogate code points.
- Verifying that illegal code points, such as the noncharacters
U+FFFE, U+FFFF, etc., do not occur.
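The four checks above can be expressed as a single pass over the input. Below is a minimal Python sketch of such a validator. It is only an illustration of the checks and is not libxml2's code. The function names `validate_utf8` and `is_noncharacter` are hypothetical. It assumes "illegal code points" means the 66 Unicode noncharacters (U+FDD0..U+FDEF plus the last two code points of every plane). RFC 3629 itself does not forbid noncharacters, and the XML 1.0 `Char` production excludes only U+FFFE and U+FFFF among them.

```python
def is_noncharacter(cp: int) -> bool:
    # Assumption: "illegal" = Unicode noncharacters, i.e. U+FDD0..U+FDEF
    # plus U+xFFFE and U+xFFFF in each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def validate_utf8(data: bytes) -> bool:
    """Return True if data passes all four checks, False otherwise."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # Check 1: the lead byte selects one of the RFC 3629 patterns.
        if b0 < 0x80:
            cp, length, min_cp = b0, 1, 0
        elif 0xC0 <= b0 <= 0xDF:
            cp, length, min_cp = b0 & 0x1F, 2, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            cp, length, min_cp = b0 & 0x0F, 3, 0x800
        elif 0xF0 <= b0 <= 0xF7:
            cp, length, min_cp = b0 & 0x07, 4, 0x10000
        else:
            return False  # stray continuation byte, or 0xF8..0xFF
        if i + length > n:
            return False  # truncated sequence at end of input
        for j in range(1, length):
            b = data[i + j]
            if b & 0xC0 != 0x80:
                return False  # continuation bytes must be 10xxxxxx
            cp = (cp << 6) | (b & 0x3F)
        # Check 2: shortest form only (rejects e.g. 0xC0 0x80 for U+0000).
        if cp < min_cp:
            return False
        # Check 3: encoded surrogates are never valid, so a supplementary
        # character written as a surrogate pair is rejected here.
        if 0xD800 <= cp <= 0xDFFF:
            return False
        # RFC 3629 caps the range at U+10FFFF.
        if cp > 0x10FFFF:
            return False
        # Check 4: noncharacters (per the assumption above).
        if is_noncharacter(cp):
            return False
        i += length
    return True


if __name__ == "__main__":
    print(validate_utf8("\u00e9\u20ac\U0001F600".encode("utf-8")))  # True
    print(validate_utf8(b"\xc0\x80"))                  # False: overlong U+0000
    print(validate_utf8(b"\xed\xa0\x80\xed\xb0\x80"))  # False: surrogate pair
    print(validate_utf8(b"\xef\xbf\xbe"))              # False: U+FFFE
```

For comparison, Python's built-in `bytes.decode("utf-8")` already rejects overlong forms, encoded surrogates, and values above U+10FFFF, but it accepts noncharacters, so the fourth check has to be done separately.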
Bug report 305333 implies that some of this validation occurs, but
the references to the obsolete RFC 2044 in the documentation worry me
a bit.
Thanks,
Norbert