[xml] UTF-8 validation

Hi there,

Can you tell me whether libxml2 does complete validation of UTF-8 when input is provided in this character encoding? By complete validation I mean:

- Verifying that each character is represented by a byte sequence that matches one of the patterns described in section 3 of RFC 3629.

- Verifying that each character is represented by the shortest possible byte sequence (ruling out, for example, the use of 0xC0 0x80 for U+0000).

- Verifying that supplementary characters are represented by a 4-byte sequence, not by a pair of surrogate characters.

- Verifying that illegal code points, such as the noncharacters U+FFFE, U+FFFF, etc., do not occur.
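
To make the four checks above concrete, here is a minimal sketch in Python. It is only an illustration of what "complete validation" means in this question, not libxml2's actual code. The function name `is_strict_utf8` and the `reject_noncharacters` flag are hypothetical, and the reading of "etc." as "all Unicode noncharacters" (U+FDD0..U+FDEF plus the last two code points of every plane) is an assumption.

```python
def is_strict_utf8(data: bytes, reject_noncharacters: bool = True) -> bool:
    """Illustrative sketch: True if data passes the four checks listed above."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # Check 1: the lead byte must match one of the RFC 3629 patterns.
        # 'minimum' is the smallest code point that needs this many bytes.
        if b0 < 0x80:
            length, cp, minimum = 1, b0, 0x0
        elif 0xC2 <= b0 <= 0xDF:
            length, cp, minimum = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            length, cp, minimum = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            length, cp, minimum = 4, b0 & 0x07, 0x10000
        else:
            # 0x80-0xBF are continuation bytes, 0xC0-0xC1 could only start
            # overlong forms, and 0xF5-0xFF are outside the Unicode range.
            return False
        if i + length > n:
            return False  # sequence is truncated
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                return False  # continuation bytes must look like 10xxxxxx
            cp = (cp << 6) | (b & 0x3F)
        # Check 2: shortest form only (no overlong encodings).
        if cp < minimum:
            return False
        # Check 3: surrogate code points must not be encoded individually.
        if 0xD800 <= cp <= 0xDFFF:
            return False
        # Lead byte 0xF4 can still overshoot the last code point, U+10FFFF.
        if cp > 0x10FFFF:
            return False
        # Check 4 (assumed scope): reject every Unicode noncharacter.
        if reject_noncharacters and (
            0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE
        ):
            return False
        i += length
    return True
```

With this sketch, `is_strict_utf8(b"\xc0\x80")` and `is_strict_utf8(b"\xed\xa0\x80\xed\xb0\x80")` both return False (overlong form and encoded surrogate pair, respectively), while passing `reject_noncharacters=False` lets U+FFFE through, which isolates the fourth check from the first three.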

Bug report 305333 implies that some of this validation occurs, but the references to the obsolete RFC 2044 in the documentation worry me a bit.

