[xml] Incorrect attribute-value normalisation of entities
- From: chris0 lavabit com
- To: xml gnome org
- Subject: [xml] Incorrect attribute-value normalisation of entities
- Date: Wed, 23 Mar 2011 05:13:13 +0000
Attribute-value normalisation of entities changed in libxml v2.7.4
(see <https://bugzilla.gnome.org/show_bug.cgi?id=587663>). It looks
like the patch for this
(<http://git.gnome.org/cgit/libxml2/patch/?id=283d50279d2defbcedc940a4261758afa0fe752b>)
introduced more errors than it solved. As of v2.7.8 it is not
possible to escape white space in an entity definition for use in
an attribute value. This is a significant regression.
[Note: I've indented command output & used some C escape sequences
elsewhere for improved readability.]
For the file 'test.xml':
<?xml version="1.0"?>
<!DOCTYPE x [
<!ENTITY T1 "	">
<!ENTITY T2 "&#9;">
<!ENTITY T3 "&T2;">
<!ENTITY T4 "&T2;">
]>
<x a='&T1;' b='&T2;' c='&T3;' d='&T4;' e=' ' f='&#9;'>
a='&T1;' b='&T2;' c='&T3;' d='&T4;' e=' ' f='	'</x>
[Note: attribute e contains a literal tab character, i.e. "e='\t'".]
xmllint-2.7.3 --format --noent test.xml
<?xml version="1.0"?>
<!DOCTYPE x [
<!ENTITY T1 "	">
<!ENTITY T2 "&#9;">
<!ENTITY T3 "&T2;">
<!ENTITY T4 "&T2;">
]>
<x a="&#9;" b="&#9;" c="&#9;" d="&#9;" e=" " f="&#9;">
a=' ' b=' ' c=' ' d=' ' e=' ' f=' '</x>
xmllint-2.7.8 --format --noent test.xml
<?xml version="1.0"?>
<!DOCTYPE x [
<!ENTITY T1 "	">
<!ENTITY T2 "&#9;">
<!ENTITY T3 "&T2;">
<!ENTITY T4 "&T2;">
]>
<x a=" " b=" " c=" " d=" " e=" " f="&#9;">
a=' ' b=' ' c=' ' d=' ' e=' ' f=' '</x>
I believe both of these to be incorrect.
The entities & their replacement text are:
T1 "	" => "\t" [a single tab char]
T2 "&#9;" => "&#9;" [a 4 char character reference]
T3 "&T2;" => "&T2;" [a 4 char entity reference]
T4 "&T2;" => "&T2;" [a 4 char entity reference]
[This appears to be correct in 2.7.8 when using --debugent.]
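The mapping above from literal entity value to replacement text follows
section 4.5 of the XML specification: character references in the literal
value are expanded once, while general entity references are left
untouched. A minimal Python sketch of that rule, for illustration only.
The function name replacement_text is hypothetical, and the escaped forms
"&#38;#9;" and "&#38;T2;" are assumptions about how a declaration would
need to be written to yield the four-character replacement texts listed
for T2 and T4; they are not taken from test.xml as shown here.

```python
import re

# Matches a decimal (&#9;) or hexadecimal (&#x9;) character reference.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def replacement_text(literal_value: str) -> str:
    """Sketch of XML 1.0 section 4.5 for internal general entities.

    Character references are expanded exactly once; general entity
    references such as &T2; are bypassed (left as they are).
    Parameter-entity references are not handled in this sketch.
    """
    def expand(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code_point)

    # re.sub makes a single left-to-right pass, so text produced by an
    # expansion (e.g. the "&" from "&#38;") is not re-scanned.
    return _CHAR_REF.sub(expand, literal_value)


if __name__ == "__main__":
    # Assumed literal values; see the hedge in the paragraph above.
    for literal in ["&#9;", "&#38;#9;", "&T2;", "&#38;T2;"]:
        print(repr(literal), "=>", repr(replacement_text(literal)))
```

The single-pass substitution is the point of the sketch: an escaped
ampersand yields a reference in the replacement text, not the character
that reference names.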
As you are aware, the attribute values should be normalised according
to '3.3.3 Attribute-Value Normalization'
<http://www.w3.org/TR/2008/REC-xml-20081126/#AVNormalize>.
[Note: I'll refer to the 4 bullets under step 3 as clauses 3a,
3b, 3c & 3d.]
This is what should happen:
a='&T1;' => "\t" => "\x20" [due to clause 3c.]
  (correct in 2.7.8 but NOT in 2.7.3)
b='&T2;' => "&#9;" => "\t" [due to clause 3a.]
  (INCORRECT in 2.7.8 but correct in 2.7.3)
c='&T3;' => "&T2;" => "&#9;" => "\t" [due to clause 3b & then
  (recursively) 3a.]
d='&T4;' => "&T2;" => "&#9;" => "\t" [due to clause 3b & then
  (recursively) 3a.]
(Attributes c & d are INCORRECT in 2.7.8 but correct in 2.7.3)
(Attributes e & f are correct in both 2.7.3 & 2.7.8)
Note that for attributes b, c & d, the replacement texts of the
referenced entities (T2, T3 & T4) DO NOT contain any literal white
space characters (i.e. no '\t'), only references. Clause 3c therefore
never applies to them, and they are NOT replaced with space characters.
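The walk through clauses 3a to 3d can be sketched in Python as below.
This is an illustration under stated assumptions, not libxml2's
implementation: the names normalize and REPLACEMENT_TEXTS are
hypothetical, the table holds the replacement texts as listed in this
message, line-end normalisation (step 1) is assumed to have been done
already, and all attributes are treated as CDATA, so no trimming is
applied.

```python
import re

# One token per match: hex character reference, decimal character
# reference, general entity reference, or any other single character.
_TOKEN = re.compile(
    r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([^;&\s]+);|(.)", re.DOTALL
)

# Replacement texts as listed in this message (assumed, not parsed).
REPLACEMENT_TEXTS = {
    "T1": "\t",     # a single tab character
    "T2": "&#9;",   # a 4 char character reference
    "T3": "&T2;",   # a 4 char entity reference
    "T4": "&T2;",   # a 4 char entity reference
}


def normalize(text: str, replacement_texts: dict) -> str:
    """Sketch of step 3 of XML 1.0 section 3.3.3 for CDATA attributes."""
    out = []
    for hex_ref, dec_ref, name, char in _TOKEN.findall(text):
        if hex_ref:
            out.append(chr(int(hex_ref, 16)))      # clause 3a
        elif dec_ref:
            out.append(chr(int(dec_ref)))          # clause 3a
        elif name:
            # clause 3b: recurse into the entity's replacement text
            out.append(normalize(replacement_texts[name], replacement_texts))
        elif char in " \t\n\r":
            out.append(" ")                        # clause 3c
        else:
            out.append(char)                       # clause 3d
    return "".join(out)


if __name__ == "__main__":
    for value in ["&T1;", "&T2;", "&T3;", "&T4;", "\t", "&#9;"]:
        print(repr(value), "=>", repr(normalize(value, REPLACEMENT_TEXTS)))
```

A character produced by clause 3a is appended directly and never passes
through clause 3c, which is why a tab written as a character reference
survives while a literal tab becomes a space.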
EXPECTED OUTPUT:
<?xml version="1.0"?>
<!DOCTYPE x [
<!ENTITY T1 "	">
<!ENTITY T2 "&#9;">
<!ENTITY T3 "&T2;">
<!ENTITY T4 "&T2;">
]>
<x a=" " b="&#9;" c="&#9;" d="&#9;" e=" " f="&#9;">
a=' ' b=' ' c=' ' d=' ' e=' ' f=' '</x>
Regards,
Chris