Re: [xml] Normalization Query



>    For the following attribute, what will the normalized value be? The
>    attribute is of type NMTOKENS.
>
>    <doc a="&#32;x&#32;&#32;y&#32;"></doc>
>
>    Will it be
>
>    A=x y
>
>    or
>
>    A= x  y

 

>  The answer is there

>   http://www.w3.org/TR/REC-xml/#AVNormalize

 

I have gone through this, but I am still confused. To illustrate, here are some examples.

Case 1.

<!DOCTYPE doc [
<!ATTLIST doc a1 NMTOKENS "1  2">
<!ELEMENT doc (#PCDATA)>
]>
<doc></doc>

In the above, according to my understanding, a1 should be normalized to

a1="1 2"

but libxml returns

a1="1  2", i.e. with an extra space.
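To make the Case 1 expectation concrete, here is a minimal Python sketch of the normalization for a value that contains no references, assuming (as this message does) that a declared default value is normalized like any other attribute value. The function name `normalize_literal` is hypothetical and for illustration only; it is not a libxml API.

```python
WHITESPACE = " \r\n\t"  # #x20, #xD, #xA, #x9


def normalize_literal(value: str, is_cdata: bool) -> str:
    """Hypothetical helper: normalize an attribute value with no references.

    Assumes line ends were already normalized on input (step 1), so a
    literal CR/LF pair has become a single #xA before we get here.
    """
    # Step 3 for literal characters: each white space character -> #x20.
    mapped = "".join(" " if ch in WHITESPACE else ch for ch in value)
    if is_cdata:
        return mapped
    # Step 4 (non-CDATA types such as NMTOKENS): splitting on #x20 yields
    # empty strings for leading, trailing and repeated spaces; dropping
    # them and rejoining with one space strips and collapses the runs.
    return " ".join(part for part in mapped.split(" ") if part)


# Case 1: default value "1  2" declared for an NMTOKENS attribute.
print(repr(normalize_literal("1  2", is_cdata=False)))  # '1 2'
print(repr(normalize_literal("1  2", is_cdata=True)))   # '1  2'
```

The CDATA call is included only for contrast: the double space survives there, which is the result libxml reports for the NMTOKENS default.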

 

Case 2.

<!DOCTYPE doc [
<!ELEMENT doc (#PCDATA)>
<!ATTLIST doc a NMTOKENS #IMPLIED>
]>
<doc a="&#32;x&#32;&#32;y&#32;"></doc>

 

Here the spec (section 3.3.3) gives a clear example where

a="&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;"

normalizes, if a is NMTOKENS, to

#xD #xD A #xA #xA B #xD #xA

This is similar to case 2 in all respects, except that the characters referenced there are #xD and #xA, which the extra non-CDATA step leaves alone; only #x20 characters are stripped and collapsed. So I would expect

a="x y"
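The distinction drawn above, that a referenced #xD or #xA survives while #x20 is stripped and collapsed, can be sketched in Python. The helper `normalize_with_char_refs` is hypothetical; it assumes the value contains only character references (no entity references) and that line ends were already normalized on input. It illustrates the spec's steps, not libxml's code.

```python
import re

# A decimal (&#32;) or hexadecimal (&#x20;) character reference.
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")

WHITESPACE = " \r\n\t"  # #x20, #xD, #xA, #x9


def map_literal(text: str) -> str:
    # Step 3 for literal characters: white space becomes #x20.
    return "".join(" " if ch in WHITESPACE else ch for ch in text)


def normalize_with_char_refs(value: str, is_cdata: bool) -> str:
    """Hypothetical helper: section 3.3.3 for values with char refs only."""
    parts = []
    pos = 0
    for m in CHAR_REF.finditer(value):
        parts.append(map_literal(value[pos:m.start()]))
        ref = m.group(1)
        # Character reference: append the referenced character as-is,
        # even when it is #xD, #xA or #x20.
        parts.append(chr(int(ref[1:], 16)) if ref.startswith("x") else chr(int(ref)))
        pos = m.end()
    parts.append(map_literal(value[pos:]))
    result = "".join(parts)
    if is_cdata:
        return result
    # Step 4 (non-CDATA): only #x20 is stripped and collapsed, so a
    # referenced #xD or #xA stays in the value.
    return " ".join(token for token in result.split(" ") if token)


print(repr(normalize_with_char_refs("&#32;x&#32;&#32;y&#32;", is_cdata=False)))
# 'x y'
print(repr(normalize_with_char_refs("&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;", is_cdata=False)))
# '\r\rA\n\nB\r\n'
```

Because `str.split(" ")` splits only on #x20, the referenced carriage returns and line feeds in the spec example pass through untouched.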

 

Case 3.

<?xml version='1.0' standalone='yes'?>
<!DOCTYPE attributes [
    <!ELEMENT attributes EMPTY>
    <!ATTLIST attributes
    nmtoken    NMTOKEN    #IMPLIED
    nmtokens   NMTOKENS   #IMPLIED
    >
    <!ENTITY ent " entity&recursive; ">
    <!ENTITY recursive "reference">
]>
<attributes
    nmtoken =  " &ent;   &ent; &ent; "
    nmtokens = " Test&#x0d;&#x0a;     this&#x20; normalization "
/>

 

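Here, according to the spec, the normalized value of nmtoken is obtained by first processing the unnormalized value, with each entity reference handled by recursively applying the algorithm of section 3.3.3 to its replacement text. Once that is done, the result is processed further (leading and trailing spaces discarded, runs of spaces collapsed to one), since the type is not CDATA.

So I expect

nmtoken="entityreference entityreference entityreference"

nmtokens="Test#xD#xA this normalization"

where #xD#xA stands for the referenced carriage-return and line-feed characters.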

 

Libxml gives

nmtoken="  entityreference   entityreference   entityreference  "

nmtokens="Test#xD#xA this  normalization" (note the extra space between "this" and "normalization")
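A fuller Python sketch of the section 3.3.3 algorithm adds the recursive entity expansion that Case 3 exercises. The names `expand`, `normalize` and `ENTITIES` are hypothetical, the entity-name pattern is simplified, and there is no loop detection because a well-formed document cannot contain recursive entities. Note also that nmtoken is declared NMTOKEN, so a validating parser would additionally report the space-containing result as invalid; the normalization itself is the same for every non-CDATA type.

```python
import re

# One token per match: hex char ref, decimal char ref, entity reference
# (simplified name pattern, an assumption), or one literal character.
TOKEN = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([A-Za-z_][\w.-]*);|(.)", re.S)

WHITESPACE = " \r\n\t"  # #x20, #xD, #xA, #x9


def expand(value, entities, out):
    """Step 3: append the expanded characters of `value` to the list `out`."""
    for hexref, decref, name, ch in TOKEN.findall(value):
        if hexref:
            out.append(chr(int(hexref, 16)))  # character reference, kept as-is
        elif decref:
            out.append(chr(int(decref)))      # character reference, kept as-is
        elif name:
            # Entity reference: recurse into its replacement text.
            # Unknown names (including the predefined ones, which this
            # sketch does not model) raise KeyError.
            expand(entities[name], entities, out)
        elif ch in WHITESPACE:
            out.append(" ")                   # literal white space -> #x20
        else:
            out.append(ch)                    # any other character


def normalize(value, entities, is_cdata):
    out = []
    expand(value, entities, out)
    result = "".join(out)
    if is_cdata:
        return result
    # Step 4: strip leading/trailing #x20 and collapse runs of #x20.
    return " ".join(part for part in result.split(" ") if part)


# Hypothetical table of replacement texts from the Case 3 internal subset.
# "&recursive;" is bypassed at declaration time, so it is still a reference.
ENTITIES = {"ent": " entity&recursive; ", "recursive": "reference"}

print(repr(normalize(" &ent;   &ent; &ent; ", ENTITIES, is_cdata=False)))
# 'entityreference entityreference entityreference'
print(repr(normalize(" Test&#x0d;&#x0a;     this&#x20; normalization ", ENTITIES, is_cdata=False)))
# 'Test\r\n this normalization'
```

In the second value, `&#x20;` and the literal space after it both become #x20, and step 4 collapses them into one, which is why a single space is expected between "this" and "normalization".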

 

The confusion is exacerbated by the fact that Java-based parsers perform the normalization and return the values I expected above, which contradict what libxml returns.

 

I do not know whether I am misinterpreting the spec, so any clarification would be very welcome.

 

Thanks!!!

 

Regards

Ashwin
