Re: [xml] xmllint - Newbie THINKS there may be a whitespace error in 2.6.23



 Hi,

-----Original Message-----
From: xml-bounces gnome org [mailto:xml-bounces gnome org] On behalf of John Navratil
Sent: Tuesday, 25 April 2006 21:50
To: xml gnome org
Subject: [xml] xmllint - Newbie THINKS there may be a whitespace error in 2.6.23

Greetings,

Using xmllint to validate a document thusly:

xmllint --schema test.xsd test.xml

with schema (test.xsd):

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" attributeFormDefault="unqualified">
 <xs:element name="A">
  <xs:annotation>
   <xs:documentation>asdf</xs:documentation>
  </xs:annotation>
  <xs:complexType>
   <xs:sequence>
    <xs:element name="B">
     <xs:complexType>
      <xs:attribute name="ID" type="xs:string" use="required"/>
     </xs:complexType>
    </xs:element>
   </xs:sequence>
  </xs:complexType>
 </xs:element>
</xs:schema>

and document (test.xml):

<A>
 <B ID="1">
 </B>
</A>

I get the error:

test.xml:2: element B: Schemas validity error : Element 'B': Character content is not allowed, because the content type is empty.

I thought that --noblanks would strip the whitespace and eliminate the
error, but found instead that I must modify the document to:

<A>
 <B ID="1" />
</A>

Is this behavior correct?  I observe it in 2.6.22 and 2.6.23 
on Fedora Core 4 and 5. 

Yes, this behaviour is correct: in your schema, "B" declares only an
attribute, so its content type is empty and there must not be any
character content inside it. As Daniel said, the --noblanks option
won't remove such whitespace-only text nodes. --noblanks removes
whitespace-only text nodes when you have mixed content, i.e., when an
element has character content *and* element content. That's why the
whitespace after "<A>" and before "</A>" is removed in Daniel's example:
"
paphio:~/XML -> xmllint --noblanks test.xml
<?xml version="1.0"?>
<A><B>
</B></A>
"
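The rule described above can be modelled in a few lines of Python with the standard-library xml.dom.minidom. This is only a simplified sketch of the behaviour as described in this thread (drop whitespace-only text nodes from elements that also have element children, keep whitespace that is an element's sole content); it is not libxml2's actual implementation, and the function names are hypothetical.

```python
from xml.dom import Node, minidom


def is_xml_blank(text: str) -> bool:
    # XML whitespace is only space, tab, CR and LF.
    return text.strip(" \t\r\n") == ""


def strip_blanks_like_noblanks(element) -> None:
    """Hypothetical, simplified model of --noblanks as described here.

    Whitespace-only text nodes are dropped only from elements that also
    have element children; whitespace that is an element's only content
    (like the B element in test.xml) is kept.
    """
    has_element_child = any(
        child.nodeType == Node.ELEMENT_NODE for child in element.childNodes
    )
    for child in list(element.childNodes):  # copy: we mutate while iterating
        if child.nodeType == Node.ELEMENT_NODE:
            strip_blanks_like_noblanks(child)
        elif (
            has_element_child
            and child.nodeType == Node.TEXT_NODE
            and is_xml_blank(child.data)
        ):
            element.removeChild(child)


doc = minidom.parseString('<A>\n <B ID="1">\n </B>\n</A>')
strip_blanks_like_noblanks(doc.documentElement)
result = doc.documentElement.toxml()
print(repr(result))  # the whitespace inside B survives, as with xmllint
```

Because B has no element children in this model, its whitespace is left alone, which is why the schema error does not go away.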

When there's no mixed content, the --noblanks option treats any
whitespace as significant. I think this assumption is based on the
expectation that no one writes...
<B>
</B>
... unless they actually want those whitespace characters. You can write instead:
<B/> or 
<B></B> or 
<B><!-- No.1 the larch --></B> or
<B><?slide No.1 the larch ?></B>
All four cases of the element "B" have no content from
the viewpoint of W3C XML Schema.
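To make the distinction concrete, here is a small Python sketch (standard-library xml.dom.minidom) that checks whether an element has empty content in the sense used above: comments and processing instructions are ignored, while any text node, even a whitespace-only one, counts as character content. The function name has_empty_content is hypothetical, and this mimics only the one schema rule discussed here; it is not a schema validator.

```python
from xml.dom import Node, minidom

# Node types that W3C XML Schema does not treat as content.
IGNORED = (Node.COMMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE)


def has_empty_content(xml_text: str) -> bool:
    """True if the root element has no character and no element children."""
    root = minidom.parseString(xml_text).documentElement
    return all(child.nodeType in IGNORED for child in root.childNodes)


for candidate in [
    '<B ID="1"/>',
    '<B ID="1"></B>',
    '<B ID="1"><!-- No.1 the larch --></B>',
    '<B ID="1"><?slide No.1 the larch ?></B>',
    '<B ID="1">\n </B>',  # the original test.xml: whitespace is content
]:
    print(repr(candidate), has_empty_content(candidate))
```

The first four candidates are accepted; only the last one, which contains a whitespace-only text node, is rejected.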

To make XML documents easier for humans to read, people usually start a
new line for every new tag and indent nested tags. So I think the reason
there is a --noblanks option at all is to accommodate this
pretty-printing practice by removing such whitespace-only text nodes,
since they are most likely not intended to be part of the data.
So this:
<A>
  <B/>
</A>
will be stripped to:
<A><B/></A>

However, there is also the xml:space mechanism, which could be used to
define exactly what is to be stripped and what is not.
So if we had an option like --noblanksall, which would remove
*all* whitespace-only text nodes, you could use xml:space
to specify where whitespace should be preserved.
Example:
<A>
  <B> </B>
  <C xml:space="preserve"> <D> </D> </C>
</A>

With a --noblanksall option (this option does not exist), this would be
whitespace-stripped to:
<A><B/><C xml:space="preserve"> <D> </D> </C></A>
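The hypothetical --noblanksall behaviour can be sketched in Python with the standard-library xml.dom.minidom. Everything here is an assumption for illustration: the option does not exist in xmllint, the function name strip_all_blanks is made up, and "default" is taken to mean "strip" for this imaginary option. The sketch removes every whitespace-only text node unless xml:space="preserve" is in effect, with xml:space inherited by descendants.

```python
from xml.dom import Node, minidom


def is_xml_blank(text: str) -> bool:
    # XML whitespace is only space, tab, CR and LF.
    return text.strip(" \t\r\n") == ""


def strip_all_blanks(element, preserve: bool = False) -> None:
    """Hypothetical --noblanksall: drop every whitespace-only text node
    unless xml:space="preserve" applies to its parent element."""
    space = element.getAttribute("xml:space")  # "" when the attribute is absent
    if space == "preserve":
        preserve = True
    elif space == "default":
        preserve = False  # assumption: "default" switches stripping back on
    for child in list(element.childNodes):  # copy: we mutate while iterating
        if child.nodeType == Node.ELEMENT_NODE:
            strip_all_blanks(child, preserve)  # xml:space is inherited
        elif (
            not preserve
            and child.nodeType == Node.TEXT_NODE
            and is_xml_blank(child.data)
        ):
            element.removeChild(child)


source = """<A>
  <B> </B>
  <C xml:space="preserve"> <D> </D> </C>
</A>"""
doc = minidom.parseString(source)
strip_all_blanks(doc.documentElement)
result = doc.documentElement.toxml()
print(result)
```

Unlike the --noblanks heuristic, this sketch does not care whether an element has mixed content; only xml:space decides.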

If I remove the required attribute ("ID") from the schema 
and the document, this behavior is not observed.

Check again please; I cannot reproduce this here.

Regards,

Kasimier


