Re: [xml] xmllint - Newbie THINKS there may be a whitespace error in 2.6.23



Greetings and thanks for the response!

----- Original Message ----- From: "Buchcik, Kasimier" <k buchcik 4commerce de>
To: "John Navratil" <jnavratil houston rr com>
Cc: <xml gnome org>
Sent: Wednesday, April 26, 2006 4:20 AM
Subject: Re: [xml] xmllint - Newbie THINKS there may be a whitespace error in 2.6.23


Hi,

-----Original Message-----
From: xml-bounces gnome org [mailto:xml-bounces gnome org] On Behalf Of
John Navratil
Sent: Tuesday, 25 April 2006 21:50
To: xml gnome org
Subject: [xml] xmllint - Newbie THINKS there may be a
whitespace error in 2.6.23

Greetings,

Using xmllint to validate a document thusly:

xmllint --schema test.xsd test.xml

with schema (test.xsd):

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" attributeFormDefault="unqualified">
 <xs:element name="A">
  <xs:annotation>
   <xs:documentation>asdf</xs:documentation>
  </xs:annotation>
  <xs:complexType>
   <xs:sequence>
    <xs:element name="B">
     <xs:complexType>
      <xs:attribute name="ID" type="xs:string" use="required"/>
     </xs:complexType>
    </xs:element>
   </xs:sequence>
  </xs:complexType>
 </xs:element>
</xs:schema>

and document (test.xml):

<A>
 <B ID="1">
 </B>
</A>

I get the error:

test.xml:2: element B: Schemas validity error : Element 'B': Character
content is not allowed, because the content type is empty.

I thought that --noblanks would strip the whitespace and eliminate the
error, but find instead that I must modify the document to:

<A>
 <B ID="1" />
</A>

Is this behavior correct?  I observe it in 2.6.22 and 2.6.23
on Fedora Core 4 and 5.

Yes, this behaviour is correct: there must not be any character
content inside the element "B" and, as Daniel said, the --noblanks
option won't remove such whitespace-only text-nodes. --noblanks will
remove whitespace-only text-nodes when you have mixed content;
i.e., when an element has character content *and* element content.
That's why the whitespace after "<A>" and before "</A>" is removed
in Daniel's example:
"
paphio:~/XML -> xmllint --noblanks test.xml
<?xml version="1.0"?>
<A><B>
</B></A>
"

When there's no mixed content, any whitespace is considered
significant by the --noblanks option; I think this assumption
is based on the understanding that no one writes...
<B>
</B>
... if he doesn't want those space characters. You can write instead:
<B/> or
<B></B> or
<B><!-- No.1 the larch --></B> or
<B><?slide No.1 the larch ?></B>
All four cases of the element "B" have no content from
the viewpoint of W3C XML Schema.
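
To make the distinction concrete, here is a minimal sketch in Python (standard library only; it is not how libxml2 or xmllint work internally). The helper name has_schema_content is made up for illustration. It reports whether an element has any element or character children, which is roughly what matters for an empty content type; comments and processing instructions are not counted.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def has_schema_content(xml_text):
    """Return True if the root element has element or character children.

    Hypothetical helper for illustration: comments and processing
    instructions are ignored, text and child elements are counted.
    """
    root = minidom.parseString(xml_text).documentElement
    content_types = (Node.ELEMENT_NODE, Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    return any(child.nodeType in content_types for child in root.childNodes)


samples = [
    "<B/>",
    "<B></B>",
    "<B><!-- No.1 the larch --></B>",
    "<B><?slide No.1 the larch ?></B>",
    "<B>\n</B>",  # the problematic form: one whitespace-only text node
]
for sample in samples:
    print(repr(sample), has_schema_content(sample))
```

Only the last sample, the start tag and end tag on separate lines, carries a text node, and that is the one the validator rejects.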

For easier reading of the XML document by humans, people start a new
line for every new tag and indent subsequent tags. So the reason, I
think, why there's such a thing as a --noblanks option at all, is
to accommodate this pretty-printing issue by removing such
whitespace-only text nodes, since they are most likely not intended
to be part of the data.
So this:
<A>
 <B/>
</A>
will be stripped to:
<A><B/></A>

However, we also have the mechanism of xml:space, which could be
used to define exactly what is to be stripped and what is not.
So if we had an option like --noblanksall, which would remove
*all* whitespace-only text-nodes, then you could use xml:space
to specify where whitespace should be preserved.
Example:
<A>
 <B> </B>
 <C xml:space="preserve"> <D> </D> </C>
</A>

this would be whitespace-stripped with a
--noblanksall option (this option does not exist) to:
<A><B/><C xml:space="preserve"> <D> </D> </C></A>
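
Since --noblanksall does not exist, the same effect can be approximated outside xmllint. The following is a minimal sketch in Python using xml.dom.minidom; the function name strip_blanks_all is hypothetical, and the handling of xml:space (inherited by descendants, switched off again by xml:space="default") is my reading of the XML recommendation rather than anything xmllint does.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

XML_WHITESPACE = " \t\r\n"


def strip_blanks_all(node, preserve=False):
    """Remove whitespace-only text nodes, except under xml:space="preserve".

    Hypothetical stand-in for the non-existent --noblanksall option.
    """
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            if not preserve and child.data.strip(XML_WHITESPACE) == "":
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            space = child.getAttribute("xml:space")
            # An element without xml:space inherits the setting of its parent.
            child_preserve = preserve if space == "" else (space == "preserve")
            strip_blanks_all(child, child_preserve)


doc = minidom.parseString(
    '<A>\n <B> </B>\n <C xml:space="preserve"> <D> </D> </C>\n</A>'
)
strip_blanks_all(doc.documentElement)
print(doc.documentElement.toxml())
```

Run on the example above, this is intended to produce the same stripped form: B becomes empty, while everything inside C keeps its spaces.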

If I remove the required attribute ("ID") from the schema
and the document, this behavior is not observed.

Check again please; I cannot reproduce this here.

Regards,

Kasimier

-----------------------------------


The example schema and two example documents which do not appear to exhibit this behavior are:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name="A">
 <xs:complexType>
  <xs:sequence>
   <xs:element name="B">
   </xs:element>
  </xs:sequence>
 </xs:complexType>
</xs:element>
</xs:schema>

<A>
<B>
</B>
</A>

<A>
<B/>
</A>

The difference in the schema is that the entire type definition of element 'B' is removed. If, however, the 'complexType' tag is restored, giving...

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name="A">
 <xs:complexType>
  <xs:sequence>
   <xs:element name="B">
    <xs:complexType>
    </xs:complexType>
   </xs:element>
  </xs:sequence>
 </xs:complexType>
</xs:element>
</xs:schema>

the error returns. From your description, I am prepared to accept this as the null condition where the type of the node is mixed by virtue of the 'complexType' tag even though it contains nothing, and in my initial example contained only an attribute definition.

I would like to explain the "real world" example which led to this, for your consideration. We are using a message to transfer data from one database to another. We wished to establish an optional relation between one entity in our message and another. In our case a customer might be a retail customer or one who purchases through a distributor. This led to the schema fragment...

  <xs:element name="Distributor" minOccurs="0">
   <xs:complexType>
    <xs:attribute name="ID" type="IDType" use="required"/>
   </xs:complexType>
  </xs:element>

so the user could code something like '<Distributor ID="1234" />'. Notice that we have a node with a 'complexType' in order to provide the attribute definition, but which has no sub-nodes (i.e. not very mixed).

The script which generated this code was designed to emit the start tag, then recursively render any content, then render the end tag. This led to a document fragment of the form...

<Distributor ID="1234">
</Distributor>

This violates the (not entirely unreasonable) assumption

"that no one writes...
<B>
</B>
... if he doesn't want those space characters."
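
For what it is worth, the generator can also be fixed at the source. Below is a minimal sketch in Python of the idea, not our actual script: the render function, the (name, attrs, children) tuple layout, and the 'Customer' wrapper element are all made up for illustration. An element with no children is written as an empty-element tag, so no whitespace-only text node is ever produced.

```python
from xml.sax.saxutils import quoteattr


def render(name, attrs, children, depth=0):
    """Render an element; children is a list of (name, attrs, children) tuples.

    Hypothetical generator: text content is left out to keep the sketch short.
    """
    pad = " " * depth
    attr_text = "".join(f" {key}={quoteattr(value)}" for key, value in attrs.items())
    if not children:
        # No content: emit an empty-element tag instead of a start tag,
        # a newline, and an end tag.
        return f"{pad}<{name}{attr_text}/>\n"
    body = "".join(render(*child, depth=depth + 1) for child in children)
    return f"{pad}<{name}{attr_text}>\n{body}{pad}</{name}>\n"


print(render("Customer", {}, [("Distributor", {"ID": "1234"}, [])]), end="")
```

With this change the Distributor element comes out as '<Distributor ID="1234"/>', which validates against the schema fragment above.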

May I suggest that xmllint treat a complexType with nothing but attributes as mixed-type for purposes of '--noblanks' processing? Your '--noblanksall' would work as well. Perhaps something more specific, such as '--attrs-strip-blanks', would be more appropriate.

Thanks, again!

John Navratil



