[xml] DTD validation & whitespace removal



Please forgive me, this is *surely* a FAQ, but several hours of poring over the libxml2 documentation have
left me high and dry.

My question is this: I have a document and a DTD. I want to parse the file and validate it against the DTD in
such a way that the whitespace the DTD rules out (whitespace inside elements declared with element-only
content) is either removed or detectable.

Let me be more concrete.

Here's a totally ordinary XML file with a self-contained (internal) DTD:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people_list [
<!ELEMENT people_list (person)*>
<!ELEMENT person (name , birthdate? , gender? , socialsecuritynumber?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT birthdate (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT socialsecuritynumber (#PCDATA)>
]>
<people_list>
  <person>
    <name>Fred Bloggs</name>
    <birthdate>27/11/2008</birthdate>
    <gender>Male</gender>
  </person>
</people_list>

Using xmlParseFile, I can read and validate it just fine. The parsed document's 'properties' value of 11
(XML_DOC_WELLFORMED | XML_DOC_NSVALID | XML_DOC_DTDVALID) indicates, among other things, that DTD validation
was successful. However, even with the inline DTD, the people_list element (for instance) is observed to have
three children: a whitespace text node, the person element, and another whitespace text node. I had a flash
of hope when I discovered the 'xmlIsBlankNode' function... sadly, it appears that this function returns zero
even on the whitespace-only nodes.

I'm certainly hoping that this is a simple question.  I've been through the FAQ and the example code quite a 
number of times.

Many thanks in advance for any help whatsoever; an RTFM (with a pointer) would be just fine with me.

Your obedient servant, &c.

John Clements



