Re: [xml] DTD validation & whitespace removal

On Feb 4, 2010, at 10:02 AM, Daniel Veillard wrote:

On Thu, Feb 04, 2010 at 09:31:11AM -0800, John Clements wrote:

On Feb 4, 2010, at 7:09 AM, Daniel Veillard wrote:

On Thu, Feb 04, 2010 at 08:53:42AM -0500, Piotr Sipika wrote:
Try parsing the document using:
xmlReadFile(URI, encoding, options)
with options set to XML_PARSE_NOBLANKS (in addition to anything else
you want to use)

Honestly, I think it's bad advice in general. The blank nodes
used for "formatting" are an integral part of the XML document content
and users should rather learn XML and do the right thing than tweak the
parser to become non-conformant.
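
To make the point about blank nodes concrete, here is a minimal sketch. It uses Python's standard-library xml.dom.minidom rather than libxml2 (an assumption made only to keep the example self-contained), and the document and the helper name child_summary are invented for illustration. It lists the children of the root element, showing that the indentation between elements arrives in the tree as ordinary text nodes:

```python
from xml.dom import minidom

# Hypothetical document: the indentation between elements is whitespace text.
SOURCE = "<list>\n  <item>a</item>\n  <item>b</item>\n</list>"


def child_summary(xml_text):
    """Return (kind, value) pairs for the direct children of the root element."""
    root = minidom.parseString(xml_text).documentElement
    summary = []
    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            summary.append(("text", node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            summary.append(("element", node.tagName))
    return summary


# Print each child; repr() makes the whitespace-only text nodes visible.
for kind, value in child_summary(SOURCE):
    print(kind, repr(value))
```

The two item elements are interleaved with whitespace-only text nodes; a parser option that drops them changes what the application sees of the document.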

Ah! Got your attention.  What is the "right thing" to do?  Specifically:
the DTD contains information about where whitespace is significant;
how is this information represented in the parsed tree? Duplicating the
knowledge about where whitespace is significant seems fragile.

 yes it's fragile because it depends on the DTD validation step being


So in general, the logic of handling text nodes needs to be put at the
application level, and it's highly contextual. It's hard to extract
the DTD information about the content model (well, it's not trivial)
and sometimes it may not be available either. I would not delegate
that logic purely to the DTD, but this is just my opinion ;-)
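
One way to picture application-level, contextual handling is the sketch below. It again uses Python's xml.dom.minidom instead of libxml2, and it assumes the application itself knows which elements contain only child elements. The ELEMENT_ONLY set, the element names, and the function name are all hypothetical. Whitespace-only text is removed inside those elements and left alone everywhere else:

```python
from xml.dom import minidom

# Assumption: the application knows these elements hold only child elements,
# so whitespace directly inside them is formatting rather than data.
ELEMENT_ONLY = {"program", "block"}


def strip_formatting_whitespace(node):
    """Remove whitespace-only text children of ELEMENT_ONLY elements, in place."""
    # Iterate over a copy because children are removed during the walk.
    for child in list(node.childNodes):
        if child.nodeType == child.ELEMENT_NODE:
            strip_formatting_whitespace(child)
        elif (child.nodeType == child.TEXT_NODE
              and not child.data.strip()
              and node.nodeType == node.ELEMENT_NODE
              and node.tagName in ELEMENT_ONLY):
            node.removeChild(child)


doc = minidom.parseString(
    "<program>\n  <block>\n    <str> </str>\n  </block>\n</program>"
)
strip_formatting_whitespace(doc.documentElement)
print(doc.documentElement.toxml())
# prints: <program><block><str> </str></block></program>
```

The single space inside the str element survives because str is not listed as element-only. That is the contextual decision a blanket parser option cannot make.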

Many thanks for your note.  

My summary, FWIW: I was hoping to use XML as a "lingua franca" to share parsed ASTs between language 
implementations in a compilers class, using a Relax NG spec to ensure compliance.  So... it's possible, but 
it's much more painful than I expected; I hoped that using a Relax NG spec would allow me to transparently 
serialize/deserialize my data structures, but we're obviously not quite there yet. It feels like much of this 
comes from XML's "markup" background, which is not something that's useful in this situation.

Again, many thanks for your note and all your work in implementing libxml2.

All the best,

John Clements

