On Feb 4, 2010, at 10:02 AM, Daniel Veillard wrote:
On Thu, Feb 04, 2010 at 09:31:11AM -0800, John Clements wrote:

On Feb 4, 2010, at 7:09 AM, Daniel Veillard wrote:

On Thu, Feb 04, 2010 at 08:53:42AM -0500, Piotr Sipika wrote:

John, try parsing the document using xmlReadFile(URI, encoding, options) with options set to XML_PARSE_NOBLANKS (in addition to anything else you want to use).

Honestly, I think it's bad advice in general. The blank nodes used for "formatting" are an integral part of the XML document content, and users should rather learn XML and do the right thing than tweak the parser to become non-conformant.

Ah! Got your attention. What is the "right thing" to do? Specifically: the DTD contains information about where whitespace is significant; how is this information represented in the parsed tree? Duplicating the knowledge about where whitespace is significant seems fragile.

Yes, it's fragile because it depends on the DTD validation step being
...
So in general, the logic for handling text nodes needs to be put at the application level, and it's highly contextual. It's hard to extract the DTD information about the content model (well, it's not trivial), and sometimes it may not be available either. I would not delegate that logic purely to the DTD, but this is just my opinion ;-)
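To make the "application level" point concrete, here is a minimal sketch of a whitespace filter that lives in the application rather than in the parser. It uses Python's standard xml.dom.minidom as a stand-in, not libxml2 itself; the sample document, the ELEMENT_ONLY set, and the significant_children helper are all hypothetical names invented for illustration.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Hypothetical AST document: <add> has element-only content,
# while <name> carries text whose whitespace we want to keep.
DOC = """<add>
  <num>1</num>
  <name> x </name>
</add>"""

# Assumption: the application knows which elements contain only child
# elements, so whitespace-only text inside them is pure formatting.
ELEMENT_ONLY = {"add"}

def significant_children(elem):
    """Return elem's child nodes, dropping formatting-only whitespace."""
    kept = []
    for child in elem.childNodes:
        is_blank_text = (child.nodeType == Node.TEXT_NODE
                         and child.data.strip() == "")
        if is_blank_text and elem.tagName in ELEMENT_ONLY:
            continue  # indentation between child elements, not content
        kept.append(child)
    return kept

root = minidom.parseString(DOC).documentElement
raw_count = len(root.childNodes)         # the parser keeps the blank text nodes
children = significant_children(root)    # the application decides what to skip
names = [c.tagName for c in children]
name_text = children[1].firstChild.data  # whitespace inside <name> is untouched
```

The parser stays conformant and hands over every text node; only the element names listed in ELEMENT_ONLY have their blank nodes skipped, which is the contextual knowledge that would otherwise have to come from the DTD.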
Many thanks for your note. My summary, FWIW: I was hoping to use XML as a "lingua franca" to share parsed ASTs between language implementations in a compilers class, using a Relax NG spec to ensure compliance. So... it's possible, but it's much more painful than I expected; I hoped that using a Relax NG spec would allow me to transparently serialize/deserialize my data structures, but we're obviously not quite there yet. It feels like much of this comes from XML's "markup" background, which is not something that's useful in this situation.

Again, many thanks for your note and all your work in implementing libxml2.

All the best,

John Clements