[xml] xmlKeepBlanksDefault



I'd like to check my understanding of xmlKeepBlanksDefault.

What I want to do is to make xmlParseFile not generate whitespace nodes
(i.e. I want it to generate the same tree exactly as if no additional
whitespace had been provided), but have xmlSaveFormatFile write
a file out with formatting.

I know I can set XML_PARSE_NOBLANKS in the xmlParserOption flags passed to
xmlReadFile, but this is not available in xmlParseFile. xmlReadFile also
seems to do far more than xmlParseFile (currently I'm using XML_PARSE_NONET |
XML_PARSE_NODICT | XML_PARSE_NOXINCNODE | XML_PARSE_NOBLANKS to turn all
that off). I would therefore rather stay with xmlParseFile, as xmlReadFile
appears to do a lot more that I'd like to avoid (and I've already tracked
one SEGV down to it putting stuff in the tree I don't want).

Setting xmlKeepBlanksDefault to 0 looks promising, and indeed appears
to work. However, the manual page somewhat cryptically says:

Set and return the previous value for default blanks text nodes support.
The 1.x version of the parser used an heuristic to try to detect
ignorable white spaces. As a result the SAX callback was generating
xmlSAX2IgnorableWhitespace() callbacks instead of characters() one, and
when using the DOM output text nodes containing those blanks were not
generated. The 2.x and later version will switch to the XML standard way
and ignorableWhitespace() are only generated when running the parser in
validating mode and when the current element doesn't allow CDATA or mixed
content. This function is provided as a way to force the standard
behavior on 1.X libs and to switch back to the old mode for compatibility
when running 1.X client code on 2.X . Upgrade of 1.X code should be done
by using xmlIsBlankNode() commodity function to detect the "empty" nodes
generated. This value also affect autogeneration of indentation when
saving code if blanks sections are kept, indentation is not generated.

I've read that several times and still cannot understand it.

My observations are:

1. Contrary to the last sentence, it does not appear to affect the output
  format. With xmlSaveFormatFile (anyway) the output appears to be the
  same whether this is set or not.

2. I don't understand the sentence starting "The 2.x and later version".
  I am running 2.x, and I am not running the parser in validating mode,
  yet with xmlKeepBlanksDefault set to 1 it *does* appear to generate
  blank nodes.

3. I /appear/ to be using a compatibility mode, though despite reading
  the paragraph several times, I don't know whether
  xmlKeepBlanksDefault(1) (the default) is the compatibility mode,
  or whether xmlKeepBlanksDefault(0) is the compatibility mode.

4. This appears to have been written from the point of view of someone
  writing their own parser. A description of how it affects xmlParseFile
  and friends would be really useful.

What am I missing here?

--
Alex Bligh


