Date: Wed, 7 Oct 2009 12:25:21 +0200
From: "LAUN, Wolfgang" <wolfgang laun thalesgroup com>
Your question is not at all ignorant or simple.
In XML, all characters between a "<tag...>" and its counterpart "</tag>"
are significant: each belongs either to this element's content or to a
subordinate element. Therefore, you cannot decide, just by looking at
some content text, whether a blank or a newline was intended as content
by the creator of the XML text or is merely a formatting artifact.
Consequently, XML processing can handle the content adequately only by
taking the kind of document and element into account, or with the help
of the information in an XML schema. If you are dealing
with XHTML, the content of the paragraph element <p> (and also some
others) should be interpreted by trimming leading and trailing
whitespace and collapsing embedded runs of whitespace to a single
blank. (This is the value "collapse" of XML Schema's "whiteSpace"
facet.) With (X)HTML,
it's the task of a renderer (printer or browser) - possibly assisted by
style sheets - to supply spacing before and after a paragraph's text,
indentation of the first line, alignment, line breaks, etc.
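To make the "collapse" processing concrete, here is a minimal sketch in
Python using the standard xml.etree.ElementTree module. The helper name
collapse() and the sample <p> text are made up for illustration; the
parser itself hands over the whitespace untouched, and it is the
application that decides to collapse it.

```python
import re
import xml.etree.ElementTree as ET

# XML whitespace is exactly: space, tab, carriage return, line feed.
XML_WS_RUN = re.compile(r"[ \t\r\n]+")

def collapse(text):
    """Hypothetical helper: replace each whitespace run by one blank,
    then trim leading and trailing blanks."""
    return XML_WS_RUN.sub(" ", text or "").strip(" ")

# Illustrative document: the line breaks and indentation are formatting only.
p = ET.fromstring("<p>\n   Hello,\n      world!  \n</p>")

raw = p.text                # what the parser reports, whitespace included
collapsed = collapse(raw)   # what an XHTML-aware application would use

print(repr(raw))
print(repr(collapsed))
```

Nothing in the parser knows that <p> should be treated this way; that
knowledge comes from the document type (or from a schema).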
Moreover, notice that <body> has "content", too: all the characters
surrounding the contained <p> elements. But the
interpretation of <body> does not require processing of its content
value at all.
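The point about <body> can be checked with a small sketch, again using
Python's standard xml.etree.ElementTree (the sample document is invented
for illustration). ElementTree exposes the characters that belong
directly to an element as its .text (before the first child) and as the
.tail of each child (after that child's end tag).

```python
import xml.etree.ElementTree as ET

# Illustrative document, indented the way an editor would typically write it.
doc = "<body>\n  <p>first</p>\n  <p>second</p>\n</body>"
body = ET.fromstring(doc)

# Characters owned by <body> itself: text before the first child,
# plus the tail following each child element.
pieces = [body.text or ""] + [child.tail or "" for child in body]
body_chars = "".join(pieces)

# The paragraphs carry the real content.
paragraphs = [p.text for p in body]

print(repr(body_chars))
print(paragraphs)
```

Here the character content of <body> consists of nothing but newlines
and indentation, which an application interpreting <body> can simply
ignore.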