Re: [xml] how to interpret/reproduce this type of xml?

From: "LAUN, Wolfgang" <wolfgang laun thalesgroup com>
To: <xml gnome org>
Subject: Re: [xml] how to interpret/reproduce this type of xml?
Date: Wed, 7 Oct 2009 12:25:21 +0200

Hi Bob,

Your question is not at all ignorant or simple.

In XML, all characters between a "<tag...>" and its counterpart "</tag>"
are relevant: they are either this element's character content or part
of a child element. Therefore, you cannot decide, just by looking at the
content text, whether a blank or a newline is content intended by the
creator of the XML text or merely a formatting quirk.
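You can see this by parsing the sample from the original message. The sketch below uses Python's standard-library xml.etree.ElementTree purely for illustration; libxml2, with its default options, likewise reports these characters as part of the text node. The text of each <p> comes back with the surrounding newline and indentation still attached:

```python
import xml.etree.ElementTree as ET

# The sample from the original message, indentation preserved.
SAMPLE = """<richcontent TYPE="NOTE"><html>
   <head>

   </head>
   <body>
     <p>
       Notes 1
     </p>
     <p>
       Notes 2
     </p>
     <p>
       Notes 3
     </p>
   </body>
</html>
</richcontent>"""

root = ET.fromstring(SAMPLE)

# The parser cannot tell formatting from content, so each <p>'s text
# still carries the newline and indentation around the visible words.
raw_texts = [p.text for p in root.iter("p")]
for text in raw_texts:
    print(repr(text))
```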

XML processing can handle such content adequately only by taking the
kind of document and element into account, or with the help of the
information in an XML schema. If you are dealing with XHTML, the content
of the paragraph element <p> (and of some others) should be interpreted
by trimming leading and trailing whitespace and collapsing embedded runs
of whitespace to a single blank. (This is what the XML Schema whiteSpace
facet value "collapse" does.) With (X)HTML, it is the task of a renderer
(printer or browser), possibly assisted by style sheets, to supply
spacing before and after a paragraph's text, indentation of the first
line, alignment, line breaks, etc.

Note, too, that <body> has "content" of its own: all the characters
surrounding the contained <p> elements. But interpreting <body> does not
require processing that content at all.
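To make the <body> "content" visible, this sketch (again Python's xml.etree.ElementTree, chosen only for illustration) collects the characters surrounding the <p> children. ElementTree stores them in body.text and in each child's tail. In libxml2 they appear by default as whitespace-only text nodes, and the XML_PARSE_NOBLANKS parser option is intended to drop such blank nodes. It does not, however, trim the text inside <p>:

```python
import xml.etree.ElementTree as ET

BODY = """<body>
     <p>Notes 1</p>
     <p>Notes 2</p>
   </body>"""

body = ET.fromstring(BODY)

# Characters before the first child live in body.text; characters after
# each child live in that child's .tail.
between = [body.text] + [p.tail for p in body]
print([repr(s) for s in between])

# All of it is whitespace, so a reader of XHTML-like content can ignore it.
print(all(s is None or s.strip() == "" for s in between))
```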

-W


From: Bob Sabiston <bob flatblackfilms com>
To: xml gnome org
Subject: [xml] how to interpret/reproduce this type of xml?
Message-ID: <390335E2-7734-49BF-8BFC-D2FE94448AE2 flatblackfilms com>
Content-Type: text/plain; charset="us-ascii"; Format="flowed";
        DelSp="yes"

Hello, I am new to XML and libxml, so please forgive me if the
following is an ignorant or simple question.

I'm trying to write some code that reads and writes files in a
pre-existing XML format. I'm having trouble with one part of the file,
where pieces of text make up individual elements.

In the sample below, the texts "Notes 1", "Notes 2", and "Notes 3" are
each contained within <p></p> tags. I am having trouble figuring out how
to read or write that format, because the content is text, but between
the tags there is also text that is NOT part of the content. By that I
mean that the opening <p> is followed by a newline and then some number
of spaces due to indenting, and the end of the text is followed by
another newline and more spaces.

So my trouble is entirely due to the formatting. Does anyone know how
to handle this? Specifically, if I'm reading the file and get the text
between the tags, how do I know where the formatting ends and the real
text starts? And if I'm writing the file, what do I do to produce this
format?

<richcontent TYPE="NOTE"><html>
   <head>

   </head>
   <body>
     <p>
       Notes 1
     </p>
     <p>
       Notes 2
     </p>
     <p>
       Notes 3
     </p>
   </body>
</html>
</richcontent>
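For the writing side, one option is to put the layout whitespace into the document explicitly, since to XML it is just character data. The sketch below uses Python's xml.etree.ElementTree rather than libxml2. The helper name build_note and the two-space indent step are illustrative assumptions, not part of the original format:

```python
import xml.etree.ElementTree as ET

def build_note(notes, step="  "):
    """Hypothetical helper: build a <richcontent> note laid out like the
    sample, one tag per line and each paragraph's text on its own
    indented line. Layout whitespace is written into .text/.tail by hand."""
    def nl(depth):
        # newline followed by indentation for the given nesting depth
        return "\n" + step * depth

    richcontent = ET.Element("richcontent", TYPE="NOTE")
    html = ET.SubElement(richcontent, "html")
    html.text = nl(1)                       # before <head>
    html.tail = "\n"                        # before </richcontent>
    head = ET.SubElement(html, "head")
    head.text = nl(1)                       # <head> and </head> on separate lines
    head.tail = nl(1)                       # before <body>
    body = ET.SubElement(html, "body")
    body.text = nl(2) if notes else nl(1)   # before the first <p>
    body.tail = "\n"                        # before </html>
    for i, note in enumerate(notes):
        p = ET.SubElement(body, "p")
        # The paragraph text sits on its own line, one level deeper.
        p.text = nl(3) + note + nl(2)
        # After the last <p>, dedent so </body> lines up with <body>.
        p.tail = nl(2) if i < len(notes) - 1 else nl(1)
    return ET.tostring(richcontent, encoding="unicode")

print(build_note(["Notes 1", "Notes 2", "Notes 3"]))
```

A reader that applies the "collapse" rule to each <p> recovers "Notes 1", "Notes 2", and "Notes 3" unchanged, while the added newlines and spaces reproduce the indented layout.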


I really appreciate any help anyone can offer!
Thanks
Bob
