Re: [xml] xmlDocDumpFormatMemory problem?

From: Igor Zlatkovic <igor stud fh-frankfurt de>
To: "Gang Wang" <WangGang computermotion com>, <xml gnome org>
Subject: Re: [xml] xmlDocDumpFormatMemory problem?
Date: Wed, 26 Jun 2002 11:02:07 +0200

Hi there,

Now I understand the reason. However, if the parties using the XML
document agree upon the indentation, there will be no confusion on how to
interpret the data. Are there any handy tools for pre/post-processing XML
documents, so that "insignificant" spaces could be eliminated?


At this time, it is not possible to agree. Two humans can agree on the
indentation and interpret the data correctly. Computers cannot. We do not
live in an age of intelligent machines yet, machines which can draw
conclusions on what is data and what is indentation based on the textual
interpretation of each text node. Our computers must perform in a
deterministic manner. You are thinking as a human, but must learn to
understand how a computer performs.

A space, a newline and a tab are legal characters in text nodes. I am
allowed to put any of these anywhere in my text nodes, at the beginning, at
the end, in the middle, wherever I want. This means, when you get an indented
document from me, you would need a deterministic algorithm that can tell
which spaces, tabs and newlines are meant to be data and which are meant to
be indentation. Given that XML does not provide a way to mark indenting
characters in a way that makes them distinguishable from data characters, I
say that such an algorithm does not exist.
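
To make this concrete, here is a small sketch. It assumes Python's standard
xml.etree.ElementTree parser rather than libxml2, and the two tiny documents
are made up for illustration. The parser returns every whitespace character
inside the text node exactly as it was written.

```python
import xml.etree.ElementTree as ET

# Two made-up documents that differ only in whitespace inside the text node.
compact = ET.fromstring("<node>data</node>")
padded = ET.fromstring("<node>\n    data\n</node>")

# The parser reports every character it was given. It cannot know whether
# the author meant the padding as data or as layout.
print(repr(compact.text))  # 'data'
print(repr(padded.text))   # '\n    data\n'
```

As far as the parser is concerned, these are two different text nodes.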

Consider this:

  <node>
    This is the text,
    the text this is.
  </node>

Now I say that the first line that reads 'This is the text' has four leading
spaces and I want these spaces to be there. The second line reading 'the
text this is' also has four leading spaces; these, however, were created by
the indenting engine. An algorithm which can tell the two apart does not
exist. Not even a human could tell them apart had I withheld which is which.
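
The same argument can be sketched in a few lines of Python. The function
indent_lines below is a hypothetical indenting engine invented for this
illustration, not something libxml2 provides. It makes every line start with
four spaces. Two different payloads come out identical, so no function
applied to the output can recover both originals.

```python
def indent_lines(text: str, width: int = 4) -> str:
    """Hypothetical indenting engine: make every line start with `width` spaces."""
    return "\n".join(" " * width + line.lstrip(" ") for line in text.split("\n"))

# Payload A: the four leading spaces on the first line are data.
payload_a = "    This is the text,\nthe text this is."
# Payload B: no leading spaces belong to the data at all.
payload_b = "This is the text,\nthe text this is."

out_a = indent_lines(payload_a)
out_b = indent_lines(payload_b)

print(payload_a == payload_b)  # False
print(out_a == out_b)          # True
```

Whoever receives the output has lost the information needed to decide between
payload A and payload B.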

You can write a tool which treats all whitespace at the beginning and at the
end of the line as insignificant and uses it for indentation purposes. Many
text editors with XML support have such code built in and each and every one
of them would break the meaning of my little <node/> snippet above. However,
there is no parser which would understand such a document the way it was
meant to be understood. You and your party would surely end up with different
DOM structures and that might be acceptable to you and your party, but is
generally fatal if that DOM should transport data.
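
A rough sketch of such a tool, again assuming Python's
xml.etree.ElementTree: strip_line_whitespace is a hypothetical helper written
for this example, and `intended` encodes my claim that the first four spaces
are data. The tool runs without complaint and silently returns different
data.

```python
import xml.etree.ElementTree as ET

def strip_line_whitespace(text: str) -> str:
    """Hypothetical cleanup: strip each line and drop lines left empty."""
    lines = [line.strip() for line in text.split("\n")]
    return "\n".join(line for line in lines if line)

node = ET.fromstring(
    "<node>\n    This is the text,\n    the text this is.\n</node>"
)

# What the author meant: the first line keeps its four leading spaces.
intended = "    This is the text,\nthe text this is."

recovered = strip_line_whitespace(node.text)
print(repr(recovered))        # 'This is the text,\nthe text this is.'
print(recovered == intended)  # False
```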

The best way would be to simply view XML files as binary. They are text
files just for the sake of being able to transport them through all
existing communication channels. Use them through the parser software and do
not open them in a text editor. There is nothing to see in there. HTML is
meant for display, but XML is meant for data and just like in any database,
every byte matters.

Ciao
Igor

References:
- RE: [xml] xmlDocDumpFormatMemory problem?
  - From: Gang Wang
