Re: [xml] Add new pretty-printing and sorting options for saving XML



On 05/10/2010, Adam Spragg <adam spra gg> wrote:

The idea of these options is to be able to combine them to produce a
"canonical", nearly line-oriented format for XML files.

Are you familiar with the "Canonical XML" W3C Recommendation and its
implementation in libxml2?

<http://www.w3.org/TR/xml-c14n>
<http://xmlsoft.org/html/libxml-c14n.html>

It produces a similar result, but does not aim to insert breaks to
make line-oriented diff and merge tools happier.
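
To make the comparison concrete, here is a minimal sketch of what
canonicalization does. It uses Python's standard library rather than
libxml2, and the stdlib function implements C14N 2.0 rather than the
1.0 Recommendation linked above, so treat it as an illustration of the
idea only. The sample documents are made up.

```python
from xml.etree.ElementTree import canonicalize

# Two serializations of the same document: attribute order, quoting
# style, and whitespace inside the start tag all differ.
doc_a = '<item b="2" a="1">text</item>'
doc_b = "<item   a='1'\n      b='2' >text</item>"

canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

print(canon_a)
print(canon_a == canon_b)    # both inputs reduce to one canonical form
print("\n" in canon_a)       # no line breaks are introduced inside tags
```

The canonical form is stable across equivalent inputs, but an element
with many attributes still ends up on a single line, which is the gap
the proposed options are trying to fill.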

XML_SAVE_WSNONSIG is a new pretty-printing format which adds whitespace
*within* tags, where permitted by the XML standard, to re-line and
indent XML files, without changing any element content at all. No
whitespace is added to, removed from, or altered in any text node of
the document, and no text nodes are added or removed either.
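
As a rough illustration of the property described above (not the
patch's actual output), the sketch below parses a compact document and
a hand-written re-lined version and checks that the parser sees the
same element, attributes, and text. The layout of the re-lined string
is hypothetical, and Python's standard library parser stands in for
libxml2.

```python
import xml.etree.ElementTree as ET

compact = '<entry id="7" lang="en">Some  text</entry>'

# Hypothetical re-lined form: breaks and indentation appear only inside
# the start and end tags, never in the element content.
relined = (
    '<entry\n'
    '    id="7"\n'
    '    lang="en"\n'
    '    >Some  text</entry\n'
    '>'
)

def summary(xml_text):
    """Return the parts of the root element a parser reports."""
    root = ET.fromstring(xml_text)
    return (root.tag, dict(root.attrib), root.text)

print(summary(compact) == summary(relined))
```

Whitespace between the element name, the attributes, and the closing
">" is allowed by the XML grammar and is not reported to the
application, which is why the text node survives unchanged.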

I presume this is based on the Henri Sivonen suggestion?

<http://hsivonen.iki.fi/producing-xml/#prettyprinting>

In the responses I've seen to that, there's been a fair bit of
pushback, for instance from Uche Ogbuji here:

<http://www.ibm.com/developerworks/xml/library/x-think35.html#listing1>

The other concern is that, because you're introducing breaks for every
element and attribute, lots of lines start looking the same. That tends
to make the default, simpler diff algorithms produce suboptimal output.

Please let me know what you think of the idea and patches. Are they
suitable for libxml? At all? With work? (If so, what?)

The idea seems reasonable, but I don't know if adding code to libxml2
is the right first step. It's a core library people are rightly
nervous about updating, and with only an implementation and no spec to
go off, it wouldn't be easy for others to interoperate with your new
formatting style.

Martin


