Re: [xml] Add new pretty-printing and sorting options for saving XML


On Thursday 07 Oct 2010 07:45:43 Martin (gzlist) wrote:
On 05/10/2010, Adam Spragg <adam spra gg> wrote:
The idea of these options is to be able to combine them to produce a
"canonical", nearly line-oriented format for XML files.

Are you familiar with the "Canonical XML" W3C Recommendation and its
implementation in libxml2?

A bit familiar. I wasn't particularly aware of it while I was coding, but 
looking at it now, it does ring bells. It may well have inspired part of this.

It has a similar result, but without the aim to insert breaks to make
line-oriented diff and merge tools happier.

Well, I split up the re-ordering and the whitespace changes into two separate 
options, so you could do one without the other.
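
For comparison, here is a minimal sketch of the re-ordering half on its own. It uses the C14N 2.0 support in Python's standard library (xml.etree.ElementTree.canonicalize, Python 3.8 or later), not libxml2 and not the patch under discussion, and the sample documents are made up for illustration. Canonicalization sorts attributes and expands empty elements, but it does not insert any line breaks.

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical documents that differ only in attribute order.
doc_a = '<root b="2" a="1"><child z="26" y="25"/></root>'
doc_b = '<root a="1" b="2"><child y="25" z="26"/></root>'

# canonicalize() returns the canonical form as a string when no
# output file is given.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

print(canon_a)
print("identical after canonicalization:", canon_a == canon_b)
```

Both inputs collapse to the same single-line output, which is why canonical form alone does not help line-oriented diff tools.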

XML_SAVE_WSNONSIG is a new pretty-printing format

I presume this is based on the Henri Sivonen suggestion?

Again, I had seen the idea somewhere else before, but couldn't remember where. 
It may have been his, or it may have been from someone else. Can't say.

In the responses I've seen to that, there's been a fair bit of
pushback, for instance from Uche Ogbuji here:


"I disagree because I think it makes for ugly markup that's not friendly to 
manipulation by people."

Well, I think ugly is a matter of familiarity. I think the GNU coding 
guidelines recommend what is to me an unworkably ugly and awkward C brace 
style, but plenty of other people seem to have got used to it and be 

I will admit though, I do think it is rather ugly if your document already 
contains lots of pretty-printing whitespace in the content. But if not, it 
seems OK to me.

As for being manipulable by people, well, I always thought that XML was 
primarily a machine generated/readable language, which happens to be fairly 
human-readable in order to make debugging and quick hacks easier.

On top of that, if people don't find that representation very workable, 
there's no reason why they shouldn't be able to use xmllint (or a similar 
tool) to reformat the document into another pretty-printed format which they 
can deal with easily, and then transform it back afterwards.
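
A rough sketch of that round trip, using Python's xml.dom.minidom as a stand-in for xmllint or a similar tool. The helper names (strip_blank_text, pretty, compact) and the sample document are hypothetical, and the approach assumes whitespace-only text nodes carry no meaning in the document, which is not true of mixed content.

```python
from xml.dom import minidom


def strip_blank_text(node):
    """Recursively drop text nodes that contain only whitespace."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_blank_text(child)


def pretty(xml_text):
    """Reformat into an indented layout for hand editing."""
    doc = minidom.parseString(xml_text)
    strip_blank_text(doc.documentElement)
    return doc.documentElement.toprettyxml(indent="  ")


def compact(xml_text):
    """Transform back by removing the indentation whitespace."""
    doc = minidom.parseString(xml_text)
    strip_blank_text(doc.documentElement)
    return doc.documentElement.toxml()


original = '<config><item name="a"/><item name="b"/></config>'
readable = pretty(original)
restored = compact(readable)

print(readable)
print("round trip preserved the document:", restored == original)
```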

Heh. I'm not suggesting that this format be made the default. I just want to 
make it available as an option.

The other concern is that, as you're introducing breaks for every element
and attribute, lots of lines start looking the same. That tends to
make the default, simpler diff algorithms produce suboptimal output.
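
To illustrate the concern, here is a small sketch with Python's difflib. The one-token-per-line layout produced by one_token_per_line is a made-up approximation for this example, not the actual output of the proposed options. Because every item contributes the same `<item` and `/>` lines, the matcher has several equally valid ways to line up the two versions, and the lines it reports as added need not correspond to one whole element.

```python
import difflib


def one_token_per_line(names):
    """Hypothetical layout: each tag start, attribute and tag end on its own line."""
    lines = ["<list", ">"]
    for name in names:
        lines += ["<item", f'name="{name}"', "/>"]
    lines.append("</list>")
    return lines


old = one_token_per_line(["a", "c"])
new = one_token_per_line(["a", "b", "c"])

# Line-based diff of the two layouts; n=1 keeps one line of context.
diff = list(difflib.unified_diff(old, new, "old.xml", "new.xml", n=1, lineterm=""))
print("\n".join(diff))

# Collect the added and removed lines, skipping the file header lines.
added = [l[1:] for l in diff if l.startswith("+") and not l.startswith("+++")]
removed = [l[1:] for l in diff if l.startswith("-") and not l.startswith("---")]
```

The diff is still correct, since applying it yields the new document, but a reader has to work out which element the added lines belong to.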

I was going to cross that bridge when I came to it. :-)

Please let me know what you think of the idea and patches. Are they
suitable for libxml? At all? With work? (If so, what?)

The idea seems reasonable, but I don't know if adding code to libxml2
is the right first step. It's a core library people are rightly
nervous about updating, and with only an implementation and no spec to
go off, it wouldn't be easy for others to interoperate with your new
formatting style.

OK. Thanks.

