Re: [Gegl-developer] New serialization format

On Jul 5, 2012, at 7:57 AM, Michael Natterer wrote:

And XML was ruled out because it's not the latest fad any longer?

I think this is pretty much the right answer.  There is a ton of XML hate in the world right now.

Having fought this battle while dealing with millions of lines of code and hundreds of thousands of lines of
JSON and/or XML, I can offer the following advice…

XML is probably the right answer here.  XML sucks in the following ways.

1. It's verbose.  This is actually good for humans, but it sucks as a wire format, and some people feel the 
verbosity is unreadable.  That's only true if you're able to keep all the context in your head.  Once someone 
screws up the indentation, or you're 1000 lines in and 12 nested levels deep, having the extra context of tag 
names makes a huge difference.  Also, gzip is awesome here and solves the on-disk space issues.
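
To make the gzip point concrete, here is a rough sketch in Python (my choice for illustration, not anything GEGL uses).  The element and attribute names are invented placeholders, and the exact sizes will vary with the data; the only claim is that repetitive tag names compress well and the round trip is lossless.

```python
import gzip

# Hypothetical, highly repetitive XML: 1000 <node> elements with
# made-up attribute names and values, purely for illustration.
rows = ['  <node id="n%d" operation="example:op" />' % i for i in range(1000)]
xml_text = "<graph>\n" + "\n".join(rows) + "\n</graph>\n"

raw = xml_text.encode("utf-8")
packed = gzip.compress(raw)

# Print both sizes so the difference is visible on your own data.
print("raw bytes:", len(raw), "gzip bytes:", len(packed))

# Decompressing gives back exactly the bytes we started with.
assert gzip.decompress(packed) == raw
```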

2. It's complex.  No argument here.  There are a lot of things it is supposed to do, and there is a major
ambiguity that people always complain about (attributes vs. elements).

3. Many of the parsers are memory hogs (tree parsers) or very slow (though that's gotten much better and
doesn't apply to the parser GEGL is using).  The slow ones were copying too many strings.

Points 1 and 3 mean it sucks as an on-wire format for interactive HTTP requests (though gzip pretty much
negates 1).  Point 2 means it's hard to write a fast JS parser for it, which means your HTML5 app will get slow.

Everyone says JSON is "more readable!"  Then they try to maintain a large JSON file.  Then they discover
that validation, line numbers for errors, and a more expressive grammar go a long way towards keeping
programs simpler.  The first time you spend an hour trying to track down where your missing "," caused your
entire file to fail to parse, you'll wish you had a better parser.  I haven't found a JSON parser that will
actually spit out line numbers and context for errors.  With XML, it's easy to combine multiple grammars
(think embedding GEGL ops into another XML document).  It has a validation language (two of them, in fact;
yes, they have warts… but they do actually work for most things).  It's easier for new brains to look at
(though slower for familiar brains).  It's more self-describing, for those who expect their file format to
be produced or consumed by many other programs.  It's amazing how important strict specification can be when
it comes to using a file as an interchange format.  XML is much better at this than most other options.
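
As a small illustration of the line-number point, here is a sketch using the XML parser in Python's standard library (an assumption for the example; it is not the parser GEGL uses).  The document and the helper name `parse_error_position` are made up.  The parser's `ParseError` carries a `position` attribute holding the line and column of the failure.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a deliberate mistake on line 3:
# the <node> element is closed with </nod>.
broken = """<graph>
  <node id="a" />
  <node id="b"></nod>
</graph>
"""

def parse_error_position(text):
    """Return (line, column) of the first XML syntax error, or None if the text parses."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple reported by the parser.
        return err.position
    return None

pos = parse_error_position(broken)
print("syntax error at (line, column):", pos)
```

The error points at the line with the bad closing tag, which is the kind of context that saves the hour of hunting described above.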

Anyways, if you expect your serialization to be temporary (like a wire format), if it needs to be parsed
fast by a huge variety of hardware in languages without a byte array (JS), or if it is only produced and
consumed by your own application, then JSON (or BSON, or protocol buffers) seems like a good choice.  If
you're going for more of an interchange format, stick with XML.

Thus I would strongly suggest using XML for this.

Also, as far as structure goes, if you want to represent a general graph, you can draw inspiration from DOT,
the language of Graphviz.  There is also GraphML.  You could frankly use GraphML straight out of the box,
though it has lots of features you're probably not interested in.

The general structure is usually:

  .. graph attributes …
  <node />
  <node />
  <node />
  <edge />
  <edge />
  <edge />

So you don't try to put a tree in the text at all.  It's just a list of nodes and edges.
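
A minimal sketch of that flat layout, again in Python with its standard-library XML module.  The `graph` wrapper element, the `id`, `source` and `target` attribute names, and the example node names are my assumptions for illustration (loosely in the spirit of GraphML), not a proposed GEGL schema.

```python
import xml.etree.ElementTree as ET

def graph_to_xml(nodes, edges):
    """Serialize a graph as a flat list of <node> and <edge> elements."""
    root = ET.Element("graph")
    for node_id in nodes:
        ET.SubElement(root, "node", {"id": node_id})
    for source, target in edges:
        ET.SubElement(root, "edge", {"source": source, "target": target})
    return ET.tostring(root, encoding="unicode")

def xml_to_graph(text):
    """Read the flat list back into (nodes, edges)."""
    root = ET.fromstring(text)
    nodes = [n.get("id") for n in root.findall("node")]
    edges = [(e.get("source"), e.get("target")) for e in root.findall("edge")]
    return nodes, edges

# Hypothetical graph: "load" feeds both "blur" and "save", so it is not a tree.
nodes = ["load", "blur", "save"]
edges = [("load", "blur"), ("blur", "save"), ("load", "save")]

text = graph_to_xml(nodes, edges)
print(text)

# The flat list round-trips without any nesting in the text.
assert xml_to_graph(text) == (nodes, edges)
```

Because edges refer to nodes by id, a node with several consumers is written once, which is the thing a nested tree layout has trouble expressing.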

