Re: [Gegl-developer] New serialization format

From: rassahah googlemail com
To: ville sokk gmail com, gegl-developer-list gnome org
Subject: Re: [Gegl-developer] New serialization format
Date: Fri, 06 Jul 2012 00:59:03 +0200

Ville Sokk <ville sokk gmail com> wrote:

GEGL needs a new serialization format. On IRC two popular choices
were JSON and YAML. Fancy features are not required so both should be
good choices.

YAML pros:
* very readable

JSON pros:
* pretty much every programming language has a JSON library
* simpler than YAML

Bat´O's YAML example: http://pastebin.com/QML9BkCb
and JSON: http://pastebin.com/9ZV9y7Vz

Any ideas why one should be chosen over the other? Maybe there's an
even better option?



Some ideas regarding this serialization thing follow (lengthy)...

From the pastebin example:
{
  "gegl.png":
  {
    "name": "gegl:load",
    "path": "data/gegl.png"
  },
  ...
I assume the "name" here is a mistake and it should actually be
  "operation", right? Anyway, I use "name" for the rest of this post.

As far as I understand it, a GEGL serialization file contains an
object which maps node names to nodes; "gegl.png" is the name of a
node. Later on, the JSON refers to this node, for example in the node
with the name "bg1":

  ..., "bg1":
    {
      "name": "gegl:gaussian-blur",
      "input": "gegl.png:output",
      "std-dev-x": 5.0,
      "std-dev-y": 5.0
    },

Personally I do not like this kind of reference:

      "input": "gegl.png:output",

This is because of the string encoding, with the colon separating the
referenced node from the output pad name. Effectively this would
prohibit the use of the colon in names of nodes and parameters. Even
though no property currently has a ':' in its name, the special use
of the ':' would make the node implementations depend on the
serialization format, which is bad. It would be better to give the
node reference and the name of the parameter as a pair of strings. It
would look like this:

  ..., "bg1":
    {
      "name": "gegl:gaussian-blur",
      "input": ["gegl.png", "output" ],
      "std-dev-x": 5.0,
      "std-dev-y": 5.0
    },
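
To make the colon problem concrete, here is a minimal Python sketch. The
helper split_colon_ref and the node name "layer:1" with pad "aux:mask"
are made up for illustration; nothing like them exists in GEGL as far
as I know. It only shows that a colon-joined string cannot be split
back reliably once a name contains a colon, while the pair form can.

```python
import json


def split_colon_ref(ref):
    """Hypothetical parser for the "node:pad" string form.

    Splits on the last colon, so everything before it is taken as the
    node name and everything after it as the pad name.
    """
    node, _, pad = ref.rpartition(":")
    return node, pad


# Fine as long as neither name contains a colon.
simple = split_colon_ref("gegl.png:output")

# Made-up names: node "layer:1", pad "aux:mask". Joined with a colon,
# the original split point is lost and the parser guesses wrong.
ambiguous = split_colon_ref("layer:1" + ":" + "aux:mask")

# The pair form keeps both names intact, colons included.
node, pad = json.loads('["layer:1", "aux:mask"]')

print(simple)
print(ambiguous)
print((node, pad))
```

Splitting on the first colon instead only moves the problem to node
names, so some escaping rule would be needed either way.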

It might even be useful to consider using objects for this kind of
reference, because the components can be named. It could look like this:

  ..., "bg1":
    {
      "name": "gegl:gaussian-blur",
      "input": { "node": "gegl.png", "pad": "output" },
      "std-dev-x": 5.0,
      "std-dev-y": 5.0
    },

I do not know which is better here. Generally, in JSON-based formats
you will often see the array style (["gegl.png", "output" ]) when
file size is the concern and the object style ({ "node":
"gegl.png", "pad": "output" }) when readability matters more,
so for this case I would choose the object style.
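
A reader does not have to pick only one style. As a rough Python
sketch (normalize_ref is a hypothetical helper, not part of any GEGL
API), both forms can be mapped to the same (node, pad) pair, and the
size difference between them can be measured directly:

```python
import json


def normalize_ref(value):
    """Accept either reference style and return a (node, pad) tuple."""
    if isinstance(value, list) and len(value) == 2:
        return value[0], value[1]
    if isinstance(value, dict) and set(value) == {"node", "pad"}:
        return value["node"], value["pad"]
    raise ValueError(f"not a pad reference: {value!r}")


array_style = ["gegl.png", "output"]
object_style = {"node": "gegl.png", "pad": "output"}

# Both styles carry the same information.
same = normalize_ref(array_style) == normalize_ref(object_style)

# Serialized length in characters, using json.dumps defaults.
array_size = len(json.dumps(array_style))
object_size = len(json.dumps(object_style))

print(same, array_size, object_size)
```

The array form is shorter because it drops the key names, which is
exactly what makes it harder to read.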

Also i would suggest allowing for some way to specify a node inline,
without creating a name, for example it could be done with the object
notation like this:

  ..., "bg1":
    {
      "name": "gegl:gaussian-blur",
      "input": {
        "node": {
          "name": "gegl:load",
          "path": "data/gegl.png"
        }, 
        "pad": "output"
      },
      "std-dev-x": 5.0,
      "std-dev-y": 5.0
    },

The node references ("gegl.png", "bg1" etc) would still be required
for nodes that get used in more than one place, though. But in my
experience those global references will get in the way sooner or
later, because, for example, they prevent you from taking some nodes
out of one file and pasting them into another file without doing some
renaming first. In general, modifying a graph will be more difficult,
because one always has to watch the references of the nodes.
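
Here is a small Python sketch of how a reader could handle both named
and inline nodes, assuming the object style for references. The node
"over" and the helper names are only illustrative, and this is not how
GEGL itself loads graphs. It also shows the copy-and-paste problem: a
node pasted on its own leaves a dangling name, while an inline node
travels with its parent.

```python
import json

DOC = json.loads("""
{
  "bg1": {
    "name": "gegl:gaussian-blur",
    "input": {
      "node": {"name": "gegl:load", "path": "data/gegl.png"},
      "pad": "output"
    },
    "std-dev-x": 5.0,
    "std-dev-y": 5.0
  },
  "over": {
    "name": "gegl:over",
    "input": {"node": "bg1", "pad": "output"}
  }
}
""")


def is_reference(value):
    """A reference is an object with exactly the keys "node" and "pad"."""
    return isinstance(value, dict) and set(value) == {"node", "pad"}


def collect_nodes(doc):
    """List (label, operation) for every node, named or inline."""
    found = []

    def visit(label, node):
        found.append((label, node["name"]))
        for key, value in node.items():
            # An inline node is a reference whose "node" is an object.
            if is_reference(value) and isinstance(value["node"], dict):
                visit(f"{label}.{key}", value["node"])

    for name, node in doc.items():
        visit(name, node)
    return found


def dangling_references(doc):
    """Names that are referenced but not defined in this document."""
    missing = []

    def visit(node):
        for value in node.values():
            if is_reference(value):
                target = value["node"]
                if isinstance(target, dict):
                    visit(target)
                elif target not in doc:
                    missing.append(target)

    for node in doc.values():
        visit(node)
    return missing


# Pasting only "over" into a new file leaves its "bg1" reference dangling.
pasted = {"over": DOC["over"]}
print(collect_nodes(DOC))
print(dangling_references(DOC), dangling_references(pasted))
```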

Another thing: how are other data types to be represented? (The pastebin
does not cover that case.) For example, in JSON the gegl:color node should
look something like this:
{
  "my-color": {
    "name": "gegl:color",
    "value": <my color here>
  }
}
But what is <my color here>? Perhaps it could be an object with RGBA values:
  "my-color": {
    "name": "gegl:color",
    "value": {"r": 0.1, "g": 0.2, "b": 0.3, "a": 0.4 }
  }
Are there other data types that would need to be represented (for example
the "d" property of a gegl:fill-path node)?
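
As a sketch of the RGBA object idea in Python (the helper names and the
decision to require all four components are my assumptions, not
anything GEGL defines), a color could be written and read back like
this:

```python
import json


def color_to_json(r, g, b, a=1.0):
    """Encode a color as the proposed object with r, g, b and a keys."""
    return {"r": r, "g": g, "b": b, "a": a}


def color_from_json(value):
    """Decode the proposed object into an (r, g, b, a) tuple of floats."""
    missing = {"r", "g", "b", "a"} - set(value)
    if missing:
        raise ValueError(f"color is missing components: {sorted(missing)}")
    return tuple(float(value[key]) for key in "rgba")


graph = {
    "my-color": {
        "name": "gegl:color",
        "value": color_to_json(0.1, 0.2, 0.3, 0.4),
    }
}

# Round trip through JSON text and back.
text = json.dumps(graph, indent=2)
restored = color_from_json(json.loads(text)["my-color"]["value"])
print(text)
print(restored)
```

A reader still has to know from the operation's property type that
"value" holds a color and not an inline node or a pad reference, so
the format would probably need one such rule per data type.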

As for JSON vs. XML: I think JSON is more readable than XML,
especially for small files, whereas XML has more support from things
like schemas, XSL, etc. But the benefits of XML (in my opinion at
least) only come into play when the additional overhead of using XML
is outweighed by the amount of processing (number of files) one needs
to handle, which is probably not the case for GEGL, OR (and that
might be worth thinking about) when intermixing the GEGL XML with
some other XML by using different namespaces, for example by joining
a GEGL graph with some SVG nodes in it, all in a single file
(personally I do not know any concrete use case for this, so I would
ignore this for now).
I do not know enough about YAML to compare with it.

Best regards - Rasmus
