Re: [Gegl-developer] VIPS and GEGL performance and memory usage comparison

On Feb 1, 2016 12:40 PM, "Sven Claussner" <scl gplus gmail com> wrote:
> On 29.1.2016 at 5:37 PM Daniel Rogers wrote:
>>   * Anyone can do dynamic compilation nowadays with llvm.  Imagine
>>     taking the gegl dynamic tree, and compiling it into a single LLVM
>>     dynamically compiled function.
> What exactly do you mean? How is this supposed to work and where is the
> performance advantage if done at runtime?

To your first question: I made that statement as a counterpoint to VIPS turning a convolution kernel into a set of SSE3 instructions and executing them.

I believe, though I haven't proven it rigorously, that a GEGL graph is homomorphic to a parse tree of an expression language over images.

In other words, there exists an abstract language for which a GEGL graph is the parse tree.

For example:
a = load(path1)
b = load(path2)
c = load(path3)
out = a * b + c

Given suitable types, and suitable definitions for *, =, and +, there is a GEGL graph which exactly describes the program above. (For the record, I believe the language would have to be single-assignment and lazily evaluated in order to be homomorphic to GEGL's DAG.)
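As a concrete illustration, here is a toy sketch of that program written down as a single-assignment DAG whose shape mirrors what a GEGL graph would look like. The Node type and the operation names are invented for this sketch; none of it is GEGL's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    op: str            # e.g. "load", "multiply", "add" (hypothetical names)
    inputs: tuple = () # upstream nodes: the edges of the DAG
    args: tuple = ()   # constant parameters, e.g. a file path

# out = a * b + c, in single-assignment form
a = Node("load", args=("path1",))
b = Node("load", args=("path2",))
c = Node("load", args=("path3",))
out = Node("add", inputs=(Node("multiply", inputs=(a, b)), c))
```

Nothing evaluates here; as in GEGL, the structure only records which operation consumes which outputs, and a sink would pull results lazily.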

If that is the case, you can turn the argument on its head and say that a GEGL graph is just an intermediate representation (IR) of a compiled language. That makes the GEGL library itself an interpreter of that IR.

Given these equivalences, you can reasonably ask: can we use a different IR? Can we transform one IR into another? Can we use a different interpreter? The answer to all of these is yes, trivially.
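To make the "GEGL as interpreter of an IR" framing concrete, here is a toy sketch: the graph is encoded as nested tuples and walked by a tiny recursive interpreter, with plain scalars standing in for image tiles. The encoding and opcode names are invented for illustration.

```python
def eval_node(node, env):
    """Recursively interpret a nested-tuple IR; env maps 'paths' to values."""
    op, *rest = node
    if op == "load":
        return env[rest[0]]            # look up a "file" by path
    args = [eval_node(n, env) for n in rest]
    if op == "mul":
        return args[0] * args[1]
    if op == "add":
        return args[0] + args[1]
    raise ValueError(f"unknown op: {op}")

# out = a * b + c, as a nested-tuple IR
ir = ("add", ("mul", ("load", "path1"), ("load", "path2")), ("load", "path3"))
result = eval_node(ir, {"path1": 2.0, "path2": 3.0, "path3": 1.0})  # 7.0
```

Swapping out `eval_node` for something else is exactly the "different interpreter" question: the IR stays the same.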

So: can we transform a GEGL graph into LLVM IR? Can we then pass that IR to LLVM to produce the machine-code equivalent of our GEGL graph?

If we did that, all of LLVM's optimization machinery would come for free. I would reasonably expect LLVM to merge operations into single loops, combine similar operations, reduce the overall instruction count, inline code, reduce indirection, unroll loops, and so on. LLVM has quite a few optimization passes.
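A toy stand-in for what that fusion buys: instead of one pass (and one temporary buffer) per operation, the whole tree is compiled once into a single per-pixel kernel that runs in one loop. Here Python closures play the role that LLVM-generated machine code would play; the tree encoding is invented for the sketch.

```python
def compile_tree(node):
    """'Compile' a nested-tuple expression tree into one per-pixel function."""
    op, *rest = node
    if op == "img":
        idx = rest[0]
        return lambda px: px[idx]      # read input image idx at this pixel
    kids = [compile_tree(n) for n in rest]
    if op == "mul":
        return lambda px: kids[0](px) * kids[1](px)
    if op == "add":
        return lambda px: kids[0](px) + kids[1](px)
    raise ValueError(f"unknown op: {op}")

# out = a * b + c, with images numbered 0..2
tree = ("add", ("mul", ("img", 0), ("img", 1)), ("img", 2))
kernel = compile_tree(tree)            # paid once, up front

a, b, c = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
out = [kernel(px) for px in zip(a, b, c)]   # one fused loop: a*b + c
```

The split between `compile_tree` (paid once) and `kernel` (run per pixel, or per tile) is exactly the amortization argument made next.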

To your second question: the GEGL tree is executed a lot, at least once for every tile in the output. This is especially true if GEGL is used interactively and the same tree is evaluated thousands or millions of times with different inputs. You would thus trade an upfront cost of compiling the graph for reduced runtime per tile, and reduced total runtime.

A more conservative approach would be to turn a GEGL tree into a set of bytecodes, refactoring large chunks of GEGL into a bytecode interpreter.
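A sketch of that more conservative route, again with invented opcodes and scalars standing in for tiles: flatten the tree once into postorder bytecodes, then run a small stack machine over them for each tile.

```python
def run(bytecode, inputs):
    """Evaluate a postorder bytecode program on a tiny stack machine."""
    stack = []
    for op, arg in bytecode:
        if op == "PUSH_IMG":
            stack.append(inputs[arg])  # push an input "image"
        elif op == "MUL":
            r, l = stack.pop(), stack.pop()
            stack.append(l * r)
        elif op == "ADD":
            r, l = stack.pop(), stack.pop()
            stack.append(l + r)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# out = a * b + c, flattened once to postorder
code = [("PUSH_IMG", "a"), ("PUSH_IMG", "b"), ("MUL", None),
        ("PUSH_IMG", "c"), ("ADD", None)]
value = run(code, {"a": 2.0, "b": 3.0, "c": 1.0})  # 7.0
```

The flattening is paid once per tree; the per-tile cost is a straight-line loop over opcodes with no graph traversal.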

A really interesting follow-up is: just what other kinds of IR and runtimes can we use? GEGL to JVM bytecode? GEGL to Cg? GEGL to an ASIC (FPGA, DSP, etc.)?

