Re: [Gegl-developer] Integration of GSOC2011 work in GEGL


I'm afraid there is still some work to be done in the opencl branch. I've been reorganizing the code a little and rebasing it whenever I can. But two important things are still missing, which I intend to do as soon as possible:

1. implementing a gegl:over operator.
2. running OpenCL tasks asynchronously instead of synchronizing after each one; this requires major reworking of my code.

Item 1 is relatively easy to do, but item 2 is really needed if we want good performance.
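To picture why synchronizing after every task costs so much, here is a rough timing model. It is only a sketch: the function names and the per-tile timings are assumptions made up for illustration, not measurements from the opencl branch, and the "pipelined" case assumes the ideal situation where upload, kernel and download of different tiles overlap perfectly.

```python
def synchronous_time(n_tiles, upload, kernel, download):
    """Each tile is uploaded, processed and downloaded before the next one starts."""
    return n_tiles * (upload + kernel + download)


def pipelined_time(n_tiles, upload, kernel, download):
    """Ideal overlap: the three stages work on different tiles at the same time.

    The first tile pays for all three stages; every later tile only adds
    the duration of the slowest stage.
    """
    if n_tiles == 0:
        return 0.0
    slowest = max(upload, kernel, download)
    return (upload + kernel + download) + (n_tiles - 1) * slowest


if __name__ == "__main__":
    # Assumed per-tile timings in milliseconds (hypothetical values).
    upload_ms, kernel_ms, download_ms = 0.3, 0.5, 0.3
    for n in (1, 16, 256):
        sync = synchronous_time(n, upload_ms, kernel_ms, download_ms)
        pipe = pipelined_time(n, upload_ms, kernel_ms, download_ms)
        print(f"{n:4d} tiles: synchronous {sync:8.1f} ms, pipelined {pipe:8.1f} ms")
```

In this model the gain grows with the number of tiles, which is why it matters most for a tiled buffer.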


Also, I have some questions on which I'd like the community's opinion, if possible:

The current scheme, where I allocate as many tiles on the GPU as there are in the image, is complicated, for two reasons:

* GPUs have weird addressing modes, so it's hard to know whether a tiled image will fit in GPU memory until I try to allocate a buffer there and it fails. Also, the drivers are not optimized for this use case: they usually expect a few large memory buffers, while we have many small ones. I've run into some weird memory-management problems during this project.

* PCI transfers of small tiles have too much overhead. If we really want good performance we need to use very big tiles [like 4096x4096], which defeats the purpose of using tiles in the first place. Shouldn't we just un-tile the region of the image being processed when processing on the GPU?
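The tile-size argument can be illustrated with a simple cost model: every transfer pays a fixed cost, so many small transfers cost more than one big transfer of the same data. This is a sketch only. The function name is hypothetical, and the latency and bandwidth defaults are placeholder assumptions rather than measured values; 16 bytes per pixel corresponds to RGBA with 32-bit floats.

```python
import math


def transfer_time_s(width, height, tile, bytes_per_pixel=16,
                    latency_s=20e-6, bandwidth_bytes_per_s=6e9):
    """Estimated time to upload an image as independent square tiles.

    latency_s and bandwidth_bytes_per_s are assumed values, not measurements.
    Padding of partial edge tiles is ignored to keep the model simple.
    """
    n_tiles = math.ceil(width / tile) * math.ceil(height / tile)
    payload = width * height * bytes_per_pixel
    return n_tiles * latency_s + payload / bandwidth_bytes_per_s


if __name__ == "__main__":
    for tile in (64, 128, 512, 4096):
        t = transfer_time_s(4096, 4096, tile)
        print(f"tile {tile:4d}: {t * 1e3:7.2f} ms")
```

With these assumed numbers the payload term is the same for every tile size, so any difference comes purely from the per-transfer overhead.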

I know the main point of the current scheme is to avoid memory transfers back and forth to the GPU for each operation in a chain. But I'd like someone to take a look at the current state of the code in my branch. The code is very complex and requires locking for each operation on a tile [because CPU and GPU data can be out of sync]; maybe we should just keep it simple and do what the Darktable guys have been doing [which is the simple scheme I mentioned].
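For readers who have not looked at the branch, the kind of bookkeeping this implies can be sketched as follows. This is not the code in the branch: the `Tile` class, its fields and its methods are hypothetical, and a plain Python list stands in for a device buffer. The point is only that each tile needs validity flags for both copies plus a lock around every operation.

```python
import threading


class Tile:
    """Hypothetical tile that keeps a CPU copy and a GPU copy of the same pixels."""

    def __init__(self, data):
        self.lock = threading.Lock()
        self.cpu = list(data)
        self.gpu = None          # stand-in for a device buffer
        self.cpu_valid = True
        self.gpu_valid = False
        self.transfers = 0       # counts simulated bus transfers

    def _sync_to_gpu(self):
        if not self.gpu_valid:
            self.gpu = list(self.cpu)
            self.gpu_valid = True
            self.transfers += 1

    def _sync_to_cpu(self):
        if not self.cpu_valid:
            self.cpu = list(self.gpu)
            self.cpu_valid = True
            self.transfers += 1

    def run_on_gpu(self, op):
        """Apply op to the GPU copy; the CPU copy becomes stale."""
        with self.lock:
            self._sync_to_gpu()
            self.gpu = [op(x) for x in self.gpu]
            self.cpu_valid = False

    def read_on_cpu(self):
        """Return the pixels on the CPU side, downloading them first if needed."""
        with self.lock:
            self._sync_to_cpu()
            return list(self.cpu)


if __name__ == "__main__":
    tile = Tile([1, 2, 3])
    tile.run_on_gpu(lambda x: x * 2)   # one upload
    tile.run_on_gpu(lambda x: x + 1)   # no transfer, data is already on the GPU
    print(tile.read_on_cpu(), tile.transfers)
```

Chained GPU operations avoid transfers in this model, which is the benefit of the current scheme; the price is the per-tile state and locking shown above.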

Also, there are some new processors on the market with integrated GPUs, which should reduce this memory problem a lot. But I'm not sure about this.

Advantages of the simple scheme:

* Predictable GPU memory use
* Less overhead in PCI transfers and GPU processing
* Much simpler code

Disadvantage:

* We have to move data back and forth to the GPU for each operation

So, the question is this: I have to do a major rework of the code anyway, and I have no problem doing it the way I explained, which I believe is less error-prone. But what do you guys think about it?

Victor Oliveira

On Sun, Oct 30, 2011 at 6:12 AM, Michael Muré <batolettre gmail com> wrote:

Here is the status of the work I did during the summer that hasn't been merged yet.

Branch soc-2011-warp:
- a hacky cache operator for the GIMP warp tool undo system:
  * not tested yet
  * no urgency, since it won't be in GIMP 2.8 anyway.

- Copy-On-Write for gegl_buffer_dup:
  * see

- an attempt to make the nearest-neighbour sampler work with format_n buffers:
  * not tested yet
  * should be extended to the linear sampler
  * bug is here

Branch abyss:
- the abyss policy, renamed to repeat mode, i.e. how samplers behave when asked for data outside the buffer extent
  * sampler call graph is here:
  * all the functions have been modified except gegl_buffer_iterate
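As an illustration of what such a policy decides, here is a minimal sketch of a nearest-neighbour fetch with three possible behaviours outside the extent. The function and the mode names ("none", "clamp", "loop") are assumptions chosen for the example and are not taken from the abyss branch.

```python
def sample(buffer, x, y, mode="none"):
    """Fetch buffer[y][x] from a 2D list, applying a policy outside the extent."""
    height, width = len(buffer), len(buffer[0])
    if 0 <= x < width and 0 <= y < height:
        return buffer[y][x]
    if mode == "none":
        return 0                                   # empty outside the extent
    if mode == "clamp":
        cx = min(max(x, 0), width - 1)             # nearest valid column
        cy = min(max(y, 0), height - 1)            # nearest valid row
        return buffer[cy][cx]
    if mode == "loop":
        return buffer[y % height][x % width]       # tile the buffer endlessly
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    buf = [[1, 2],
           [3, 4]]
    for mode in ("none", "clamp", "loop"):
        print(mode, sample(buf, -1, 0, mode), sample(buf, 5, 5, mode))
```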

I don't have much time at the moment because I'm working on a time-consuming project at my school, but I will finish this one day or another if no one else does.


2011/10/30 Jon Nordby <jononor gmail com>
All the work in the operations porting project has been merged
already. For the rest there is work that is not yet in master. What is
the plan for integrating it? What needs to be done and who is going to
do it? Where is the status of this being tracked?

Jon Nordby -
