Re: [Gegl-developer] Integration of GSOC2011 work in GEGL
- From: Victor Oliveira <victormatheus gmail com>
- To: Michael Muré <batolettre gmail com>
- Cc: gegl-developer-list gnome org
- Subject: Re: [Gegl-developer] Integration of GSOC2011 work in GEGL
- Date: Sun, 30 Oct 2011 13:13:48 -0200
Hi!
I'm afraid there is still some work to be done in the opencl branch. I've been reorganizing the code a little and rebasing it whenever I can. But there are two important things still missing, which I intend to do as soon as possible:
1. Implementing a gegl:over operator.
2. Making OpenCL tasks asynchronous instead of synchronizing after each one; this requires major reworking of my code.
1. is relatively easy to do, but 2. is really needed if we want good performance.
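To make the second point concrete, the difference is roughly the one below. This is just a sketch, not code from the branch, and the function names are made up:

#include <CL/cl.h>

/* what effectively happens today: one synchronization per operation */
static void
process_blocking (cl_command_queue queue, cl_kernel kernel,
                  cl_mem out, size_t n_pixels, size_t out_bytes, void *host_out)
{
  clEnqueueNDRangeKernel (queue, kernel, 1, NULL, &n_pixels, NULL, 0, NULL, NULL);
  clFinish (queue);                              /* sync after the kernel */
  clEnqueueReadBuffer (queue, out, CL_TRUE, 0,   /* blocking read         */
                       out_bytes, host_out, 0, NULL, NULL);
}

/* what I want: enqueue everything with events so the driver can overlap
   kernels and transfers, and wait only once at the end */
static void
process_async (cl_command_queue queue, cl_kernel kernel,
               cl_mem out, size_t n_pixels, size_t out_bytes, void *host_out)
{
  cl_event ev_kernel, ev_read;

  clEnqueueNDRangeKernel (queue, kernel, 1, NULL, &n_pixels, NULL,
                          0, NULL, &ev_kernel);
  clEnqueueReadBuffer (queue, out, CL_FALSE, 0, out_bytes, host_out,
                       1, &ev_kernel, &ev_read); /* non-blocking          */
  /* ... keep enqueueing work for other tiles/operations here ... */
  clWaitForEvents (1, &ev_read);                 /* single sync at the end */
  clReleaseEvent (ev_kernel);
  clReleaseEvent (ev_read);
}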
---
Also, I have some questions on which I'd like the community's opinion, if possible:
The current scheme, where I allocate as many tiles in the GPU as there are in the image, is complicated, for two reasons:
* GPUs have weird addressing modes, so it's hard to know whether a tiled image will fit in the GPU until I try to allocate a buffer there and it fails (see the sketch after this list). Also, the drivers are not optimized for this use case: they usually expect a few large memory buffers, while we have many small ones. I've run into some weird memory-management problems during this project.
* PCI transfers of small tiles have too much overhead. If we really want good performance, we need to use very big tiles [like 4096x4096], which defeats the purpose of using tiles in the first place. Shouldn't we just un-tile the processing region of the image for GPU processing?
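About the first point, the only reliable way I have to know whether another tile fits is to try to allocate it and check whether it fails, more or less like this (simplified, made-up function name):

#include <CL/cl.h>

/* try to allocate one more tile on the device; there is no portable way
   to ask beforehand whether it will fit (and some drivers only report
   the failure later, when the buffer is first used) */
static cl_mem
try_alloc_tile (cl_context context, size_t tile_bytes)
{
  cl_int err;
  cl_mem buf = clCreateBuffer (context, CL_MEM_READ_WRITE,
                               tile_bytes, NULL, &err);
  if (err != CL_SUCCESS)   /* e.g. CL_MEM_OBJECT_ALLOCATION_FAILURE */
    return NULL;
  return buf;
}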
I know the main point of the current scheme is avoiding memory transfers back and forth to the GPU for each operation in a chain. But I'd like someone to take a look at the current state of the code in my branch. The code is very complex and requires locking for each operation on a tile [because it's possible to have CPU and GPU data out of sync]; maybe we should just keep it simple and do what the Darktable guys have been doing [which is the simple scheme I mentioned].
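To give an idea of the bookkeeping the current scheme needs, every tile ends up carrying state more or less like this (simplified; not the actual structures in the branch, and the names are made up):

#include <glib.h>
#include <CL/cl.h>

/* per-tile state: two copies of the pixels that can get out of sync */
typedef struct
{
  guchar   *cpu_data;      /* tile pixels in host memory          */
  cl_mem    gpu_data;      /* tile pixels on the device           */
  gboolean  cpu_is_valid;  /* is the host copy up to date?        */
  gboolean  gpu_is_valid;  /* is the device copy up to date?      */
  GMutex   *mutex;         /* taken around every access to a tile */
} TileClState;

/* every operation on a tile then becomes: lock, check which copy is
   stale, transfer if needed, do the work, mark the other copy dirty,
   unlock -- and that for every tile touched by every operation */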
Also, there are some new processors on the market with integrated GPUs, which should greatly reduce this memory-transfer problem. But I'm not sure about this.
Pros:
* Predictable GPU memory use
* Less overhead in PCI transfers and GPU processing
* Much simpler code
Cons:
* We have to move data back and forth to the GPU for each operation
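In code, the simple scheme would be roughly the following for each operation (again just a sketch with made-up names; it assumes the region has already been linearized in host memory and that the kernel takes the two buffers as its first arguments):

#include <CL/cl.h>

/* simple scheme: copy the whole region in, run the kernel, copy it back */
static void
run_op_simple (cl_context context, cl_command_queue queue, cl_kernel kernel,
               const void *src, void *dst, size_t bytes, size_t n_pixels)
{
  cl_mem in  = clCreateBuffer (context, CL_MEM_READ_ONLY,  bytes, NULL, NULL);
  cl_mem out = clCreateBuffer (context, CL_MEM_WRITE_ONLY, bytes, NULL, NULL);

  clEnqueueWriteBuffer (queue, in, CL_FALSE, 0, bytes, src, 0, NULL, NULL);
  clSetKernelArg (kernel, 0, sizeof (cl_mem), &in);
  clSetKernelArg (kernel, 1, sizeof (cl_mem), &out);
  clEnqueueNDRangeKernel (queue, kernel, 1, NULL, &n_pixels, NULL,
                          0, NULL, NULL);
  clEnqueueReadBuffer (queue, out, CL_TRUE, 0, bytes, dst, 0, NULL, NULL);

  clReleaseMemObject (in);
  clReleaseMemObject (out);
}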
So, the question is: I have to do a major rework of the code anyway, and I have no problem doing it the way I explained, which I believe is less error-prone, but what do you guys think about it?
bye!
Victor Oliveira