Re: [Gegl-developer] GEGL OpenCL Porting



Hi Nanley,

It's cool that you're doing this; it's important work.

Regarding your GPU performance, keep in mind that:

1) GEGL is tile-based, so when we use the GPU there is a memcpy over the whole input to linearize the image data. Our tiles are quite small (128x64, last time I checked), so calling a CL kernel on such small pieces of data has too much overhead; that's why the CL iterator has to go over much larger regions (such as 2048x4096).

2) Besides that, there's the overhead of the PCIe 2.0/3.0 bus, which is especially a problem for bandwidth-limited filters such as yours. For a convolution filter, for example, which does much more arithmetic per pixel, performance is much better on the GPU than on the CPU.
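As a rough illustration of point 1, the C sketch below counts how many kernel launches it takes to cover an image when each launch handles one small tile versus one large region. It assumes the 128x64 tile size and the 2048x4096 region size quoted above are width x height, and that each covered block costs exactly one launch; blocks_needed is a hypothetical helper, not a GEGL function.

```c
#include <stddef.h>

/* Number of w x h blocks needed to cover an img_w x img_h image.
 * Partial blocks at the right and bottom edges still count as one block,
 * hence the ceiling divisions. Assumes one kernel launch per block. */
static size_t blocks_needed(size_t img_w, size_t img_h, size_t w, size_t h)
{
    size_t cols = (img_w + w - 1) / w;  /* ceil(img_w / w) */
    size_t rows = (img_h + h - 1) / h;  /* ceil(img_h / h) */
    return cols * rows;
}
```

For the 5197x5543 image in Nanley's timings below, blocks_needed(5197, 5543, 128, 64) gives 3567 launches while blocks_needed(5197, 5543, 2048, 4096) gives 6, so any fixed per-launch cost is paid roughly 600 times less often with the larger regions.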

Even so, a lot of speedup can be gained by chaining multiple operations/filters so that intermediate data never leaves the GPU; that's where the real performance gain is. Consider also that many filters have adjustable parameters for which real-time feedback to the user is important. In that case, the input data is uploaded to the GPU only once, as long as the image fits completely in GPU memory. Our CL code has a caching system that uses the whole GPU memory as a pool.
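A back-of-the-envelope C model of why chaining helps: if every GPU operation uploads its input and downloads its output over the bus, n operations pay for 2n transfers, while a chain that keeps intermediate data on the GPU pays for only 2. The function names, the equal upload/download cost, and the single fixed per-kernel time are simplifying assumptions for illustration, not measurements of GEGL.

```c
/* Hypothetical cost model, in seconds, for n_ops GPU filters on one image.
 * bytes:            size of the image buffer
 * pcie_bytes_per_s: assumed effective bus bandwidth
 * kernel_s:         assumed run time of one kernel */

/* Each operation uploads its input and downloads its output. */
double gpu_time_unchained(int n_ops, double bytes,
                          double pcie_bytes_per_s, double kernel_s)
{
    double round_trip_s = 2.0 * bytes / pcie_bytes_per_s;
    return n_ops * (round_trip_s + kernel_s);
}

/* Upload once, run every kernel on the device, download once. */
double gpu_time_chained(int n_ops, double bytes,
                        double pcie_bytes_per_s, double kernel_s)
{
    double round_trip_s = 2.0 * bytes / pcie_bytes_per_s;
    return round_trip_s + n_ops * kernel_s;
}
```

With n_ops = 1 the two models give the same time, so a single bandwidth-limited filter benchmarked on its own (like the single-operation perf tests below) sees none of the chaining benefit.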

In the perf folder there are some useful utilities for simple profiling. My system has an Intel GPU, which is not a particularly fast GPU:

victorolivs-MacBook-Pro:perf victoroliv$ export GEGL_USE_OPENCL=no

victorolivs-MacBook-Pro:perf victoroliv$ ./test-bcontrast

@ bcontrast: 971.37 megabytes/second

victorolivs-MacBook-Pro:perf victoroliv$ export GEGL_USE_OPENCL=yes

victorolivs-MacBook-Pro:perf victoroliv$ ./test-bcontrast

@ bcontrast: 364.53 megabytes/second

victorolivs-MacBook-Pro:perf victoroliv$ export GEGL_USE_OPENCL=no

victorolivs-MacBook-Pro:perf victoroliv$ ./test-blur

@ gaussian-blur: 163.04 megabytes/second

victorolivs-MacBook-Pro:perf victoroliv$ export GEGL_USE_OPENCL=yes

victorolivs-MacBook-Pro:perf victoroliv$ ./test-blur

@ gaussian-blur: 248.08 megabytes/second
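To read the perf output above as a speedup, divide the OpenCL throughput by the CPU-only throughput. A trivial C helper (the name is hypothetical) makes the arithmetic explicit:

```c
/* OpenCL-vs-CPU speedup from two throughput figures in MB/s, as printed
 * by the perf tests. A result above 1 means the OpenCL run was faster. */
double opencl_speedup(double cpu_mb_s, double opencl_mb_s)
{
    return opencl_mb_s / cpu_mb_s;
}
```

opencl_speedup(971.37, 364.53) is about 0.375, so brightness-contrast runs roughly 2.7x slower with OpenCL here, while opencl_speedup(163.04, 248.08) is about 1.52 for gaussian-blur. That is consistent with point 2: the cheap per-pixel filter gains nothing, while the convolution does enough work per pixel to benefit.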


I haven't really tried OpenCL on the CPU much, but Pippin should have more information about the performance measurements he did. You can submit patches to the mailing list.

Victor

On Thu, Nov 20, 2014 at 7:59 PM, Nanley Chery <nanleychery gmail com> wrote:
Thanks for the quick fix, it's working on my system. 

I noticed that you've enabled GPUs by default based on some testing. Where can I find those results? According to my tests of two operations, edge-laplace and video-degradation (currently on my bitbucket branches edge_upstrm and vid_upstrm), OpenCL on my GPU runs 7.8x slower than OpenCL on my CPU.

Below are some results for the video-degradation operation. Each time is the average of 5 trials, in seconds. On my CPU, OpenCL gave an average 37.6x speedup over running without it. Enabling multiple threads increases my times by ~0.01 s.

Image size  i7-2675QM      i7-2675QM      Radeon HD 6450  Speedup from    GPU slowdown
            OpenCL (s)     no OpenCL (s)  OpenCL (s)      no OpenCL       vs. CPU
32x32       0.000650425    0.0668745008   0.0014333608    102.8166211323  2.2037295614
64x64       0.0007871332   0.0678992536   0.0021703034    86.2614530806   2.7572250796
128x128     0.0007964746   0.0676906748   0.0048043746    84.9878637687   6.0320499863
256x256     0.0014093932   0.0683310326   0.0154375408    48.4825899543   10.9533243101
512x512     0.0080697294   0.074770076    0.0621315552    9.2654997824    7.6993356432
1024x1024   0.0287260124   0.0886687912   0.2647257696    3.0867072661    9.215541855
2048x2048   0.1051937164   0.1571853124   0.976372136     1.4942462134    9.2816583482
4096x4096   0.4144775556   0.3923948302   4.07693664      0.9467215411    9.8363266838
5197x5543   0.5391386664   0.6064622732   6.9632312602    1.1248725254    12.9154736882
Average                                                   37.6073972516   7.8771850173
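The speedup and slowdown columns, and their averages, can be recomputed from the raw times. The C sketch below copies the timings from the table (array and function names are hypothetical) and reproduces the Average row as an arithmetic mean of per-size ratios. For comparison it also computes a ratio of total times, an alternative aggregate that is not the one used in the table.

```c
#include <stddef.h>

/* Video-degradation timings from the table (seconds, mean of 5 trials),
 * one entry per image size, smallest first. */
static const double cpu_cl[9] = {
    0.000650425, 0.0007871332, 0.0007964746, 0.0014093932, 0.0080697294,
    0.0287260124, 0.1051937164, 0.4144775556, 0.5391386664 };
static const double cpu_no_cl[9] = {
    0.0668745008, 0.0678992536, 0.0676906748, 0.0683310326, 0.074770076,
    0.0886687912, 0.1571853124, 0.3923948302, 0.6064622732 };
static const double gpu_cl[9] = {
    0.0014333608, 0.0021703034, 0.0048043746, 0.0154375408, 0.0621315552,
    0.2647257696, 0.976372136, 4.07693664, 6.9632312602 };

/* Arithmetic mean of the per-size ratios num[i] / den[i];
 * this reproduces the "Average" row of the table. */
double mean_of_ratios(const double *num, const double *den, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += num[i] / den[i];
    return sum / (double)n;
}

/* sum(num) / sum(den): weights each image size by how long it runs. */
double ratio_of_totals(const double *num, const double *den, size_t n)
{
    double num_sum = 0.0, den_sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        num_sum += num[i];
        den_sum += den[i];
    }
    return num_sum / den_sum;
}
```

mean_of_ratios(cpu_no_cl, cpu_cl, 9) gives about 37.61 and mean_of_ratios(gpu_cl, cpu_cl, 9) about 7.88, matching the table. Because the mean of ratios weights every size equally, the small images (48x to 103x speedups) dominate it; the ratio of totals comes out to about 1.45x for CPU OpenCL vs. no OpenCL and about 11.25x for the GPU slowdown.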

Please let me know if you spot anything wrong with my measuring methodology or OpenCL implementation. Also, are the mailing list and Bugzilla both suitable places to submit patches?

Thanks,
Nanley

On Thu, Nov 20, 2014 at 2:00 AM, Victor Oliveira <victormatheus gmail com> wrote:
I put it back; hopefully everything is all right now.

Victor

On Wed, Nov 19, 2014 at 2:41 PM, Nanley Chery <nanleychery gmail com> wrote:
> Thanks for the question, Victor. I'm actually running a custom Perl script
> to automate the process. Your question led me to find a bug in the script.
>
> Cheers,
> Nanley
>
> On Wed, Nov 19, 2014 at 5:33 PM, Victor Oliveira <victormatheus gmail com>
> wrote:
>>
>> Have you tried GEGL_DEBUG=opencl?
>>
>> On Wed, Nov 19, 2014 at 2:32 PM, Nanley Chery <nanleychery gmail com>
>> wrote:
>> > I'm glad we could find this bug. Rolling back to the older version of
>> > gegl-operation-point-filter.c and adding support for enums in
>> > gegl-operation.c allows my OpenCL kernel to run (among other changes). I
>> > will rebase my repo on top of master once it's updated. The last issue
>> > I'm having is that I get no entry for gegl:video-degradation when I have
>> > instrumentation enabled (GEGL_DEBUG_TIME=1). I've been parsing that
>> > output to determine the speed of other OpenCL implementations. Any
>> > suggestions?
>> >
>> > Thanks,
>> > Nanley
>> >
>> > On Wed, Nov 19, 2014 at 2:26 PM, Nanley Chery <nanleychery gmail com>
>> > wrote:
>> >>
>> >> It seems like the code to initialize and run the opencl kernel was lost
>> >> in this commit:
>> >>
>> >> https://git.gnome.org/browse/gegl/commit/gegl?id=a206f032f77064cf9bff8590ac83ca5b086b53fd
>> >>
>> >> I'm not familiar enough with the codebase to understand the commit
>> >> message. Why was this functionality removed? Should I add the deleted
>> >> code to video-degradation's process function?
>> >>
>> >> Thanks,
>> >> Nanley
>> >>
>> >> On Wed, Nov 19, 2014 at 12:57 AM, Nanley Chery <nanleychery gmail com>
>> >> wrote:
>> >>>
>> >>> I noticed there was more to the brightness-contrast example. I made the
>> >>> adjustments concerning the kernel name and parameter values. The code
>> >>> compiles now. The current problem I'm experiencing is that the
>> >>> run-compositions.py test for video-degradation passes with an empty
>> >>> kernel. I'm not sure which code paths are executing to make this work.
>> >>> Any pointers? I'll grep the source tree in the meantime.
>> >>>
>> >>> Thanks,
>> >>> Nanley
>> >>>
>> >>> On Tue, Nov 18, 2014 at 8:22 PM, Nanley Chery <nanleychery gmail com>
>> >>> wrote:
>> >>>>
>> >>>> Wow. Thank you for the tip; CL_CHECK now gives me output.
>> >>>>
>> >>>> This is the error message:
>> >>>> (lt-gegl:10486): GEGL-video-degradation.c-WARNING **: Error in video-degradation.c:236 cl_process - invalid kernel
>> >>>>
>> >>>> I thought I had followed the kernel compilation process correctly.
>> >>>> Do you notice any mistake? I have pushed my latest change to the
>> >>>> branch.
>> >>>>
>> >>>> Nanley
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Nov 18, 2014 at 8:06 PM, Victor Oliveira
>> >>>> <victormatheus gmail com> wrote:
>> >>>>>
>> >>>>> Hi Nanley,
>> >>>>>
>> >>>>> I'd recommend you follow the operations/common/brightness-contrast.c
>> >>>>> file for a point-filter operation (i.e. a pixel-wise filter) instead
>> >>>>> of doing what you did.
>> >>>>>
>> >>>>> Notice that at operations/common/brightness-contrast.c#n153 there's a
>> >>>>> string, brightness_contrast_cl_source, defined in
>> >>>>> opencl/brightness-contrast.cl.h; these are auto-generated files from
>> >>>>> the kernels in the opencl folder.
>> >>>>>
>> >>>>> Let me know what happens from that.
>> >>>>>
>> >>>>> Victor
>> >>>>>
>> >>>>> On Tue, Nov 18, 2014 at 4:45 PM, Nanley Chery
>> >>>>> <nanleychery gmail com>
>> >>>>> wrote:
>> >>>>> > Hi Victor,
>> >>>>> >
>> >>>>> > Thank you very much for taking a look. I understand about the time.
>> >>>>> >
>> >>>>> > Here's the link to my bitbucket branch:
>> >>>>> > https://bitbucket.org/nanoman281/gegl-cse6230/branch/vid_upstrm
>> >>>>> >
>> >>>>> > The latest commit is what's causing the video-degradation.xml test
>> >>>>> > to fail (I'm testing with run-compositions.py).
>> >>>>> >
>> >>>>> > Nanley
>> >>>>> >
>> >>>>> > On Tue, Nov 18, 2014 at 5:11 PM, Victor Oliveira
>> >>>>> > <victormatheus gmail com>
>> >>>>> > wrote:
>> >>>>> >>
>> >>>>> >> Hi Nanley,
>> >>>>> >>
>> >>>>> >> Just to let you know, I'll need some time to answer that because
>> >>>>> >> I'll need to build GIMP on my new laptop.
>> >>>>> >>
>> >>>>> >> Can you share your code so I can take a look?
>> >>>>> >>
>> >>>>> >> Victor
>> >>>>> >>
>> >>>>> >> On Tue, Nov 18, 2014 at 12:49 PM, Nanley Chery
>> >>>>> >> <nanleychery gmail com>
>> >>>>> >> wrote:
>> >>>>> >> > Hi Victor,
>> >>>>> >> >
>> >>>>> >> > I'm a student working on OpenCL porting for my High Performance
>> >>>>> >> > Computing class. I'm trying to implement an OpenCL port of the
>> >>>>> >> > newly committed video-degradation operation. Would you be willing
>> >>>>> >> > to provide guidance on the following roadblock?
>> >>>>> >> >
>> >>>>> >> > The issue I'm finding is that creating a cl_process method and
>> >>>>> >> > setting the following variables in gegl_op_class_init is not
>> >>>>> >> > enough to get the cl_process method called:
>> >>>>> >> >
>> >>>>> >> > operation_class->opencl_support = TRUE;
>> >>>>> >> > point_filter_class->cl_process = cl_process;
>> >>>>> >> >
>> >>>>> >> > If I manually call the cl_process function in the process method
>> >>>>> >> > (like in edge-laplace.c), the program terminates in the
>> >>>>> >> > gegl_cl_set_kernel_args method without an error from CL_CHECK.
>> >>>>> >> >
>> >>>>> >> > Is there something I'm missing? I apologize for mailing you
>> >>>>> >> > directly instead of writing to the mailing list. I'm a little
>> >>>>> >> > pressed for time, so I opted for this option.
>> >>>>> >> >
>> >>>>> >> > Regards,
>> >>>>> >> > Nanley
>> >>>>> >
>> >>>>> >
>> >>>>
>> >>>>
>> >>>
>> >>
>> >
>
>
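Victor's quoted note above says the opencl/*.cl.h headers (e.g. opencl/brightness-contrast.cl.h, which provides brightness_contrast_cl_source) are auto-generated from the kernels in the opencl folder. The generator itself isn't shown in this thread, so the following is only a hypothetical C sketch of the core step such a conversion needs: turning kernel source text into a C string literal, one quoted line per source line, with backslashes and double quotes escaped. The function name and output layout are assumptions, not GEGL's actual generator.

```c
#include <stddef.h>
#include <string.h>

/* Append s to out (capacity cap) at offset *n, keeping out NUL-terminated.
 * Returns 0 if it would not fit. */
static int append(char *out, size_t cap, size_t *n, const char *s)
{
    size_t len = strlen(s);
    if (*n + len + 1 > cap)          /* leave room for the NUL */
        return 0;
    memcpy(out + *n, s, len);
    *n += len;
    out[*n] = '\0';
    return 1;
}

/* Convert OpenCL kernel source into a C string literal. Each source line
 * becomes one quoted line ending in \n; backslashes and double quotes are
 * escaped. Returns the length written, or 0 if cap is too small. */
size_t cl_source_to_c_literal(const char *src, char *out, size_t cap)
{
    size_t n = 0;
    char one[2] = {0, 0};

    if (cap == 0)
        return 0;
    out[0] = '\0';
    if (!append(out, cap, &n, "\""))
        return 0;
    for (const char *p = src; *p != '\0'; p++) {
        const char *piece;
        switch (*p) {
        case '\\': piece = "\\\\"; break;
        case '"':  piece = "\\\""; break;
        case '\n': piece = "\\n\"\n\""; break; /* close line, open next */
        default:   one[0] = *p; piece = one; break;
        }
        if (!append(out, cap, &n, piece))
            return 0;
    }
    if (!append(out, cap, &n, "\""))
        return 0;
    return n;
}
```

A caller would then write the result after a declaration such as static const char* my_kernel_cl_source = and close it with a semicolon; my_kernel_cl_source is a placeholder name. A source ending in a newline produces a trailing empty "" literal, which C string concatenation ignores.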



