AMD OpenCL problems in darktable's surface blur module

Dear all,

I know OpenCL on AMD cards is not great, but it mostly works. What consistently doesn’t work is using the surface blur module with it: darktable usually crashes instantly when I activate the module, change its values, or open images that already have it enabled from old edits. As my main work is portraits, surface blur really comes in handy for evening out skin tones, following Aurélien’s nice tutorial. So not having access to either OpenCL or the module is very annoying. I can’t really figure out what the problem is. Bug report and further thoughts are here:

I appreciate any input, as this is driving me crazy. It was working better earlier, so this could well be a driver issue after all.

-d opencl driver info:

 0.291417 [opencl_init] device 1 `Ellesmere' supports image sizes of 16384 x 16384
 0.291420 [opencl_init] device 1 `Ellesmere' allows GPU memory allocations of up to 6745MB

 [opencl_init] device 1: Ellesmere 
 CANONICAL_NAME:           ellesme
 GLOBAL_MEM_SIZE:          8192MB
 MAX_WORK_GROUP_SIZE:      256
 MAX_WORK_ITEM_DIMENSIONS: 3
 MAX_WORK_ITEM_SIZES:      [ 1024 1024 1024 ]
 DRIVER_VERSION:           3354.7 (PAL,HSAIL)
 DEVICE_VERSION:           OpenCL 2.0 AMD-APP (3354.7)

So surface blur with opencl is working for all other AMD users?

Just gave it a quick try (never used this module before) and it seems to work for me, but I’m on Windows.

Thank you! Try moving the picture, zooming in and out. Sometimes it works for a few steps for me as well before it crashes.

Did everything on some pictures. No problems here.

Thanks, so it could still be a Linux problem or an Arch problem :confused:

I can’t reproduce on some images with the same modules and masks copied. Yet for some pictures it’s an almost instant crash…

Also, with slightly different settings there is no crash at all. Might the crop size be what leads to this? Maybe a more capable person than me can spot the differences in the XMPs causing this:
This one doesn’t crash dt:

While this one does after a few zooming and panning events:

Just swapping the history between the two does not stop the crash. I don’t know if I’m following the right path. Fishing in the dark :confused:

If someone wants to have a go, this image also crashes after zooming and panning for at most 10 s. XMP file included.

I played around with this one, two times (panning, zooming, changing some values). No problems here. Not tested the other ones since they came without raw files.

Just out of curiosity… what happens if you add this line to src/iop/bilateral.cc at line 212, and then cause the crash to occur?

    fprintf(stderr, "threads=%d procs=%d\n", omp_get_max_threads(), omp_get_num_procs());
    // splat into the lattice

Does it print values for threads and procs that do not match?

https://www.toptal.com/developers/hastebin/elagijalow.yaml

It always seems to be 4 and 4.

Ok I think I see the problem… your backtrace indicates that in the input image from the previous module in the pipeline, there was a pixel whose value was NaN. That caused the crash in surface blur. So probably surface blur should be fixed so that it can handle NaN values without crashing, and also whatever created that NaN value earlier in the pipeline should be investigated.
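To make the NaN point concrete, here is a minimal sketch in C of how one might scan a float pixel buffer for non-finite values and neutralise them before a filter sees them. `sanitize_nonfinite` is a hypothetical helper, not a darktable function, and replacing bad values with 0 is only one possible policy (an assumption for illustration). It is not the actual fix for surface blur.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not part of darktable): replace every non-finite
 * value (NaN, +Inf, -Inf) in a float buffer with 0.0f and return how
 * many values were replaced. A non-zero return means an earlier step
 * produced invalid pixels. */
static size_t sanitize_nonfinite(float *buf, size_t count)
{
  size_t replaced = 0;
  for(size_t i = 0; i < count; i++)
  {
    if(!isfinite(buf[i]))
    {
      buf[i] = 0.0f; /* assumed policy: clamp bad pixels to black */
      replaced++;
    }
  }
  return replaced;
}
```

A check like this could run on the module's input (for example `width * height * 4` floats for an RGBA buffer, assuming that layout). Logging the returned count would also help narrow down which earlier module produces the NaN.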

I often see green blocks at the border (of the current zoom level). I think it happens when using RCD. I’ll try to see if Amaze doesn’t exhibit that.

Indeed with Amaze I don’t see green blocks and I can’t force the crash. Thanks for pointing me in the right direction!

Haha, it left scars in my brain: I keep waiting for darktable to crash whenever some rendering takes a little longer :smiley: