darktable - driver timeout + black jpeg with blurs module

I can submit this on github as a bug, but I just wanted to see if anyone here could tell me if I was missing something obvious first.
I have found that the blurs module is causing a driver timeout error when I try to export images to JPEG. The JPEG generated is completely black and darktable usually needs to be restarted to get full functionality back. Disabling the blurs module seems to allow the image to be exported fine. I have found this with two images so far.

I have never had any problems with exporting before, nor problems with openCL. This problem first occurred in the middle of exporting multiple images, and all of them exported fine until it got to the one with the Blurs module.

Anyone have an idea what’s going on? My feeling is that the module is doing something that the GPU can’t handle, although my 580 with 8GB VRAM should be able to handle most things that darktable throws at it.

Windows 10
GPU: Radeon RX 580
openCL enabled

File causing me trouble:
DSCF7412.RAF (33.7 MB)
DSCF7412.RAF.xmp (14.2 KB)

This file is licensed Creative Commons, By-Attribution, Share-Alike

How much memory headroom do you have for OpenCL? Can you start darktable with -d opencl from the CLI and post the output? Also share the OpenCL settings in preferences.

Your card seems beefy enough, so I’d guess either a buggy driver or you need to tweak your OpenCL settings in darktable.

What version of darktable are you using? The current master has a significant number of improvements with OpenCL.

Also, do you have the Microsoft OpenCL compatibility pack installed (it could be installed by default)? I had to uninstall mine.

I started darktable from the command line as you said, but I wasn’t sure where to look for the output. Am I right that it’s in darktablerc? If so, this is what I found:

opencl=TRUE
opencl_async_pixelpipe=false
opencl_avoid_atomics=false
opencl_checksum=3259126302
opencl_device_priority=/!0,///!0,*
opencl_disable_drivers_blacklist=false
opencl_library=
opencl_mandatory_timeout=200
opencl_memory_headroom=400
opencl_memory_requirement=768
opencl_micro_nap=1000
opencl_number_event_handles=25
opencl_scheduling_profile=default
opencl_size_roundup=16
opencl_synch_cache=active module
opencl_use_cpu_devices=false
opencl_use_pinned_memory=false

Looks like headroom=400 and requirement=768. Is that the problem?
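
For anyone who wants to pull just the OpenCL keys out of darktablerc instead of scanning the whole file, here is a minimal Python sketch. It assumes the file is plain key=value lines, as in the excerpt above. The function name read_opencl_settings and the non-OpenCL key in the sample are made up for illustration and are not part of darktable.

```python
def read_opencl_settings(text):
    """Return a dict of the opencl* keys found in darktablerc-style text.

    Assumes one key=value pair per line, as in the excerpt above.
    """
    settings = {}
    for line in text.splitlines():
        # Split on the first "=" only, so values may contain "=" themselves.
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith("opencl"):
            settings[key] = value.strip()
    return settings


# Sample text: the opencl lines are copied from the post above,
# the last line is a hypothetical non-OpenCL key that should be ignored.
sample = """opencl=TRUE
opencl_memory_headroom=400
opencl_memory_requirement=768
opencl_device_priority=/!0,///!0,*
some_other_key=1
"""

settings = read_opencl_settings(sample)
print(settings["opencl_memory_headroom"], settings["opencl_memory_requirement"])
```

This only reads the values; it says nothing about whether 400 and 768 are good choices for a given card.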

As for my openCL settings, they are just at the default I think:
activate OpenCL support = checked
OpenCL scheduling profile = default

I’m on darktable 3.8.1, although I also tried it on a 3.9 build that @priort made, and the same issue happened with default settings in preferences.

Doesn’t look like a challenging set of modules… is it only with OpenCL activated? Exports fine for me… I have a 3060Ti. You could try boosting the number of event handles (opencl_number_event_handles, currently 25)… your other settings don’t seem extreme in any way and should work, I would think.

If I uncheck “activate OpenCL support”, restart darktable, then try and export, the export just hangs.

Do you mind trying this one too? It might be more challenging in terms of processing power:

DSCF7564.RAF (35.8 MB)
DSCF7564.JPG.xmp (985 Bytes)

Correct xmp??? No real processing…

Threw on 3 x blur and 3x diffuse…no issues with that image…

EDIT Beautiful image by the way… Maybe just for giggles create a new temp folder to use as a config folder and run DT with --configdir pointing to that …should use fresh config files…see if that lets it work properly??
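
The fresh-config test suggested above could be sketched roughly like this in Python. The --configdir option is the one mentioned in the post; the function name, the folder prefix, and the assumption that the darktable executable is on the PATH are illustrative only.

```python
import tempfile
from pathlib import Path


def fresh_config_command(executable="darktable"):
    """Create an empty temp folder and build a command pointing darktable at it."""
    # mkdtemp creates a new, empty directory, so darktable should start
    # with default settings (assumption: it writes fresh config files there).
    config_dir = Path(tempfile.mkdtemp(prefix="dt-test-config-"))
    return [executable, "--configdir", str(config_dir)]


command = fresh_config_command()
print(command)
# To actually launch darktable with this command:
#   import subprocess; subprocess.run(command)
```

Because the folder is new each time, the normal config is left untouched and the temp folder can simply be deleted afterwards.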

Ah sorry, that was not the correct xmp, this one is.
DSCF7564.RAF.xmp (11.6 KB)

No problem observed.

Thanks Todd.
So the fact it is working for you would suggest it’s probably a problem with OpenCL or my graphics card. This may be a daft question, but where should I be looking to read the output of OpenCL info when I use diagnostic parameters to start darktable? Is it output to the darktable-log.txt file in C:\Users\XXX\AppData\Local\Microsoft\Windows\INetCache\darktable on Windows?
I have never done any OpenCL testing before. Thanks!

To be honest I always forget what goes where… I think you can do something like append > c:\opencltesting\log.txt to redirect it to where you want… perhaps there is a more elegant way??

I would use Bill’s compile from last week: darktable windows insider program 4/10

I installed Bill’s last package and I’m still getting the problem. The image loads fine in the darkroom. I then try and export it. It says it has exported successfully, but then I get the driver timeout error and the jpeg is black. After that, the image is also black in the darkroom and I have to restart darktable to see it again.

Did you use the “tune OpenCL performance” option in preferences at least once?

Also, create the log with -d opencl -d memory -d perf
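
As a rough sketch of how the debug flags and the output redirect suggested earlier in the thread fit together, the Python below builds the command and, if asked, writes stdout and stderr to a log file. The helper names are hypothetical, the executable is assumed to be on the PATH, and whether the Windows build sends its debug output to the console at all (rather than to its own log file) is not something this sketch can confirm.

```python
import subprocess


def debug_command(channels, executable="darktable"):
    """Build a darktable command with one -d flag per debug channel."""
    command = [executable]
    for channel in channels:
        command += ["-d", channel]
    return command


def run_and_log(command, log_path):
    """Run the command, writing both stdout and stderr to log_path."""
    with open(log_path, "w", encoding="utf-8") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


command = debug_command(["opencl", "memory", "perf"])
print(" ".join(command))
# Uncomment to launch darktable and capture the output
# (the folder must already exist):
# run_and_log(command, r"c:\opencltesting\log.txt")
```

Keeping the run step separate means the command can be checked first and then pasted into a terminal by hand if that is easier.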

I have set it to “memory size and transfer” for tuning OpenCL performance. Did some processing, closed darktable, relaunched it. Everything seems to work just fine… until I try to export that photo. When I turn off the Blurs modules (I have 2 instances), it exports fine. As soon as I enable just one of them and try to export, I get the driver timeout error.

As for this, interestingly I did manage to successfully export the image with just one Blurs module activated, but only once. Here are the logs I managed to get:

log1.txt (641.9 KB)
perf.txt (313.6 KB)

I think you should just run the tuning once and that’s it.

One of the log files has information, but the other seems to be missing the important parts. Ideally you should have a log file from a run where you do get the driver timeout. What driver is timing out on you?

I’m just getting a generic AMD popup:

[image: screenshot of the AMD popup]

I don’t really know how to use the OpenCL performance tuning yet on the 3.9 dev build (not sure if it’s in the dev manual yet). I tried setting the OpenCL scheduling to “very fast GPU” and it worked! But just once and that was at a 50% smaller file size. Subsequent attempts have failed again.

What happens if you turn off the AMD software?

Are you using the latest AMD drivers? 22.4.1?
https://www.amd.com/en/support/graphics/radeon-500-series/radeon-rx-500-series/radeon-rx-580

I’m running the latest stable drivers, which are 22.3.1. The latest are beta drivers, which might be fine, but I’m skeptical they would solve this issue.

When you say turn off the AMD software, there are about 7 AMD processes in Task Manager. Do you mean end all of those?