darktable uses CPU instead of GPU

I have two GPUs on my workstation, and have configured darktable to use them, but when editing photos my CPU gets pegged at 100%, slowing everything down, while the GPUs are hardly touched. Here’s my environment:

  • Fedora 37 Linux
  • darktable 4.4.1
  • CPU: AMD Ryzen 7 3700X 8-Core
  • GPU0: nVidia GeForce GTX1660 Super
  • GPU1: nVidia GeForce GTX 660
  • 32GB RAM

My darktable CPU/GPU preferences:

  • darktable resources: large
  • activate OpenCL support: on
  • OpenCL scheduling profile: multiple GPUs (similar results with very fast GPU)
  • tune OpenCL performance: memory size

When I do something intensive like loading four images into culling view at 100% zoom and paging through a whole collection, I can watch the CPU usage climb to 100%, while GPU1 (the slower one) briefly hits around 30% and GPU0 (the faster one) isn’t touched at all. And it takes several seconds for each page to load.

Is this expected behavior? I confess I don’t know much about how darktable manages the whole GPU/OpenCL business, but I would expect the GPUs to be called first.

darktable-cltest shows all three devices:

$ darktable-cltest | grep DEVICE
DEVICE: 0: 'pthread-AMD Ryzen 7 3700X 8-Core Processor'
DEVICE VERSION: OpenCL 3.0 PoCL HSTR: pthread-x86_64-redhat-linux-gnu-znver2
DEVICE_TYPE: CPU
DEFAULT DEVICE: NO
DEVICE: 1: 'NVIDIA GeForce GTX 1660 SUPER'
DEVICE VERSION: OpenCL 3.0 CUDA, SM_20 SUPPORT
DEVICE_TYPE: GPU
DEFAULT DEVICE: NO
DEVICE: 2: 'NVIDIA GeForce GTX 660'
DEVICE VERSION: OpenCL 3.0 CUDA, SM_20 SUPPORT
DEVICE_TYPE: GPU
DEFAULT DEVICE: NO

None of these are identified as a default device. Should one of them be the default?

Post the entire darktable-cltest output via pastebin or similar. RPM Fusion for the drivers?
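To capture everything in one go, plain shell redirection is enough, nothing darktable-specific:

$ darktable-cltest > cltest.txt 2>&1

then upload cltest.txt.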

Here’s the entire darktable-cltest output:
https://pastebin.com/W7Euf9aY

The driver is the nVidia proprietary driver (470.199.02).

I would disable the second GPU (at least to test) via darktablerc. Your 1660 has 6 GB and that should be plenty. Your NVIDIA drivers are not recent; current ones are 535 or higher. I think you should also turn off the memory tuning.
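A hedged sketch of what that could look like in darktablerc, using the documented opencl_device_priority syntax (the ! prefix excludes a device, and the parameter is only honoured with the default scheduling profile). I’m assuming the 660 is device 2 as in your darktable-cltest output; double-check the numbering on your system, and note that the number of fields varies between darktable versions:

opencl_scheduling_profile=default
opencl_device_priority=!2,*/!2,*/!2,*/!2,*/!2,*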

Thanks for your help. I think I’ve got things pretty well dialed in now. The main things that got me over the hump were a) discovering that OpenCL wasn’t using my CPU after all, which became clear when I ran darktable -d opencl; and b) finding the section in the manual that explains the opencl_device_priority setting. With those figured out, darktable now uses my GPUs as well as can be hoped. It still pegs the CPU for some tasks, evidently those that can’t make use of the GPU.
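For anyone finding this later, that debug run is simply:

$ darktable -d opencl

which logs OpenCL activity to the terminal while you edit, including which device the work ends up on.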

The other thing that confused me for a bit was misreading which GPU was actually being used. darktable-cltest showed device 0 as the 1660 and device 1 as the older 660, but nvidia-smi, which I was using to monitor performance, had those IDs switched. Because nvidia-smi truncates the device names, I couldn’t tell which was which, and it wasn’t until I looked at the memory stats that I realized what was going on.
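In case it helps others, nvidia-smi can print the full, untruncated names alongside the memory sizes, which makes the ID mismatch obvious:

$ nvidia-smi --query-gpu=index,name,memory.total --format=csv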

The latest nVidia drivers don’t support the older 660 GPU, so I’m on 470.199.02, the latest driver that supports both cards.

Personally I would just use the fast one, or at least benchmark with and without it. That way you can also use an up-to-date driver. You could wipe out all your OpenCL settings, then run darktable and let it start fresh… You could also delete the cached kernels and let them rebuild… just a thought.

Kernel files are in these sorts of folders…

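On Linux they typically live under ~/.cache/darktable/. The directory names vary between darktable versions, so the glob below is only a hedged example; list the cache first and check what’s actually there before deleting:

$ ls ~/.cache/darktable/
$ rm -r ~/.cache/darktable/cached_*kernels_for_*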

NVidia maintains several driver series in parallel. The 470 series is the one recommended for slightly older cards, I think…

Just thinking out loud, but why the 660 in a system with a 1660?


The 660 was my old GPU before I got the 1660. Rather than throwing it in the trash, I thought I might as well keep it installed for whatever extra resources it might provide. There’s space for it in the chassis, so why not?

Try setting your system resources to default and see if your GPU utilization goes up.

I had mine set to large and diffuse or sharpen would try and take too much GPU memory. Reducing my system resources to default fixed it.
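If you’d rather flip it in darktablerc than in the GUI, I believe (hedged, from memory) the corresponding key is:

resourcelevel=default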

I can’t give you a specific example, but if you have to use an old driver to support it, that might not be the most efficient setup. Also, having it in place means your system has to communicate with it, so when darktable moves data between computing targets it might turn out to be less efficient. It likely has slower VRAM, and so on. So it may be, and I stress the “may be”, that like many computer upgrades you shouldn’t keep it just because you have it: if it turns out to be a weak link, it could actually introduce a performance penalty. Without benchmarking you wouldn’t know.

tune OpenCL performance: memory size: I’m not sure if any tuning is recommended nowadays.

Also, I don’t know if you have discovered this, but opencl_device_priority is only used if opencl_scheduling_profile=default.

multiple GPUs
[…] Users of systems with a variety of GPUs will need better control on their relative priority. They would be better off selecting the “default” profile and fine-tuning their system with the “opencl_device_priority” configuration parameter (see multiple devices).
(darktable 4.6 user manual - scheduling profile)

darktable 4.6 user manual - multiple devices is, however, slightly outdated. It says:

There are four fields in the parameter string separated by a slash, each representing one type of pixelpipe. a,b,c... defines the devices that are allowed to process the center image (full) pixelpipe. Likewise devices k,l,m... can process the preview pixelpipe, devices o,p,q... the export pixelpipes and finally devices x,y,z... the thumbnail pixelpipes.

In reality, there are 5 such fields; the 5th one is the preview pixelpipe for the second darkroom screen.
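So, extending the manual’s notation with a fifth field (the shorthand is my own: full / preview / export / thumbnail / second-screen preview):

a,b,c.../k,l,m.../o,p,q.../x,y,z.../u,v,w...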

I simply set them as mandatory on my GPU:
opencl_device_priority=+0/+0/+0/+0/+0

One other item I found worth setting is:
opencl_mandatory_timeout=20000
When darktable wants to start a new operation, it checks whether a GPU is available. If the card is busy, darktable will not wait forever for it to become available, even if mandatory GPU processing is set via the priorities: it takes the value of opencl_mandatory_timeout, multiplies it by 5, and treats the result as the number of milliseconds to wait. If the card becomes available in time, the processing uses it; otherwise, it falls back to the CPU. I found that with computationally intensive modules (like diffuse or sharpen) the default timeout (400, i.e. 2 seconds) was too short, causing:

  • a wait for 2 seconds for the card to become available;
  • then processing on the CPU, rather slowly. During that time, the GPU finishes its previous task and sits around, unutilised.

For me, it is worth waiting much longer (up to 100 seconds – yes, that’s overkill) so the GPU can always finish processing, because it is simply so much faster than my CPU: a typical 5–10 second wait for the GPU, followed by, say, 10 seconds of GPU processing, is less than 20–60 seconds of processing on the CPU. But the best value for this setting really depends on your GPU-to-CPU speed-up.


Using such a small and slow device would certainly require a darktable setup that uses that GPU for previews only. Otherwise the device will slow down the system…
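In the five-field syntax discussed above, that split might look like this – a hedged sketch assuming the fast card is device 0 and the slow one is device 1 in darktable’s numbering (verify with darktable -d opencl):

opencl_scheduling_profile=default
opencl_device_priority=+0/+1/+0/+0/+1

i.e. the full, export and thumbnail pipes are pinned to the fast card, and both preview pipes to the slow one.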


Thanks for all the great responses. I’m learning a lot here. After doing some real-world editing today instead of my quick poorly-done benchmarks, I’ve confirmed that you all are right, and that keeping my old slower GPU in the mix was indeed slowing things down. So I’ve removed it, upgraded the driver, and with darktable resources set to default, and opencl_device_priority set to only use the GPU, darktable is fast again.


Because it costs power? :)
And because, as far as I know, it doesn’t work in such a way that programs can just use both, and you were stuck on an old driver to keep it working.

Maybe in darktable you could actually set it up to use one for the preview and the other for the full renders. But I’m not sure.