I’ve recently become plagued with frustratingly slow performance when I load images into darktable.
When I open a batch of photos, whether it’s just a few or over 1000, the thumbnails take forever to load in lighttable, and when I try to open a group in culling mode and zoom to 100%, the system slows to a crawl.
While this is going on, I’m monitoring CPU and GPU usage, and the CPU is pegged at 100%, while the GPU is sitting unused at 0%.
Is this normal? Should I expect the GPU to be used for these tasks? Is there something wrong in my darktable settings?
My system:
• Fedora 39 Linux
• darktable 4.6.0
• CPU: AMD Ryzen 7 3700X 8-Core
• GPU: nVidia GeForce GTX1660 Super
• 32GB RAM
darktable-cltest shows the GPU is recognized and enabled:
[opencl_init] OpenCL successfully initialized. internal numbers and names of available devices:
[opencl_init] 0 'NVIDIA CUDA NVIDIA GeForce GTX 1660 SUPER'
1.0135 [opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_device_priority: '+0/+0/+0/+0/+0'
[opencl_init] opencl_mandatory_timeout: 20000
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 1 1 1 1 1
[opencl_synchronization_timeout] synchronization timeout set to 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 1 1 1 1 1
And yet, I don’t see anything hit the GPU at more than a brief bump of a few percent. I would expect it to hit 100% GPU usage. Or am I understanding this wrong?
A lot of those settings in darktablerc don’t look correct. Make a backup of that file, then rename the original (e.g. to darktablerc.old) so darktable regenerates it and you can start over with default OpenCL settings.
Post the complete output of darktable-cltest and the output of a run with -d common (both as txt files).
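Something like this should capture both, assuming the default config location under ~/.config/darktable (the file names are just suggestions):

mv ~/.config/darktable/darktablerc ~/.config/darktable/darktablerc.old
darktable-cltest > darktable-cltest.txt 2>&1
darktable -d common > darktable-d-common.txt 2>&1

The second darktable run opens the normal GUI; load some thumbnails and zoom around as you usually do, then quit so the log actually captures the slow path.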
Thanks. I’ve moved my darktablerc file aside and allowed darktable to create a new default file. Here are the txt files with the output from darktable-cltest and darktable -d common.
The GPU looks good. Since it shows that you have four OpenCL platforms, keep only the Nvidia one enabled (checkbox on).
With your fast card, I would change the OpenCL scheduling profile from ‘default’ to ‘very fast GPU’. This will push all the processing onto the GPU instead of the CPU. That’s how I use it on my system.
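If you’d rather set it directly in darktablerc instead of through the preferences dialog, the line on my system looks like this (double-check the exact spelling of the value against what the preferences dialog writes, as it can differ between versions):

opencl_scheduling_profile=very fast GPU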
The filmicrgb module is taking quite a bit of execution time. Do you have a lot of iterations set in it? The GPU path should help with this compared to the CPU.
Thanks! I’ve switched it back to “very fast gpu”. I actually don’t use filmic rgb any more. I recently switched to using sigmoid, but that setting was presumably lost when I reset to a default darktablerc file. I’ve got the old one, so I can get that back.
This may be as good as I can get it, but darktable is still annoyingly slow when I zoom to 100% in culling mode. I use this workflow a lot in wildlife photography, where I may take 10-50 photos in a burst and use culling mode in lighttable to select the best one to process. When I zoom to 100%, all the images show “working…”, and with each group of 4 (the maximum number that still lets me zoom), I need to wait 30 seconds or more before I can see the zoomed image. When going through hundreds of files, that becomes a real problem. When I monitor the system with nvtop, this process still seems to be using the CPU instead of the GPU. See the attached image: CPU is at 584% (possible because I have 8 cores / 16 threads), while the GPU is still at 0%.
Is that just the way darktable is, i.e. does zooming in lighttable always use the CPU regardless?
Here’s the output from -d perf. It looks like a lot of thumbnail processing is happening on the CPU. I never realized that so much processing happened on thumbnails. Is there a way to turn that off? Or at least tell it to use the GPU? darktable.dperf.txt (8.5 KB)
We have a similar system (Fedora 39 KDE with Nvidia card). My thumbnail processing is all in GPU and it takes 0.2s total.
There is something going on in your system. It started processing on the GPU and then switched to the CPU. It stayed on the CPU, and some of the modules are taking a long time. The last module to use the GPU was:
28.2809 [dev_pixelpipe] took 2.768 secs (7.482 CPU) [thumbnail] processed 'filmicrgb' on GPU with tiling, blended on CPU
With 6 GB of GPU memory, I’m not sure why it is even tiling.
Sorry to keep asking for more files; let’s do a -d opencl run.
Thanks for your help with this! I’ve run it with -d opencl and attached the output. Something puzzles me about it: it shows “reached opencl_mandatory_timeout trying to lock mandatory device”. I had earlier tried forcing the GPU by setting opencl_device_priority=+0/+0/+0/+0/+0. That didn’t do anything, so I set it back to the default of opencl_device_priority=*/!0,*/*/*/!0,*. But it still seems to treat the GPU as mandatory and wait for the timeout before falling back to the CPU.
Also, where I’m seeing the greatest delay is in culling mode when zooming four images to 100%, which I guess are actually previews, not thumbnails. But regardless, thumbnails are still showing CPU use. darktable.dopencl2.txt (6.1 KB)
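After re-reading the opencl_device_priority description in the manual, my understanding of the two strings is as follows (please correct me if I have this wrong). The five slash-separated fields map to the image, preview, export, thumbs and preview2 pipelines.

The default, where !0 excludes device 0 (the GPU) from the preview and preview2 pipes but leaves it optional everywhere else:

opencl_device_priority=*/!0,*/*/*/!0,*

What I had tried, where +0 marks device 0 as mandatory for all five pipes, which is what triggers the opencl_mandatory_timeout wait:

opencl_device_priority=+0/+0/+0/+0/+0

So with the default string back in place I wouldn’t expect the mandatory-timeout messages at all, which is the part that puzzles me.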
I’ve bumped the mandatory timeout to 1000 and run with -d common. Nothing different that I can see but I haven’t looked at the output very closely yet (and don’t really know what to look for anyway).
The only change to the thumbnail settings I’ve made is enabling “generate thumbnails in background”.
OpenCL is installed from RPM packages, which are up to date, so I’m not sure how to regenerate the kernels other than by reinstalling the packages. darktable.dcommon2.txt (188.4 KB)
When dt starts, it creates OpenCL kernels for the processing modules. It builds them against the current drivers and stores them in ~/.cache/darktable. You can safely delete those folders, since it will check/regenerate them at startup. Every time Fedora updates the RPM Fusion driver package, new kernels will be generated.
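If you want to clear them by hand, something along these lines should do it. On my system the kernel folders have names containing “kernels”, but check with ls first, since the mipmap (thumbnail) cache also lives in that directory and you may not want to wipe that as well:

ls ~/.cache/darktable/
rm -rf ~/.cache/darktable/*kernels*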
It looks from the latest -d common output that most of the processing was on the GPU. Is it faster now?
The main thing I notice from your data at the moment is the lens module. There are multiple instances where the ROI input is smaller than the output:
21.6517 modify roi IN [thumbnail] lens ( 0/ 0) 1349x 900 scale=0.2467 --> ( 0/ 0) 1350x 900 scale=0.2467
What about bumping the number of event handles? It’s 128 by default, I think; try 1024. You could also try the async pipeline option, which I think you have disabled. Barring those two changes, I set my micro-nap to 0 and get away with that, and it did speed things up for me. Just trying to see what’s up, since you seem to shift off the GPU from what I can tell by scanning your various log files.
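For reference, these are the kinds of darktablerc entries I mean. Note these are the older global key names as I remember them; in recent versions (including 4.6, I believe) this per-device tuning has moved into a cldevice_* line for each device, so double-check what your darktablerc actually contains before editing:

opencl_number_event_handles=1024
opencl_async_pixelpipe=true
opencl_micro_nap=0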