darktable 3.4/3.5: OpenCL slow on Windows 10

Hi,

I have a new PC with a Ryzen 7 processor, an Nvidia 1660 Super and 16 GB of RAM, connected to a Full HD screen. The problem is that darktable 3.4 is terribly slow when OpenCL is activated. If I use denoise (profiled), it takes about 3 seconds until I see the preview.
If I switch off OpenCL, darktable is much faster: the preview takes about 1 second.
I also run Linux on this machine, and there darktable is much faster with OpenCL.
On Windows, I have tried both the newest gaming driver and the creative driver from Nvidia.
I also ran darktable-cltest.exe, and OpenCL seems to be working; the performance graph in Task Manager confirms that too.
What is going on here? Is this a darktable problem or a driver issue?

Thanks in advance

Anna

Hello Anna.
I’m totally new here, but I also use darktable on Windows 10. I switched to version 3.5 the day before yesterday.
You can find it here: current win10 build incl. cr3 support.
The necessary information is there as well. For me it runs much more smoothly than version 3.4.1. You can try it out if you feel like it.

Thanks. Just tried it. Still very slow, maybe even slower than 3.4.

Too bad. Odd how different the results are.

Well, I just compiled it myself, optimized for my system, and it’s still terribly slow.

Not sure if running this will help, but the page also provides some reference values:

https://math.dartmouth.edu/~sarunas/darktable_bench.html

You probably know this, but you could start darktable from the command line with the options -d opencl -d perf to analyse whether something is wrong. What you describe could be failing OpenCL kernels falling back to CPU code.

So this is an extract from the output:

204,328510 [pixelpipe_process] [full] using device 0
204,329197 [dev_pixelpipe] took 0,001 secs (0,000 CPU) initing base buffer [full]
204,332114 [dev_pixelpipe] took 0,003 secs (0,016 CPU) processed `raw black/white point' on GPU, blended on GPU [full]
204,347704 [dev_pixelpipe] took 0,016 secs (0,016 CPU) processed `white balance' on GPU, blended on GPU [full]
204,363219 [dev_pixelpipe] took 0,015 secs (0,016 CPU) processed `highlight reconstruction' on GPU, blended on GPU [full]
204,382632 [dev_pixelpipe] took 0,019 secs (0,000 CPU) processed `demosaic' on GPU, blended on GPU [full]
206,278653 [dev_pixelpipe] took 1,896 secs (0,000 CPU) processed `denoise (profiled)' on GPU, blended on GPU [full]
206,289845 [dev_pixelpipe] took 0,011 secs (0,000 CPU) processed `graduated density' on GPU, blended on GPU [full]
206,306563 [dev_pixelpipe] took 0,017 secs (0,000 CPU) processed `base curve' on GPU, blended on GPU [full]
206,321956 [dev_pixelpipe] took 0,015 secs (0,000 CPU) processed `input color profile' on GPU, blended on GPU [full]
206,336799 [dev_pixelpipe] took 0,015 secs (0,000 CPU) processed `color balance' on GPU, blended on GPU [full]
206,604426 [dev_pixelpipe] took 0,268 secs (0,000 CPU) processed `contrast equalizer' on GPU, blended on GPU [full]
206,642725 [dev_pixelpipe] took 0,038 secs (0,000 CPU) processed `local contrast' on GPU, blended on GPU [full]
206,653690 [dev_pixelpipe] took 0,011 secs (0,000 CPU) processed `output color profile' on GPU, blended on GPU [full]
206,661007 [dev_pixelpipe] took 0,007 secs (0,000 CPU) processed `display encoding' on CPU, blended on CPU [full]
206,661049 [opencl_profiling] profiling device 0 ('NVIDIA GeForce GTX 1660 SUPER'):
206,661053 [opencl_profiling] spent  0,0010 seconds in [Write Image (from host to device)]
206,661055 [opencl_profiling] spent  0,0001 seconds in rawprepare_1f
206,661058 [opencl_profiling] spent  0,0001 seconds in whitebalance_1f
206,661060 [opencl_profiling] spent  0,0001 seconds in highlights_1f_clip
206,661062 [opencl_profiling] spent  0,0001 seconds in border_interpolate
206,661064 [opencl_profiling] spent  0,0002 seconds in ppg_demosaic_green
206,661066 [opencl_profiling] spent  0,0003 seconds in ppg_demosaic_redblue
206,661068 [opencl_profiling] spent  0,0003 seconds in denoiseprofile_precondition_v2
206,661070 [opencl_profiling] spent  0,0001 seconds in denoiseprofile_init
206,661072 [opencl_profiling] spent  0,0303 seconds in denoiseprofile_dist
206,661074 [opencl_profiling] spent  0,0152 seconds in denoiseprofile_horiz
206,661076 [opencl_profiling] spent  0,0614 seconds in denoiseprofile_vert
206,661078 [opencl_profiling] spent  0,0491 seconds in denoiseprofile_accu
206,661080 [opencl_profiling] spent  0,0003 seconds in denoiseprofile_finish_v2
206,661082 [opencl_profiling] spent  0,0056 seconds in [Read Image (from device to host)]
206,661085 [opencl_profiling] spent  0,0003 seconds in graduatedndp
206,661087 [opencl_profiling] spent  0,0003 seconds in basecurve_lut
206,661089 [opencl_profiling] spent  0,0003 seconds in colorin_unbound
206,661093 [opencl_profiling] spent  0,0003 seconds in colorbalance_cdl
206,661094 [opencl_profiling] spent  0,0007 seconds in [Copy Image (on device)]
206,661097 [opencl_profiling] spent  0,0119 seconds in eaw_decompose
206,661098 [opencl_profiling] spent  0,0031 seconds in eaw_synthesize
206,661100 [opencl_profiling] spent  0,0003 seconds in pad_input
206,661102 [opencl_profiling] spent  0,0039 seconds in gauss_reduce
206,661104 [opencl_profiling] spent  0,0020 seconds in process_curve
206,661106 [opencl_profiling] spent  0,0034 seconds in laplacian_assemble
206,661108 [opencl_profiling] spent  0,0003 seconds in write_back
206,661110 [opencl_profiling] spent  0,0003 seconds in blendop_mask_Lab
206,661112 [opencl_profiling] spent  0,0004 seconds in blendop_Lab
206,661115 [opencl_profiling] spent  0,0006 seconds in colorout
206,661118 [opencl_profiling] spent  0,1922 seconds totally in command queue (with 0 events missing)
206,661694 [dev_process_image] pixel pipeline processing took 2,333 secs (0,047 CPU)
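To read an extract like the one above, it can help to rank the modules by wall time and compare the pipeline total with the GPU time reported for the OpenCL command queue. The following is a minimal Python sketch, assuming the log has been saved as plain text; the regular expressions are written against the line format shown in this thread and may need adjusting for other darktable versions. Decimal commas (as produced by a German locale) are accepted as well as decimal points.

```python
import re

# Matches lines like:
# 204,347704 [dev_pixelpipe] took 0,016 secs (0,016 CPU) processed `white balance' on GPU, blended on GPU [full]
MODULE_RE = re.compile(
    r"\[dev_pixelpipe\] took ([\d.,]+) secs \([\d.,]+ CPU\) processed "
    r"[`'](.+?)' on (GPU|CPU)"
)
QUEUE_RE = re.compile(r"spent\s+([\d.,]+) seconds totally in command queue")
TOTAL_RE = re.compile(r"pixel pipeline processing took ([\d.,]+) secs")

# A few lines copied from the extract in this thread.
SAMPLE_LOG = """\
204,363219 [dev_pixelpipe] took 0,015 secs (0,016 CPU) processed `highlight reconstruction' on GPU, blended on GPU [full]
206,278653 [dev_pixelpipe] took 1,896 secs (0,000 CPU) processed `denoise (profiled)' on GPU, blended on GPU [full]
206,604426 [dev_pixelpipe] took 0,268 secs (0,000 CPU) processed `contrast equalizer' on GPU, blended on GPU [full]
206,661007 [dev_pixelpipe] took 0,007 secs (0,000 CPU) processed `display encoding' on CPU, blended on CPU [full]
206,661118 [opencl_profiling] spent  0,1922 seconds totally in command queue (with 0 events missing)
206,661694 [dev_process_image] pixel pipeline processing took 2,333 secs (0,047 CPU)
"""


def to_float(text):
    """Parse a number that may use a decimal comma, e.g. '1,896'."""
    return float(text.replace(",", "."))


def summarize(log_text):
    """Return (modules sorted by time, command-queue seconds, pipeline seconds)."""
    modules = []
    queue = total = None
    for line in log_text.splitlines():
        if m := MODULE_RE.search(line):
            # (module name, wall time in seconds, device it ran on)
            modules.append((m.group(2), to_float(m.group(1)), m.group(3)))
        elif m := QUEUE_RE.search(line):
            queue = to_float(m.group(1))
        elif m := TOTAL_RE.search(line):
            total = to_float(m.group(1))
    modules.sort(key=lambda item: item[1], reverse=True)
    return modules, queue, total


if __name__ == "__main__":
    modules, queue, total = summarize(SAMPLE_LOG)
    for name, secs, device in modules[:3]:
        print(f"{name:<28} {secs:6.3f} s on {device}")
    print(f"pipeline total {total:.3f} s, GPU command queue {queue:.4f} s")
```

Applied to the full extract, denoise (profiled) accounts for 1.896 s of the 2.333 s pipeline, while the profiler reports only about 0.19 s of kernel time in the command queue for the whole run (the denoiseprofile_* kernels add up to roughly 0.16 s). That gap suggests most of the wall time is spent somewhere other than kernel execution, for example waiting on the driver, though the log alone does not say where.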

With such a PC, it seems quite strange that darktable is slow.
Did you check that you have the latest Nvidia drivers? To be sure, it’s best to check on the Nvidia website.

And could you post your settings from the cpu/gpu/memory tab of darktable’s main preferences (a screenshot is the best way)?

I re-installed the Nvidia-driver like 4 times!

How large are your images? What denoising parameters do you use?

For example, the RAF from https://discuss.pixls.us/t/morning-foggy-view-from-hill/25262:

45.630649 [dev_pixelpipe] took 0.042 secs (0.035 CPU) processed `denoise (profiled)' on GPU, blended on GPU [full]

This is with the default (non-local means).
With the "wavelets: chroma only" preset:

178.159880 [dev_pixelpipe] took 0.017 secs (0.010 CPU) processed `denoise (profiled)' on GPU, blended on GPU [full]

With the CR2 from https://discuss.pixls.us/t/sharpening-and-denoising-with-dt-and-others/25210:

335.834535 [dev_pixelpipe] took 0.057 secs (0.043 CPU) processed `denoise (profiled)' on GPU, blended on GPU [full]

and

363.669637 [dev_pixelpipe] took 0.023 secs (0.173 CPU) processed `denoise (profiled)' on GPU, blended on GPU [full]

My card is an NVidia 1060/6GB, so it’s older and most probably slower than yours. I’m on darktable master.

Well, I think the size of the photo does not really matter; it’s the resolution of the screen that matters, at least for the preview.

Noise reduction settings: it’s the expensive non-local means auto, central pixel weight 5, autoset parameters 10, strength 0.5.

The raws are from an Olympus E-M5 Mark 3, 20 megapixels.
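To put the point about screen resolution in numbers: if the full pipe only has to produce roughly screen-sized output when the image is fitted to the window (the assumption made in this post; in practice the center view is smaller than the whole screen because of the side panels), a Full HD display needs about a tenth of the pixels of a 20-megapixel raw. A quick Python check of that arithmetic, using the rounded 20 MP figure given above:

```python
def megapixels(width, height):
    """Pixel count in megapixels."""
    return width * height / 1_000_000


full_hd = megapixels(1920, 1080)  # 2.0736 MP
raw_mp = 20.0                     # E-M5 Mark III, rounded as stated in the thread
ratio = raw_mp / full_hd

print(f"Full HD: {full_hd:.2f} MP, raw: {raw_mp:.0f} MP, ratio about {ratio:.1f}")
```

This only explains why the preview is cheaper than an export of the full raw; it does not explain the difference between Windows and Linux on the same hardware.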

Btw, I also tested this on my Intel i7 laptop, and that one is faster than my brand new desktop.
So it looks like there is nothing directly wrong with darktable for Windows.
However, on Linux on my laptop, I am not sure OpenCL brings much benefit; there does not seem to be much difference with and without it.
I think I am a bit confused.

Maybe the preview pipe and the full pixelpipe are both processed on the same device.
Try darktable -d perf -d opencl | grep -e'dev_process_' -e'using device' to get an idea of which pipe is processed on which device.
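grep is not available on a stock Windows installation. Here is a rough Python equivalent of the filter above, as a sketch that reads a saved log file; the file name darktable-log.txt is only a placeholder for wherever the output was saved, and the script name in the usage line is hypothetical.

```python
import sys

# Same two patterns as the grep command: the per-pipe device assignment
# ("using device") and the total processing time of each pipe ("dev_process_").
PATTERNS = ("dev_process_", "using device")


def filter_lines(lines):
    """Yield lines containing any of the patterns, like grep -e ... -e ..."""
    for line in lines:
        if any(p in line for p in PATTERNS):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Placeholder default name; pass the real log path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "darktable-log.txt"
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in filter_lines(log):
            print(line)
```

Usage would be something like python filter_log.py path\to\log.txt.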

To speed things up, it might help to manually set the device priority in darktablerc.

On my system, prioritizing the more powerful device for the full pixelpipe and explicitly excluding it from the preview pipe (opencl_device_priority=1,0,*/!1,*/1,0,*/*/*) increased overall performance. You need to play around with these settings…
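To see what a string like 1,0,*/!1,*/1,0,*/*/* asks for, here is a small Python sketch that expands it into an ordered device list per pipe. It follows a reading of the darktable manual: slash-separated fields for the full, preview, export and thumbnail pipes; comma-separated device numbers in priority order; !n excludes device n; * stands for every remaining device. It is an illustration, not darktable's own code, and skipping device numbers that are not present is an assumption.

```python
# Field order assumed from the darktable manual.
PIPES = ("full", "preview", "export", "thumbnail")


def expand_priority(setting, devices):
    """Expand an opencl_device_priority-style string into per-pipe device lists.

    devices: the OpenCL device numbers present on the system, e.g. [0].
    An empty list means no GPU is allowed for that pipe.
    """
    result = {}
    for pipe, field in zip(PIPES, setting.split("/")):
        tokens = field.split(",")
        excluded = {int(tok[1:]) for tok in tokens if tok.startswith("!")}
        order = []
        for tok in tokens:
            if tok == "*":
                # every present device not yet listed and not excluded
                order += [d for d in devices if d not in order and d not in excluded]
            elif not tok.startswith("!"):
                d = int(tok)
                # assumption: numbers of devices that are not present are ignored
                if d in devices and d not in excluded and d not in order:
                    order.append(d)
        result[pipe] = order
    return result


if __name__ == "__main__":
    for devs in ([0, 1], [0]):
        print(devs, expand_priority("1,0,*/!1,*/1,0,*/*/*", devs))
    print(expand_priority("*/!0,*/*/*", [0]))
```

On a machine with a single GPU (device 0), the string above gives device 0 to every pipe, so the priority order itself changes nothing there; only exclusions matter. A string such as */!0,*/*/* leaves the preview pipe with an empty list, which, if the manual is read correctly, means that pipe is processed on the CPU.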

If that doesn’t help, you can also reduce the preview image to 1/2 or even 1/4 size in the preferences to speed this up.

I don’t quite understand. My desktop PC only has one OpenCL device, so there is no point in setting the device priority order, is there?
Nevertheless, I will try the command you suggested.
However, if I exclude the device from processing the preview, does that mean that the CPU will be used instead?
I am more and more confused. Why is there a problem on Windows while on Linux everything is fine? The settings are the same, aren’t they? The software version is the same.

You can check the OpenCL-capable devices with darktable-cltest. This also gives you the numbering of your GPU devices.
Usually the CPU’s integrated GPU can also support OpenCL, but you may need to disable blacklisting in darktablerc first; this depends on your CPU.

But not the drivers 😉

According to darktable-cltest, there is only one device, 0, which is the Nvidia. What a surprise! My system does not have an iGPU.

I think I will create a second user account and test again. Something is messed up.

I just upgraded the Nvidia driver (Windows) on my laptop. I think the driver is broken: it’s slower, as slow as without OpenCL.