darktable 3.4/3.5 opencl slow on Windows 10

Just a guess: related to disabling crypto-mining in the NVIDIA drivers …

I don’t quite understand what crypto mining is.

I also have an Arch-based Linux on a pendrive. It has the 465 Nvidia driver, and performance seems to be OK. On the internal SSD I have Debian Bullseye, whose driver is even older, I think 460 or so.
But the first Nvidia driver that I installed on Windows in May was the 462 driver, and the performance issue was already there.

Your observations sound more as if the issue is related to the quality of the drivers (Windows vs. Linux) than to some restrictions recently introduced into the drivers.

I guess the graphics cards that I have are not really suited for crypto mining. The MX250 is ridiculous and the GTX 1660 Super is not that powerful either.
Something is wrong here… something is mysterious.
Someone should test whether OpenCL works with other apps on Windows.
I have googled a lot because of this problem and did not find anything helpful or even similar. Surely someone else has upgraded their driver recently. It can’t be that nobody has noticed that the Nvidia driver for Windows is broken, can it?

You could try Geekbench or something similar.
Geekbench 5 - Cross-Platform Benchmark: there’s some kind of free/trial version.

Or PerformanceTest (see their FAQ Index); they have a large database of test measurements, so you can compare yours with those of others.

I have a Win10 laptop and 3.4.1 with these specs:

i7-8565u, Nvidia Geforce MX250 2gb (latest driver), Intel UHD 620, 32gb ram, 512gb SSD, 1920x1200 monitor

I opened up a 20 MP Olympus PEN-F raw file and applied denoise (profiled) with your settings. Even with my low-power MX250, the screen update only takes 1-2 seconds.


Good luck. I hope you can find the solution to the problem you are having.

Should you be on the default OpenCL setting in DT, or would the faster-GPU setting work better? I still don’t think this is the problem: I have a crappy older NVIDIA card that ON1 Photo RAW uses through OpenGL, and it seems to work fine. Maybe try, as @kofa says, benchmarking it to see whether the numbers look okay for the hardware as it is configured: GpuTest - Cross-Platform GPU Stress Test and OpenGL Benchmark for Windows, Linux and OS X | Geeks3D.com

Also, are there any settings in the BIOS that might matter? You might want to review those as best you can, in case something weird is there, like the cache being disabled or something stupid… just grasping at straws, but you never know.

EDIT: maybe check whether there is a BIOS update?

Any chance your motherboard has an onboard GPU? If so, maybe make sure it is disabled. If your system benchmarks are slow even without DT involved, maybe review your BIOS settings one by one, and if there is a BIOS update, give that a try. Hope you find the problem… must be very frustrating.

NVIDIA also provides some tests specifically for OpenCL.

If those run fine you can at least exclude that your issue is hardware / bios / driver related.

Guys, I am telling you, this is not a problem that only I have. The new Nvidia drivers are broken, but it is difficult to notice if you don’t have both Linux and Windows.
What I am trying right now: I am downloading the oldest driver that is still available from Nvidia (457/451). Let’s see if that one is broken too.

I also applied some other modules since I had to create some noise to remove. I’d say 2 seconds for denoise only is quite slow.

When you use your MX250 how long does it take?

On Windows, I think I get the same performance as you, 1.something seconds, if only base curve + denoise (non-local means auto) are active.
I don’t think there is a significant difference with and without OpenCL.
I am about to download MSYS2 so I can actually measure the performance on Windows.
On Debian, dt seems to be a bit faster, but there is no difference between OpenCL and no OpenCL either. According to darktable -d perf, it is actually slightly faster without OpenCL: I measured 0.9 seconds, versus about 1.1 seconds with OpenCL.

Edit: On Windows, 2.0 seconds with opencl and 1.5 seconds without opencl (according to darktable -d perf).
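Comparing runs like these by eye is error-prone, so here is a minimal Python sketch that pulls the timings out of darktable -d perf output. It assumes the summary line has the same shape as the ones quoted in this thread ("pixel pipeline processing took X secs (Y CPU)") and ignores everything else in the log; the function names are made up for illustration. A comma is accepted as the decimal separator because some locales print the times that way.

```python
import re

# Summary line as quoted in this thread, e.g.
# "pixel pipeline processing took 46.309 secs (59.258 CPU)".
# Assumption: the locale may print "," instead of "." as decimal separator.
PERF_RE = re.compile(
    r"pixel pipeline processing took\s+(\d+[.,]\d+)\s+secs\s+\((\d+[.,]\d+)\s+CPU\)"
)


def parse_perf_times(log_text):
    """Return (wall_secs, cpu_secs) for every pipeline summary in the log."""
    times = []
    for match in PERF_RE.finditer(log_text):
        wall, cpu = (float(value.replace(",", ".")) for value in match.groups())
        times.append((wall, cpu))
    return times


def mean_wall_time(log_text):
    """Average wall-clock seconds over all runs, or None if nothing matched."""
    times = parse_perf_times(log_text)
    if not times:
        return None
    return sum(wall for wall, _cpu in times) / len(times)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python perf_times.py opencl.log cpu.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            print(path, mean_wall_time(handle.read()))
```

Running the same edit a few times with OpenCL on and off and comparing the two averages is more reliable than a single timing, since individual runs can vary.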

I just did some performance tests on my systems, too. I have two Linux systems running the same darktable version: an old one (i7-4700HQ with GeForce GT 750M) and a newer one (i7-7820HQ with Quadro M1200), and I can observe similar behaviour.
The 4700HQ system is faster with OpenCL disabled, while the 7820HQ system is faster with OpenCL enabled.

I took an example image and ran the export from darktable with different settings:
4700HQ-GPU-enabled: pixel pipeline processing took 46.309 secs (59.258 CPU)
4700HQ-GPU-disabled: pixel pipeline processing took 29.932 secs (222.965 CPU)
so the 4700HQ system is about 55% faster when not using the GPU

while
7820HQ-GPU-enabled: pixel pipeline processing took 12,010 secs (20,289 CPU)
7820HQ-GPU-disabled: pixel pipeline processing took 21,378 secs (162,917 CPU)
so the 7820HQ system is almost twice as fast (about 1.8×) with the GPU enabled

Nevertheless, on all systems denoise took roughly 2/3 of the processing time, regardless of whether the GPU was used or not.
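Percentages like these are easy to get slightly wrong, so as a quick check, here is a small Python sketch that turns the two export timings per machine into a speedup factor. The numbers are the wall-clock times quoted above; the helper name is hypothetical.

```python
def gpu_speedup(gpu_secs, cpu_secs):
    """How many times faster the OpenCL run is; below 1.0 means it was slower."""
    return cpu_secs / gpu_secs


# Wall-clock export times from the posts above: (GPU enabled, GPU disabled).
measurements = {
    "i7-4700HQ + GT 750M": (46.309, 29.932),
    "i7-7820HQ + Quadro M1200": (12.010, 21.378),
}

for system, (gpu, cpu) in measurements.items():
    factor = gpu_speedup(gpu, cpu)
    if factor >= 1:
        print(f"{system}: OpenCL is {factor:.2f}x as fast as CPU-only")
    else:
        print(f"{system}: CPU-only is {1 / factor:.2f}x as fast as OpenCL")
```

This prints about 1.55x in favour of CPU-only on the 4700HQ and about 1.78x in favour of OpenCL on the 7820HQ.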

It looks like the relationship between CPU and GPU performance is very important here. If you have a fast CPU but a low- to mid-range GPU, enabling the GPU does not help much; on the contrary, processing might even be slower.

@betazoid your CPU is just so fast that the GPU does not give you an additional boost in performance. The only chance I see is to distribute CPU/GPU power for calculating preview and full image as @MStraeten already suggested.

I don’t know anything about how nvidia cl stuff works on windows. Can only speak for darktable on linux.

Quite a number of performance gains have been achieved in current master, especially if you use the release build, as that uses -O3, which vectorises much better and gives up to 50% more performance for some modules. That depends a lot on your CPU hardware, but in general dt’s CPU modules have become much faster.

For OpenCL this has not changed: some modules run very well on OpenCL, some do not. In general, the OpenCL performance of some modules depends heavily on the graphics card’s memory transfer speed, and profiled denoise is the best example. So a 750M card will likely not be faster for that module than a current CPU; the Quadro M1200 is faster, but also sits on a not-so-fast memory bus.

If you have a dedicated graphics card with fast RAM and a 256-bit-wide bus, the story will be very different.
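To see why bus width matters so much for a module like profiled denoise, a back-of-the-envelope calculation helps. This Python sketch assumes one full-resolution buffer of 4 floats per pixel (16 bytes) and uses hypothetical memory specs (128-bit at 5 Gbit/s per pin vs. 256-bit at 14 Gbit/s), not the exact figures of the cards mentioned in this thread. It only computes theoretical peak bandwidth; real kernels reach less than that.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbit_s):
    """Theoretical peak memory bandwidth in GB/s: bytes per transfer x transfer rate."""
    return bus_width_bits / 8 * data_rate_gbit_s


def buffer_size_mb(megapixels, channels=4, bytes_per_channel=4):
    """Size in MB (10^6 bytes) of one full-resolution float RGBA buffer (assumed layout)."""
    return megapixels * channels * bytes_per_channel


def ms_per_pass(buffer_mb, bandwidth_gb_s):
    """Milliseconds to stream the buffer through memory once (MB / (GB/s) = ms)."""
    return buffer_mb / bandwidth_gb_s


image_mb = buffer_size_mb(20)  # e.g. a 20 MP raw, as in the PEN-F test earlier
for label, bus, rate in [("128-bit, 5 Gbit/s", 128, 5), ("256-bit, 14 Gbit/s", 256, 14)]:
    bw = peak_bandwidth_gb_s(bus, rate)
    print(f"{label}: {bw:.0f} GB/s, {ms_per_pass(image_mb, bw):.2f} ms per pass over {image_mb} MB")
```

With these assumed specs, one pass over a 320 MB buffer takes about 4.00 ms on the narrow bus and about 0.71 ms on the wide one; a module that reads and writes the image many times multiplies that per-pass difference.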

In general, if you don’t have an exceptionally fast graphics card you will be likely better off distributing load between cpu and gpu as @MStraeten suggested.

I noticed that there were performance optimizations in dt 3.4 and 3.6. When I bought my laptop, the stable version was 2.6, I think. At that time, almost everything was faster on the GPU. I think more modules are now (heavily) multithreaded when they run on the CPU.
So dt 3.4 and 3.6 performance is not so terrible any more without a good GPU.
Well, time flies…
Anyway, dt 3.5 is quite fast with OpenCL on Linux on my new desktop PC. In general, it needs about 1/3 of the time it needs with CPU-only processing. It’s most noticeable when I add more modules.
But there is a performance problem on Windows. Yesterday, when I tested on my laptop, which had an old Nvidia driver installed, the performance seemed to be OK.
The performance results on my laptop are inconsistent and sometimes confusing, but one thing is sure: performance is significantly better on Linux, with and without OpenCL.

You may want to try the following:

  1. stop darktable
  2. remove opencl_scheduling_profile from your .config/darktable/darktablerc
  3. restart darktable
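If you want to script steps 1 to 3 (say, to reset the setting on several machines), here is a minimal Python sketch that strips the opencl_scheduling_profile line from darktablerc and keeps a .bak copy. It assumes the key=value line format of darktablerc and must only run while darktable is closed. The Windows location (%LOCALAPPDATA%\darktable\darktablerc) is an assumption; check where your install keeps its config.

```python
import os
import sys
from pathlib import Path

KEY = "opencl_scheduling_profile"


def default_rc_path():
    """Best guess at the darktablerc location (the Windows path is an assumption)."""
    if sys.platform.startswith("win"):
        return Path(os.environ["LOCALAPPDATA"]) / "darktable" / "darktablerc"
    return Path.home() / ".config" / "darktable" / "darktablerc"


def remove_key(rc_path, key=KEY):
    """Drop every 'key=...' line from a darktablerc-style file.

    Returns the number of lines removed and writes a .bak copy first.
    Run this only while darktable is closed (step 1 above).
    """
    path = Path(rc_path)
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    kept = [line for line in lines if line.split("=", 1)[0].strip() != key]
    removed = len(lines) - len(kept)
    if removed:
        path.with_name(path.name + ".bak").write_text("".join(lines), encoding="utf-8")
        path.write_text("".join(kept), encoding="utf-8")
    return removed


if __name__ == "__main__":
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else default_rc_path()
    print(f"removed {remove_key(target)} line(s) from {target}")
```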

And/or set it manually:

Does anybody know a filter in GIMP that is suited for performance testing? AFAIK, GIMP does use OpenCL.

Have you been able to benchmark OpenCL itself, not GIMP, DT or whatever, just to be sure that you are starting from a baseline before trying to troubleshoot its implementation in the software?

I found this: LuxMark v3 - LuxCoreRender Wiki
Looks like it would work on your system and give you an idea.

Found it here

Sorry if this is not of any use…

So I ran Geekbench on both Windows and Linux, and the results are quite similar. So apparently OpenCL is not broken on Windows?

This is Linux:

And this Windows:

The LuxMark results on Windows are 17041 and 16879. On Linux I also ran it a couple of times but only wrote down one result: 16420 (the first result was also 16-something).
So it really looks like there is nothing wrong with OpenCL on Windows.
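As a quick sanity check on that conclusion, here is a tiny Python sketch computing how far apart the LuxMark scores quoted above are. Only one Linux result was written down, so this is a rough comparison, and the helper name is hypothetical.

```python
from statistics import mean


def relative_difference_pct(a, b):
    """Difference of a relative to b, in percent."""
    return (a - b) / b * 100


windows_scores = [17041, 16879]  # LuxMark runs on Windows, from the post above
linux_scores = [16420]  # the one Linux run that was written down

win, lin = mean(windows_scores), mean(linux_scores)
print(f"Windows {win:.0f} vs Linux {lin:.0f}: {relative_difference_pct(win, lin):+.1f}%")
```

That comes out at roughly +3% for Windows, far smaller than the darktable timing differences between Windows and Linux reported earlier in the thread.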