OK, I’ve done some performance analysis:
TL;DR: Setting OpenCL to “default” is a safe bet. And things didn’t go as planned.
Now, details:
I ran a benchmark on two computers: One Windows PC with an i5-4690K with an Nvidia GTX 1080 and 32 GB of memory, and one Linux PC with an i7-8809G with an AMD Radeon RX Vega M GH and 32 GB of memory. These should have roughly similar single-thread performance, but the Linux PC supports Hyperthreading and the Windows PC does not.
The benchmark used a 24 MP Fuji file with almost no adjustments, only
- raw black/white point
- white balance
- highlight reconstruction
- demosaic
- orientation
- exposure
- input color profile
- base curve
- output color profile
And I captured the output of darktable -d perf while moving the exposure slider one tick (one mouse-wheel tick), repeated a few times until the timings settled. The timings given are typical values, usually within plus or minus 10%.
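For anyone who wants to repeat this: here is a rough sketch of how I'd distill "typical timings" out of a -d perf log, taking the median so a single outlier run doesn't skew the number. The log-line format in the sample is an assumption based on my own logs; the exact wording varies between darktable versions, so adjust the regex to match yours.

```python
import re
import statistics

# Pull per-run pixelpipe totals out of a `darktable -d perf` log.
# The line format below is assumed/illustrative, not guaranteed.
PIPE_RE = re.compile(r"pixel pipeline processing took (\d+\.\d+) secs")

def typical_timing(log_text: str) -> float:
    """Median pixelpipe time over all runs found in the log."""
    runs = [float(m.group(1)) for m in PIPE_RE.finditer(log_text)]
    if not runs:
        raise ValueError("no pixelpipe timings found in log")
    return statistics.median(runs)

sample = """
[dev_process_image] pixel pipeline processing took 0.591 secs (1.203 CPU)
[dev_process_image] pixel pipeline processing took 0.575 secs (1.190 CPU)
[dev_process_image] pixel pipeline processing took 0.610 secs (1.250 CPU)
"""
print(typical_timing(sample))  # 0.591
```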
Here are the various OpenCL modes:
- Windows/OpenCL Multiple GPUs: 1.192 s
- Linux/OpenCL Multiple GPUs: 0.287 s
- Windows/OpenCL Very Fast GPU: 1.130 s
- Linux/OpenCL Very Fast GPU: 0.343 s
- Windows/OpenCL Default: 0.591 s
- Linux/OpenCL Default: 0.275 s
- Windows/OpenCL deactivated: 0.416 s
- Linux/OpenCL deactivated: 0.288 s
So, clearly, the Windows machine is just much slower. The OpenCL profile has essentially no influence on the Linux/AMD box, but a huge one on the Windows/Nvidia box. If in doubt, the OpenCL default profile is probably fastest.
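To make "huge influence" concrete, here is the trivial arithmetic on the table above: each OpenCL mode expressed as a speedup over OpenCL deactivated (values above 1.0 mean OpenCL helps). The numbers are the measured times from this run; nothing else is assumed.

```python
# Measured times (seconds) from the minimal-pipeline run above.
times = {
    "windows": {"multiple gpus": 1.192, "very fast gpu": 1.130,
                "default": 0.591, "deactivated": 0.416},
    "linux":   {"multiple gpus": 0.287, "very fast gpu": 0.343,
                "default": 0.275, "deactivated": 0.288},
}

def speedup_vs_off(box):
    """Speedup of each OpenCL mode relative to OpenCL deactivated."""
    off = times[box]["deactivated"]
    return {mode: round(off / t, 2)
            for mode, t in times[box].items() if mode != "deactivated"}

print(speedup_vs_off("windows"))  # "default" lands around 0.70x, i.e. slower than no OpenCL
print(speedup_vs_off("linux"))    # everything close to 1.0x, i.e. it hardly matters
```

With this light pipeline, OpenCL on the Windows box was actually a net loss in every mode, which foreshadows the oddity described below.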
Next, I repeated the experiment without the Base Curve, but with Filmic, Color Balance, Contrast Equalizer, and Lens Correction additionally activated. That’s a more typical scenario for me:
- Windows/OpenCL Multiple GPUs: 0.648 s
- Linux/OpenCL Multiple GPUs: 0.570 s
- Windows/OpenCL Very Fast GPU: 0.684 s
- Linux/OpenCL Very Fast GPU: 0.578 s
- Windows/OpenCL Default: 0.680 s
- Linux/OpenCL Default: 0.456 s
- Windows/OpenCL deactivated: 0.912 s
- Linux/OpenCL deactivated: 0.703 s
That result is suspicious: apparently, just deactivating and reactivating OpenCL on Windows doubled my performance. I even went back to the Base-Curve-only pipeline to verify, and indeed the timings had halved. Now the Windows box behaved roughly like the Linux box, just a bit slower. With more modules active, OpenCL does indeed make a difference (and the default profile is still a safe bet).
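For completeness, the same ratio for this heavier pipeline, straight from the table above (OpenCL default vs. deactivated, per box):

```python
# Measured times (seconds) from the Filmic/Color Balance/Contrast
# Equalizer/Lens Correction run above.
default_s = {"windows": 0.680, "linux": 0.456}
off_s = {"windows": 0.912, "linux": 0.703}

# Speedup of the default OpenCL profile over OpenCL deactivated.
gain = {box: round(off_s[box] / default_s[box], 2) for box in default_s}
print(gain)  # roughly 1.34x on Windows, 1.54x on Linux
```

So unlike the minimal pipeline, both boxes now come out ahead with OpenCL enabled.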
Let’s do it one more time, but this time with profiled denoise added, because that’s supposed to be a worst-case scenario:
- Windows/OpenCL Multiple GPUs: 2.224 s
- Linux/OpenCL Multiple GPUs: 0.582 s
- Windows/OpenCL Very Fast GPU: 2.478 s
- Linux/OpenCL Very Fast GPU: 0.497 s
- Windows/OpenCL Default: 1.527 s
- Linux/OpenCL Default: 0.482 s
- Windows/OpenCL deactivated: 0.939 s
- Linux/OpenCL deactivated: 0.690 s
This time, the Windows box profits massively from OpenCL (at least on the default profile), and the Linux box still kind of doesn’t care one way or another.
Going back to the original question about darktable’s performance not being real-time: it seems acceptably fast on the Linux box, and the bulk of my problem was actually an intermittent issue on Windows.