Problem exporting

Linux, optimised build from the master branch, Ryzen 5 5600, 64 GB RAM, NVidia 1060 / 6 GB:

With OpenCL:

63.9143 [dev_process_export] pixel pipeline processing took 5.796 secs (5.867 CPU)
...
[opencl_summary_statistics] device 'NVIDIA CUDA NVIDIA GeForce GTX 1060 6GB' (0): peak memory usage 4422928000 bytes (4218.0 MB)

Without OpenCL:

49.0822 [dev_process_export] pixel pipeline processing took 16.135 secs (168.376 CPU)

Windows laptop, one of the pre-built binaries (either a ‘weekly’ build or the nightly snapshot from GitLab, I don’t remember). i5-10210U, 16 GB RAM, no dedicated GPU (only the ‘Intel UHD Graphics’ on the CPU). Note that the Linux / NVidia OpenCL numbers above were achieved with tuned settings (all operations on the GPU, lots of memory dedicated to darktable’s processing), while those below use default settings for everything, as I don’t really use darktable on the laptop.

With OpenCL (clearly not worth it):

328.2157 [dev_process_export] pixel pipeline processing took 153.791 secs (130.266 CPU)

Without OpenCL:

116.7420 [dev_process_export] pixel pipeline processing took 47.722 secs (312.781 CPU)
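
To put the four timings side by side, here is a minimal Python sketch that pulls the wall-clock and CPU seconds out of the log lines quoted above and compares them. The dictionary labels and the function name `parse_timing` are my own, made up for illustration; only the log lines themselves come from the runs above.

```python
import re

# Matches the "[dev_process_export] pixel pipeline processing took ..." lines quoted above.
PATTERN = re.compile(r"pixel pipeline processing took ([\d.]+) secs \(([\d.]+) CPU\)")


def parse_timing(line):
    """Return (wall_seconds, cpu_seconds) for one log line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# Labels are hypothetical; the lines are copied from the post.
logs = {
    "desktop, OpenCL": "63.9143 [dev_process_export] pixel pipeline processing took 5.796 secs (5.867 CPU)",
    "desktop, CPU": "49.0822 [dev_process_export] pixel pipeline processing took 16.135 secs (168.376 CPU)",
    "laptop, OpenCL": "328.2157 [dev_process_export] pixel pipeline processing took 153.791 secs (130.266 CPU)",
    "laptop, CPU": "116.7420 [dev_process_export] pixel pipeline processing took 47.722 secs (312.781 CPU)",
}

timings = {name: parse_timing(line) for name, line in logs.items()}

# Ratios of wall-clock times: how much OpenCL helps on the desktop and hurts on the laptop.
desktop_speedup = timings["desktop, CPU"][0] / timings["desktop, OpenCL"][0]
laptop_slowdown = timings["laptop, OpenCL"][0] / timings["laptop, CPU"][0]

print(f"desktop: OpenCL is {desktop_speedup:.1f}x faster than CPU only")
print(f"laptop: OpenCL is {laptop_slowdown:.1f}x slower than CPU only")
```

So the tuned GTX 1060 is roughly 2.8 times faster than the Ryzen on its own, while the integrated Intel GPU with default settings is roughly 3.2 times slower than the laptop's CPU.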

The large radius used in diffuse or sharpen means that if tiling (darktable’s solution for low-memory situations) kicks in, a lot of calculations are repeated over and over: each tile needs a border of neighbouring pixels around it, and those border pixels are processed again for every tile that includes them. In such situations, the best performance is achieved by reducing tiling (using more memory), so if the GPU is starved for memory but the CPU is not, the latter will win.