I would like to kindly ask you for help with issues I’m observing on my new PC build. I decided to build the PC specifically for Darktable and I’m trying to find the best GPU for OpenCL within my budget.
Software-wise, my setup is: Linux Mint 20, Darktable 3.2.1
At first I bought an AMD RX 570 8GB and got pretty decent results compared with processing on the CPU only. I used the open source AMD drivers with the necessary OpenCL part installed from the AMDGPU-PRO package.
Just a few days after buying the AMD card, I found an Nvidia GTX 1650 Super on sale. Overall, the 1650 Super should be the better-performing GPU, although it has only 4GB of VRAM instead of 8GB. When profiling Darktable exports with the AMD GPU, VRAM consumption ranged from a few hundred MB up to 2.4GB, so I thought 4GB of VRAM would be OK. I decided to buy that one as well, keep the better-performing card and return the other one.
I put together a basic test setup to compare the cards: I selected 30 edited photos and used darktable-cli to export them while logging the debug and profiling info, watching the GPU resources and measuring the total duration of the whole operation.
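A run like this could be scripted roughly as follows. This is a minimal sketch, not the exact script used for the numbers in this post: the function names, the JPEG output naming and the log path are illustrative assumptions, and it assumes each raw file has its edit sidecar next to it as `<file>.xmp`. The `--core -d opencl -d perf` part passes darktable's OpenCL and performance debug flags through darktable-cli.

```python
import subprocess
import time
from pathlib import Path


def build_command(raw, out_dir):
    """Build the darktable-cli call for one image (hypothetical helper)."""
    raw = Path(raw)
    # Assumption: the sidecar sits next to the raw file, e.g. IMG_0001.CR2.xmp
    xmp = Path(str(raw) + ".xmp")
    out = Path(out_dir) / (raw.stem + ".jpg")
    return [
        "darktable-cli", str(raw), str(xmp), str(out),
        # Everything after --core goes to the darktable core:
        # enable OpenCL and performance debug output.
        "--core", "-d", "opencl", "-d", "perf",
    ]


def run_benchmark(raw_files, out_dir, log_path):
    """Export all files one after another, append the debug output to one
    log file, and return the total wall-clock time in seconds."""
    start = time.perf_counter()
    with open(log_path, "w") as log:
        for raw in raw_files:
            subprocess.run(
                build_command(raw, out_dir),
                stdout=log,
                stderr=subprocess.STDOUT,
                check=True,  # stop on the first failed export
            )
    return time.perf_counter() - start
```

Calling `run_benchmark(sorted(Path("photos").glob("*.CR2")), "out", "export.log")` with each card installed would give one total duration and one log per GPU to compare; the directory names and the `.CR2` extension here are placeholders.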
With the Nvidia GPU, I installed the proprietary nvidia-driver-440, and OpenCL support worked out of the box. However, the results of my tests surprised me: the same set of exports took approx. 30% longer than with the older and slower AMD card. Also, the log shows that the Nvidia card allows memory allocations only up to 977MB. And, from the benchmarks that can be found here (GPU benchmarks in darktable), it seems that Nvidia GPUs often allow only approx. 1/4 of their VRAM for OpenCL memory allocations.
Has anyone here observed such low performance with Nvidia GPUs in Darktable on Linux? Would you have any recommendations on how to improve it, maybe also how to trick it into allocating more memory?
Below are the initial parts of the Darktable logs:
[opencl_init] opencl library 'libOpenCL.so.1' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 1 device
[opencl_init] device 0 `Ellesmere' supports image sizes of 16384 x 16384
[opencl_init] device 0 `Ellesmere' allows GPU memory allocations of up to 6565MB
[opencl_init] device 0: Ellesmere
     GLOBAL_MEM_SIZE:          7950MB
     MAX_WORK_GROUP_SIZE:      256
     MAX_WORK_ITEM_DIMENSIONS: 3
     MAX_WORK_ITEM_SIZES:      [ 1024 1024 1024 ]
     DRIVER_VERSION:           3143.9
     DEVICE_VERSION:           OpenCL 1.2 AMD-APP (3143.9)
[opencl_init] options for OpenCL compiler: -w -DAMD=1 -I"/usr/share/darktable/kernels"

[opencl_init] opencl library 'libOpenCL.so.1' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 1 device
[opencl_init] device 0 `GeForce GTX 1650 SUPER' has sm_20 support.
[opencl_init] device 0 `GeForce GTX 1650 SUPER' supports image sizes of 32768 x 32768
[opencl_init] device 0 `GeForce GTX 1650 SUPER' allows GPU memory allocations of up to 977MB
[opencl_init] device 0: GeForce GTX 1650 SUPER
     GLOBAL_MEM_SIZE:          3909MB
     MAX_WORK_GROUP_SIZE:      1024
     MAX_WORK_ITEM_DIMENSIONS: 3
     MAX_WORK_ITEM_SIZES:      [ 1024 1024 64 ]
     DRIVER_VERSION:           440.100
     DEVICE_VERSION:           OpenCL 1.2 CUDA
[opencl_init] options for OpenCL compiler: -w -DNVIDIA_SM_20=1 -DNVIDIA=1 -I"/usr/share/darktable/kernels"
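
To put numbers on the allocation limits in the two logs above, the ratio of the reported maximum allocation to GLOBAL_MEM_SIZE can be computed with a few lines of Python. This is only an illustrative sketch: the helper name and the regular expressions are my own assumptions about the log wording shown above, and the input strings are just the relevant excerpts from each log.

```python
import re

# Patterns matching the two [opencl_init] values of interest.
ALLOC_RE = re.compile(r"allows GPU memory allocations of up to (\d+)MB")
GLOBAL_RE = re.compile(r"GLOBAL_MEM_SIZE:\s+(\d+)MB")


def alloc_fraction(log_text):
    """Return max allocation size divided by total device memory."""
    max_alloc_mb = int(ALLOC_RE.search(log_text).group(1))
    global_mem_mb = int(GLOBAL_RE.search(log_text).group(1))
    return max_alloc_mb / global_mem_mb


amd_log = (
    "[opencl_init] device 0 `Ellesmere' allows GPU memory allocations "
    "of up to 6565MB\n     GLOBAL_MEM_SIZE:          7950MB"
)
nvidia_log = (
    "[opencl_init] device 0 `GeForce GTX 1650 SUPER' allows GPU memory "
    "allocations of up to 977MB\n     GLOBAL_MEM_SIZE:          3909MB"
)

print(f"AMD:    {alloc_fraction(amd_log):.2f}")     # AMD:    0.83
print(f"Nvidia: {alloc_fraction(nvidia_log):.2f}")  # Nvidia: 0.25
```

So the AMD driver reports about 83% of the card's memory as the largest allowed allocation, while the Nvidia driver reports almost exactly one quarter.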