I recently received updates to both my NVIDIA driver and DT (now 4.4.1) at pretty much the same time. I am running Ubuntu 22.04. Since the updates DT has been running very slowly, whereas I had no issues with DT 4.2 before. Launching DT with darktable -d opencl -d perf reveals (I think) that DT is using the onboard Intel graphics and not the NVIDIA card. On launch DT does show a pop-up message about OpenCL devices having been renamed, but I do not really understand that message. Any suggestions about how to overcome this would be much appreciated. Many thanks.
Please post the output of clinfo -l
xxx@xxxxx : ~$ clinfo -l
Platform #0: Intel(R) OpenCL HD Graphics
-- Device #0: Intel(R) HD Graphics 530 [0x1912]
Platform #1: NVIDIA CUDA
-- Device #0: NVIDIA GeForce GTX 1070
darktable-cltest would be more useful to understand what’s going on.
darktable-cltest
0.0247 [dt_get_sysresource_level] switched to 2 as `large’
0.0248 total mem: 15890MB
0.0248 mipmap cache: 1986MB
0.0248 available mem: 10862MB
0.0248 singlebuff: 248MB
0.0248 OpenCL tune mem: WANTED
0.0248 OpenCL pinned: WANTED
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: ‘multiple GPUs’
[opencl_init] opencl_library: ‘default path’
[opencl_init] opencl_device_priority: ‘/!0,//’
[opencl_init] opencl_mandatory_timeout: 200
[opencl_init] opencl library ‘libOpenCL’ found on your system and loaded
[opencl_init] found 2 platforms
[opencl_init] found 2 devices
[dt_opencl_device_init]
DEVICE: 0: ‘Intel(R) HD Graphics 530 [0x1912]’
PLATFORM NAME & VENDOR: Intel(R) OpenCL HD Graphics, Intel(R) Corporation
CANONICAL NAME: intelropenclhdgraphicsintelrhdgraphics5300x1912
DRIVER VERSION: 1.0.0
DEVICE VERSION: OpenCL 3.0 NEO
DEVICE_TYPE: GPU
GLOBAL MEM SIZE: 12712 MB
MAX MEM ALLOC: 4096 MB
MAX IMAGE SIZE: 16384 x 16384
MAX WORK GROUP SIZE: 256
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 256 256 256 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: WANTED
MEMORY TUNING: WANTED
FORCED HEADROOM: 400
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH: 16
ROUNDUP HEIGHT: 16
CHECK EVENT HANDLES: 128
PERFORMANCE: 2.279
TILING ADVANTAGE: 0.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /usr/share/darktable/kernels
KERNEL DIRECTORY: /home/fgs/.cache/darktable/cached_v1_kernels_for_IntelROpenCLHDGraphicsIntelRHDGraphics5300x1912_100
CL COMPILER OPTION:
KERNEL LOADING TIME: 0.0168 sec
[dt_opencl_device_init]
DEVICE: 1: ‘NVIDIA GeForce GTX 1070’
PLATFORM NAME & VENDOR: NVIDIA CUDA, NVIDIA Corporation
CANONICAL NAME: nvidiacudanvidiageforcegtx1070
DRIVER VERSION: 535.54.03
DEVICE VERSION: OpenCL 3.0 CUDA, SM_20 SUPPORT
DEVICE_TYPE: GPU
GLOBAL MEM SIZE: 8092 MB
MAX MEM ALLOC: 2023 MB
MAX IMAGE SIZE: 16384 x 32768
MAX WORK GROUP SIZE: 1024
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 64 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: WANTED
MEMORY TUNING: WANTED
FORCED HEADROOM: 400
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH: 16
ROUNDUP HEIGHT: 16
CHECK EVENT HANDLES: 128
PERFORMANCE: 16.885
TILING ADVANTAGE: 0.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /usr/share/darktable/kernels
KERNEL DIRECTORY: /home/fgs/.cache/darktable/cached_v1_kernels_for_NVIDIACUDANVIDIAGeForceGTX1070_5355403
CL COMPILER OPTION: -cl-fast-relaxed-math
KERNEL LOADING TIME: 0.0185 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init] 0 ‘Intel(R) OpenCL HD Graphics Intel(R) HD Graphics 530 [0x1912]’
[opencl_init] 1 ‘NVIDIA CUDA NVIDIA GeForce GTX 1070’
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] 1 1 1 1 1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 20
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] 1 1 1 1 1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 20
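For anyone who wants to compare devices without reading the whole dump by eye, here is a minimal Python sketch that pulls the per-device fields out of darktable-cltest style output. The function names parse_devices and fastest are made up for this example, and it assumes that a higher PERFORMANCE figure means a faster device, which is consistent with the two devices listed above but is not something the output itself states.

```python
def parse_devices(log_text):
    """Collect 'KEY: value' fields for each [dt_opencl_device_init] block."""
    devices = []
    current = None
    tag = "[dt_opencl_device_init]"
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith(tag):
            # A new device block starts here.
            current = {}
            devices.append(current)
            # The first field may follow the tag on the same line.
            line = line[len(tag):].strip()
        elif line.startswith("["):
            # Any other bracketed tag ends the device block.
            current = None
            continue
        if current is None or ":" not in line:
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    return devices


def fastest(devices):
    """Return the block with the highest PERFORMANCE value (assumption: higher is faster)."""
    rated = [d for d in devices if "PERFORMANCE" in d]
    return max(rated, key=lambda d: float(d["PERFORMANCE"]), default=None)


# Shortened excerpt of the output posted above.
sample = """\
[dt_opencl_device_init]
DEVICE: 0: 'Intel(R) HD Graphics 530 [0x1912]'
CANONICAL NAME: intelropenclhdgraphicsintelrhdgraphics5300x1912
PERFORMANCE: 2.279
[dt_opencl_device_init]
DEVICE: 1: 'NVIDIA GeForce GTX 1070'
CANONICAL NAME: nvidiacudanvidiageforcegtx1070
PERFORMANCE: 16.885
[opencl_init] OpenCL successfully initialized.
"""

best = fastest(parse_devices(sample))
print(best["DEVICE"], best["PERFORMANCE"])
```

To run it on real output, save the output of darktable-cltest to a file and pass the file contents to parse_devices.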
Both GPUs are being used. The CL compiler option for the Intel device is empty, so it does not have -cl-fast-relaxed-math set, unlike the NVIDIA device. Did you change that?
My recommendations would be to:
- Disable the Intel card via the darktablerc settings
- Turn off memory tuning and pinned memory transfer
- Set the profile to Very fast GPU
- If some of the modules switch to the CPU path, consider increasing the mandatory timeout from 200 to 1000. (Diffuse or sharpen can benefit from this.)
I changed opencl_device_priority to 1,/1,/1,/1,
Relaunched DT with : darktable -d opencl -d perf
A section of the output is below :
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] 1 1 1 1 1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 1 1 1 1 1
[opencl_synchronization_timeout] synchronization timeout set to 0
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] 1 1 1 1 1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 1 1 1 1 1
[opencl_synchronization_timeout] synchronization timeout set to 0
8.4595 [dt_dev_load_raw] loading the image. took 0.390 secs (1.678 CPU)
9.1456 [histogram] took 0.001 secs (0.001 CPU) scope draw
9.3229 [dt_dev_process_image_job] loading image. took 0.000 secs (0.000 CPU)
9.3712 [dt_opencl_check_tuning] use 10821MB (tunemem=OFF, pinning=OFF) on device `Intel(R) OpenCL HD Graphics Intel(R) HD Graphics 530 [0x1912]’ id=0
This still appears to show DT using the Intel graphics, and performance was very slow.
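A quick way to check which device a run actually used is to scan the -d opencl output for [dt_opencl_check_tuning] lines like the one above. The following Python sketch does that. The function name devices_in_use is made up for this example, and the pattern only assumes the line layout shown in this thread, so it may need adjusting for other darktable versions.

```python
import re

# Matches lines such as:
# [dt_opencl_check_tuning] use 10821MB (tunemem=OFF, pinning=OFF) on device `NAME' id=0
# Both plain and typographic quote characters around the name are accepted.
TUNING = re.compile(
    r"\[dt_opencl_check_tuning\] use (\d+)MB .*? on device "
    r"[`'\u2018](.+)['\u2019] id=(\d+)"
)


def devices_in_use(log_text):
    """Map device id -> device name for every check_tuning line found."""
    seen = {}
    for line in log_text.splitlines():
        match = TUNING.search(line)
        if match:
            seen[int(match.group(3))] = match.group(2)
    return seen


# The line posted above, with plain quote characters.
sample = (
    "9.3229 [dt_dev_process_image_job] loading image. took 0.000 secs (0.000 CPU)\n"
    "9.3712 [dt_opencl_check_tuning] use 10821MB (tunemem=OFF, pinning=OFF) "
    "on device `Intel(R) OpenCL HD Graphics Intel(R) HD Graphics 530 [0x1912]' id=0\n"
)

for dev_id, name in devices_in_use(sample).items():
    print(dev_id, name)
```

If only id=0 shows up while editing, the NVIDIA card (id=1 here) is not being picked for the pixelpipe.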
Possibly of note: if I switch off OpenCL in the DT settings, it loads an image faster than it does with OpenCL enabled. A week ago it was blisteringly fast with OpenCL, and just as fast as now without OpenCL.
Then check your OpenCL scheduling profile setting. Overriding the priority is only active if “default” is selected.
Probably resolved. I set priorities to ‘1,/1,/1,/1,’ and the profile setting to default.
Demosaic time came down from 5 seconds to 0.33 seconds and I am now back to full speed. I do not know whether it was the new NVIDIA update or the DT update that broke what I had before. Thanks so much for the advice that enabled me to sort this out.
To get a bit more speed, you can prioritize the full pixelpipe to be processed by the dedicated GPU and the preview to be processed by the Intel GPU integrated in the CPU. That way both pipelines can make use of whatever resources are free.
But that’s quite dependent on your system …
Maybe even prioritize 1,0,* instead of simply 1, to allow using the Intel GPU if the dedicated GPU is busy.
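To make the priority string easier to reason about, here is a small Python sketch that splits an opencl_device_priority value into one device list per pixelpipe. It assumes the fields are separated by "/" and follow the column order printed in the log (image, preview, export, thumbs, preview2), that devices within a field are comma-separated, that a leading "!" excludes a device and that "*" stands for any remaining device. The function name split_priority and the example value are made up for illustration and are not a recommended setting.

```python
# Column order as printed by [dt_opencl_update_priorities] in the log.
PIPES = ["image", "preview", "export", "thumbs", "preview2"]


def split_priority(value):
    """Split a priority string into allowed/excluded device tokens per pipe."""
    result = {}
    for pipe, field in zip(PIPES, value.split("/")):
        tokens = [t.strip() for t in field.split(",") if t.strip()]
        result[pipe] = {
            # Tokens are tried in the order given; "*" is kept as-is.
            "allowed": [t for t in tokens if not t.startswith("!")],
            # "!N" is assumed to mean "never use device N for this pipe".
            "excluded": [t.lstrip("!") for t in tokens if t.startswith("!")],
        }
    return result


# Hypothetical value: NVIDIA (1) first for the full image, Intel (0) first for the preview.
example = "1,0,*/0,1,*/1,*/1,*/1,*"

for pipe, rule in split_priority(example).items():
    print(pipe, rule)
```

Read this way, 1,0,* means "try device 1, then device 0, then anything else", which is the fallback behaviour suggested above.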
I am at work and so do not have access to DT at present, but I was planning to try ‘!0,/!1,/1,/1,’ with the Intel graphics = 0 and the NVIDIA card = 1. Of interest: although demosaic / opening an image was back to pretty much normal speed, the perspective correction module was really struggling.
Thanks for all the help. I shall update as I (Hopefully) make progress. Strange thing is that I have never had to tweak / fine tune DT like this and have been using it for quite some years.
Update :
Changed the priorities to : !0,/!1,/1,/1, - intel =0 and nvidia=1
Dramatic improvement and all now does seem to be well.