Hello,
After a recent upgrade from Darktable 5.2 to 5.4 and then 5.5 (from unofficial repos for Ubuntu 22.04), I’m seeing short freezes when using OpenCL-bound modules such as exposure, raw demosaic, etc.
My setup is as follows:
Linux kernel 6.8.0-90-generic
Ubuntu 22.04
2x NVIDIA CUDA Quadro K2000
> darktable -d opencl
darktable 5.5.0~git41.fa8b49d6-1+13547.1
Copyright (C) 2012-2025 Johannes Hanika and other contributors.
Compile options:
Bit depth → 64 bit
Exiv2 → 0.27.5
Lensfun → 0.3.2
Debug → DISABLED
SSE2 optimizations → ENABLED
OpenMP → ENABLED
OpenCL → ENABLED
Lua → ENABLED - API version 9.6.0
Colord → ENABLED
gPhoto2 → ENABLED
OSMGpsMap → ENABLED - map view is available
GMIC → ENABLED - Compressed LUTs are supported
GraphicsMagick → ENABLED
ImageMagick → DISABLED
libavif → DISABLED
libheif → DISABLED
libjxl → DISABLED
LibRaw → ENABLED - Version 0.22.0-PreRC1
OpenJPEG → ENABLED
OpenEXR → ENABLED
WebP → ENABLED
See https://www.darktable.org/resources/ for detailed documentation.
See https://github.com/darktable-org/darktable/issues/new/choose to report bugs.
0.0001 [dt starting]
darktable -d opencl
0.2614 [dt_dlopencl_init] could not find default opencl runtime library ‘libOpenCL’
0.2615 [dt_dlopencl_init] could not find default opencl runtime library ‘libOpenCL.so’
0.2618 [opencl_init] opencl library ‘libOpenCL.so.1’ found on your system and loaded, preference ‘default path’
0.2958 [opencl_init] found 1 platform
[opencl_init] found 2 devices
[dt_opencl_device_init]
DEVICE: 0: ‘Quadro K2000’
CONF KEY: cldevice_v5_nvidiacudaquadrok2000
PLATFORM, VENDOR & ID: NVIDIA CUDA, NVIDIA Corporation, ID=4318
CANONICAL NAME: nvidiacudaquadrok2000
DRIVER VERSION: 470.256.02
DEVICE VERSION: OpenCL 3.0 CUDA, SM_20 SUPPORT
DEVICE_TYPE: GPU, dedicated mem
GLOBAL MEM SIZE: 1991 MB
MAX MEM ALLOC: 498 MB
MAX IMAGE SIZE: 16384 x 16384
MAX CONSTANT BUFFER: 64 KB
ADDRESS ALIGN: 512
MAX WORK GROUP SIZE: 1024
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 64 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH & HEIGHT 16x16
CHECK EVENT HANDLES: 128
TILING ADVANTAGE: 61427584.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /usr/share/darktable/kernels
KERNEL DIRECTORY: /home/alex/.cache/darktable/cached_v5_kernels_for_NVIDIACUDAQuadroK2000_47025602
CL COMPILER OPTION: -cl-fast-relaxed-math
CL COMPILER COMMAND: -w -cl-fast-relaxed-math -DNVIDIA_SM_20=1 -DNVIDIA=1 -I"/usr/share/darktable/kernels"
CL EXCEPTION: DT_OPENCL_ONLY_CUDA
KERNEL LOADING TIME: 0.0404 sec
[dt_opencl_device_init]
DEVICE: 1: ‘Quadro K2000’
CONF KEY: cldevice_v5_nvidiacudaquadrok2000
PLATFORM, VENDOR & ID: NVIDIA CUDA, NVIDIA Corporation, ID=4318
CANONICAL NAME: nvidiacudaquadrok2000
DRIVER VERSION: 470.256.02
DEVICE VERSION: OpenCL 3.0 CUDA, SM_20 SUPPORT
DEVICE_TYPE: GPU, dedicated mem
GLOBAL MEM SIZE: 2000 MB
MAX MEM ALLOC: 500 MB
MAX IMAGE SIZE: 16384 x 16384
MAX CONSTANT BUFFER: 64 KB
ADDRESS ALIGN: 512
MAX WORK GROUP SIZE: 1024
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 64 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH & HEIGHT 16x16
CHECK EVENT HANDLES: 128
TILING ADVANTAGE: 61427584.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /usr/share/darktable/kernels
KERNEL DIRECTORY: /home/alex/.cache/darktable/cached_v5_kernels_for_NVIDIACUDAQuadroK2000_47025602
CL COMPILER OPTION: -cl-fast-relaxed-math
CL COMPILER COMMAND: -w -cl-fast-relaxed-math -DNVIDIA_SM_20=1 -DNVIDIA=1 -I"/usr/share/darktable/kernels"
CL EXCEPTION: DT_OPENCL_ONLY_CUDA
KERNEL LOADING TIME: 0.0351 sec
[opencl_init] OpenCL successfully initialized. internal numbers and names of available devices:
[opencl_init] 0 ‘NVIDIA CUDA Quadro K2000’
[opencl_init] 1 ‘NVIDIA CUDA Quadro K2000’
0.4540 [opencl_init] FINALLY: opencl PREFERENCE=ON is AVAILABLE and ENABLED.
[opencl_init] opencl_scheduling_profile: ‘multiple GPUs’
[opencl_init] opencl_device_priority: ‘*/!0,*/*/*/!0,*’
[opencl_init] opencl_mandatory_timeout: 1000
[opencl_update_priorities] these are your device priorities:
[opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] 1 1 1 1 1
[opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[opencl_update_priorities] image preview export thumbs preview2
[opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 20
[opencl_update_priorities] these are your device priorities:
[opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] 1 1 1 1 1
[opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[opencl_update_priorities] image preview export thumbs preview2
[opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 20
darktablerc configuration:
cat $HOME/.config/darktable/darktablerc | grep -e opencl -e nvid
cldevice_v5_nvidiacudaquadrok2000=0 250 0 16 16 128 0 0 0.000 61427584.000 0.250
cldevice_v5_nvidiacudaquadrok2000_building=-cl-fast-relaxed-math
cldevice_v5_nvidiacudaquadrok2000_id0=600
cldevice_v5_nvidiacudaquadrok2000_id1=600
clplatform_intelropenclhdgraphics=FALSE
clplatform_nvidiacuda=TRUE
clplatform_openclon12=FALSE
opencl=TRUE
opencl_checksum=1654065287
opencl_device_priority=*/!0,*/*/*/!0,*
opencl_library=
opencl_mandatory_timeout=1000
opencl_scheduling_profile=multiple GPUs
opencl_tune_headroom=TRUE
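To make the per-device line easier to read, here is a minimal Python sketch that splits the `cldevice_v5_...` value into named fields. The field names and their order are my assumption, inferred by matching the values against the log above (MICRO NAP 250, ROUNDUP 16x16, CHECK EVENT HANDLES 128, TILING ADVANTAGE 61427584.000); the names `reserved` and `unified_fraction` in particular are guesses and are not confirmed by anything in the log.

```python
# Assumed field order for the per-device darktablerc entry. These names are
# hypothetical labels inferred from the `darktable -d opencl` log, not taken
# from darktable's source code.
FIELD_NAMES = [
    "avoid_atomics", "micro_nap", "pinned_memory",
    "roundup_width", "roundup_height", "event_handles",
    "async_pixelpipe", "disabled", "reserved",
    "tiling_advantage", "unified_fraction",
]


def parse_cldevice(line: str) -> dict:
    """Split 'key=v1 v2 ...' into a dict of named values."""
    key, _, value = line.strip().partition("=")
    parts = value.split()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} values, got {len(parts)}")
    # Values containing a dot are parsed as floats, the rest as integers.
    fields = {
        name: float(part) if "." in part else int(part)
        for name, part in zip(FIELD_NAMES, parts)
    }
    return {"key": key, **fields}


line = ("cldevice_v5_nvidiacudaquadrok2000="
        "0 250 0 16 16 128 0 0 0.000 61427584.000 0.250")
settings = parse_cldevice(line)
for name, value in settings.items():
    print(f"{name:18} {value}")
```

Read this way, the stored values agree with what the log reports for both devices, so the configuration itself seems to be picked up as written.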
Settings:
darktable resources: large
Activate OpenCL support: ON
OpenCL scheduling profile: multiple GPUs
tuned GPU memory: ON
A short freeze appears together with the following lines in the output:
8.4097 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
8.4097 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
8.4098 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
8.4099 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
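To see how often this happens and on which device, the messages can be counted from a saved log. Below is a rough Python sketch that does this for the four lines quoted above; the regular expression is my own guess at the message layout based only on these lines, so it may need adjusting for other messages.

```python
import re
from collections import Counter

# Pattern written against the four messages quoted in this post only.
# Leading non-word characters are skipped so stray markup does not matter,
# and both curly and straight quotes are accepted around the device name.
PATTERN = re.compile(
    r"^\W*(?P<time>\d+\.\d+)\s+\[opencl (?P<func>\w+)\]\s+"
    r"could not allocate (?P<what>.+?) on device\s+"
    r"[‘'](?P<device>[^’']+)[’']\s+id=(?P<id>\d+):\s*(?P<status>\w+)"
)

LOG = """\
8.4097 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
8.4097 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
8.4098 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
8.4099 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
"""

# Count failed allocations per (device id, reported status).
counts = Counter()
for raw_line in LOG.splitlines():
    match = PATTERN.match(raw_line.strip())
    if match:
        counts[(int(match["id"]), match["status"])] += 1

for (device_id, status), n in sorted(counts.items()):
    print(f"device id={device_id} status={status}: {n} failed allocation(s)")
```

In this excerpt all four messages come from device id=0 within about 0.2 ms of each other, and the reported status is CL_SUCCESS even though the message says the allocation failed.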
Could you help interpret the above?
Is it a problem with GPU memory usage by OpenCL?
Can it be tuned by adjusting some limits?
Could this be due to changes in the newest DT releases?
Or is it a conflict between the new DT and the OpenCL packages for kernel 6.8?