Darktable short freezes "could not allocate oversize buffer on device"

Hello,
After a recent upgrade from darktable 5.2 to 5.4 and then 5.5 (from unofficial repos for Ubuntu 22.04), I’m facing short freezes when using OpenCL-bound modules such as exposure, raw demosaic, etc.
My setup is as follows:
Linux kernel 6.8.0-90-generic
Ubuntu 22.04
2x NVIDIA CUDA Quadro K2000

> darktable -d opencl
darktable 5.5.0~git41.fa8b49d6-1+13547.1
Copyright (C) 2012-2025 Johannes Hanika and other contributors.

Compile options:
Bit depth → 64 bit
Exiv2 → 0.27.5
Lensfun → 0.3.2
Debug → DISABLED
SSE2 optimizations → ENABLED
OpenMP → ENABLED
OpenCL → ENABLED
Lua → ENABLED - API version 9.6.0
Colord → ENABLED
gPhoto2 → ENABLED
OSMGpsMap → ENABLED - map view is available
GMIC → ENABLED - Compressed LUTs are supported
GraphicsMagick → ENABLED
ImageMagick → DISABLED
libavif → DISABLED
libheif → DISABLED
libjxl → DISABLED
LibRaw → ENABLED - Version 0.22.0-PreRC1
OpenJPEG → ENABLED
OpenEXR → ENABLED
WebP → ENABLED

See resources | darktable for detailed documentation.
See Sign in to GitHub · GitHub to report bugs.

 0.0001 [dt starting]

darktable -d opencl
0.2614 [dt_dlopencl_init] could not find default opencl runtime library ‘libOpenCL’
0.2615 [dt_dlopencl_init] could not find default opencl runtime library ‘libOpenCL.so’
0.2618 [opencl_init] opencl library ‘libOpenCL.so.1’ found on your system and loaded, preference ‘default path’
0.2958 [opencl_init] found 1 platform
[opencl_init] found 2 devices

[dt_opencl_device_init]
DEVICE: 0: ‘Quadro K2000’
CONF KEY: cldevice_v5_nvidiacudaquadrok2000
PLATFORM, VENDOR & ID: NVIDIA CUDA, NVIDIA Corporation, ID=4318
CANONICAL NAME: nvidiacudaquadrok2000
DRIVER VERSION: 470.256.02
DEVICE VERSION: OpenCL 3.0 CUDA, SM_20 SUPPORT
DEVICE_TYPE: GPU, dedicated mem
GLOBAL MEM SIZE: 1991 MB
MAX MEM ALLOC: 498 MB
MAX IMAGE SIZE: 16384 x 16384
MAX CONSTANT BUFFER: 64 KB
ADDRESS ALIGN: 512
MAX WORK GROUP SIZE: 1024
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 64 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH & HEIGHT 16x16
CHECK EVENT HANDLES: 128
TILING ADVANTAGE: 61427584.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /usr/share/darktable/kernels
KERNEL DIRECTORY: /home/alex/.cache/darktable/cached_v5_kernels_for_NVIDIACUDAQuadroK2000_47025602
CL COMPILER OPTION: -cl-fast-relaxed-math
CL COMPILER COMMAND: -w -cl-fast-relaxed-math -DNVIDIA_SM_20=1 -DNVIDIA=1 -I"/usr/share/darktable/kernels"
CL EXCEPTION: DT_OPENCL_ONLY_CUDA
KERNEL LOADING TIME: 0.0404 sec

[dt_opencl_device_init]
DEVICE: 1: ‘Quadro K2000’
CONF KEY: cldevice_v5_nvidiacudaquadrok2000
PLATFORM, VENDOR & ID: NVIDIA CUDA, NVIDIA Corporation, ID=4318
CANONICAL NAME: nvidiacudaquadrok2000
DRIVER VERSION: 470.256.02
DEVICE VERSION: OpenCL 3.0 CUDA, SM_20 SUPPORT
DEVICE_TYPE: GPU, dedicated mem
GLOBAL MEM SIZE: 2000 MB
MAX MEM ALLOC: 500 MB
MAX IMAGE SIZE: 16384 x 16384
MAX CONSTANT BUFFER: 64 KB
ADDRESS ALIGN: 512
MAX WORK GROUP SIZE: 1024
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 64 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH & HEIGHT 16x16
CHECK EVENT HANDLES: 128
TILING ADVANTAGE: 61427584.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /usr/share/darktable/kernels
KERNEL DIRECTORY: /home/alex/.cache/darktable/cached_v5_kernels_for_NVIDIACUDAQuadroK2000_47025602
CL COMPILER OPTION: -cl-fast-relaxed-math
CL COMPILER COMMAND: -w -cl-fast-relaxed-math -DNVIDIA_SM_20=1 -DNVIDIA=1 -I"/usr/share/darktable/kernels"
CL EXCEPTION: DT_OPENCL_ONLY_CUDA
KERNEL LOADING TIME: 0.0351 sec
[opencl_init] OpenCL successfully initialized. internal numbers and names of available devices:
[opencl_init] 0 ‘NVIDIA CUDA Quadro K2000’
[opencl_init] 1 ‘NVIDIA CUDA Quadro K2000’
0.4540 [opencl_init] FINALLY: opencl PREFERENCE=ON is AVAILABLE and ENABLED.
[opencl_init] opencl_scheduling_profile: ‘multiple GPUs’
[opencl_init] opencl_device_priority: ‘*/!0,*/*/*/!0,*’
[opencl_init] opencl_mandatory_timeout: 1000
[opencl_update_priorities] these are your device priorities:
[opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] 1 1 1 1 1
[opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[opencl_update_priorities] image preview export thumbs preview2
[opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 20
[opencl_update_priorities] these are your device priorities:
[opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] 1 1 1 1 1
[opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[opencl_update_priorities] image preview export thumbs preview2
[opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 20

darktablerc configuration:
cat $HOME/.config/darktable/darktablerc | grep -e opencl -e nvid
cldevice_v5_nvidiacudaquadrok2000=0 250 0 16 16 128 0 0 0.000 61427584.000 0.250
cldevice_v5_nvidiacudaquadrok2000_building=-cl-fast-relaxed-math
cldevice_v5_nvidiacudaquadrok2000_id0=600
cldevice_v5_nvidiacudaquadrok2000_id1=600
clplatform_intelropenclhdgraphics=FALSE
clplatform_nvidiacuda=TRUE
clplatform_openclon12=FALSE
opencl=TRUE
opencl_checksum=1654065287
opencl_device_priority=*/!0,*/*/*/!0,*
opencl_library=
opencl_mandatory_timeout=1000
opencl_scheduling_profile=multiple GPUs
opencl_tune_headroom=TRUE
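
For anyone who wants to pull the same keys out of darktablerc programmatically, here is a minimal Python sketch of the `grep -e opencl -e nvid` filter above. The function names and the `some_other_key` sample line are made up for illustration; only the file path and the other keys come from the output above.

```python
from pathlib import Path


def opencl_settings(text, needles=("opencl", "nvid")):
    """Return {key: value} for key=value lines that contain any needle."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        # Like grep, match against the whole line (key and value).
        if any(needle in line for needle in needles):
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def load_opencl_settings(path=None):
    """Read darktablerc from the default location used in this thread."""
    if path is None:
        path = Path.home() / ".config" / "darktable" / "darktablerc"
    return opencl_settings(Path(path).read_text(encoding="utf-8"))


# Sample lines copied from the output above, plus one hypothetical
# unrelated key to show that it is filtered out.
sample = """\
opencl=TRUE
opencl_library=
opencl_scheduling_profile=multiple GPUs
cldevice_v5_nvidiacudaquadrok2000_id0=600
some_other_key=1
"""
settings = opencl_settings(sample)
for key, value in settings.items():
    print(f"{key} = {value!r}")
```

Calling `load_opencl_settings()` without arguments reads the real file instead of the sample.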

Settings:
darktable resources: large
Activate OpenCL support: ON
OpenCL scheduling profile: multiple GPUs
tuned GPU memory: ON

The short freeze appears along with the following lines in the output:
8.4097 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
8.4097 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
8.4098 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
8.4099 [opencl copy_host_to_device_constant] could not allocate oversize buffer on device ‘NVIDIA CUDA Quadro K2000’ id=0: CL_SUCCESS
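
To line these messages up with the moments a freeze is noticed, a small Python sketch can extract the timestamp and device from a saved log. The regular expression is written only from the lines quoted above and is an assumption about the format, not darktable's documented log format. It accepts both plain and typographic quotes, since the forum may have converted them.

```python
import re
from collections import Counter

# Pattern derived from the log lines quoted in this post (an assumption,
# not an official format description).
OVERSIZE = re.compile(
    r"(?P<time>\d+\.\d+)\s+\[opencl (?P<func>\w+)\] "
    r"could not allocate oversize buffer on device "
    r"['‘’](?P<device>[^'‘’]+)['‘’] id=(?P<id>\d+): (?P<status>\w+)"
)


def oversize_events(log_text):
    """Return (time, device, id, status) for every matching message."""
    return [
        (float(m["time"]), m["device"], int(m["id"]), m["status"])
        for m in OVERSIZE.finditer(log_text)
    ]


sample = (
    "8.4097 [opencl copy_host_to_device_constant] could not allocate oversize "
    "buffer on device 'NVIDIA CUDA Quadro K2000' id=0: CL_SUCCESS\n"
    "8.4098 [opencl copy_host_to_device_constant] could not allocate oversize "
    "buffer on device 'NVIDIA CUDA Quadro K2000' id=0: CL_SUCCESS\n"
)
events = oversize_events(sample)
per_device = Counter(dev_id for _, _, dev_id, _ in events)
print(events[0])
print(dict(per_device))
```

Feeding it the complete log shows whether the messages cluster at the same timestamps as the freezes and whether only device 0 is affected.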

Could you help me interpret the above?
Is it a problem with GPU memory usage by OpenCL?
Is it possible to tweak this by modifying some limits?
Could this be due to changes in the newest DT releases?
Or is it a conflict between the new DT and the OpenCL packages for kernel 6.8?

Your card just doesn’t have enough VRAM to be useful. It’s 12 years old and has 2 GB of VRAM. If you’re also driving your display from this card, you probably have more like 1.2 GB of VRAM for darktable.
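
To put rough numbers on that, here is a back-of-the-envelope Python sketch. It assumes a hypothetical 24-megapixel image (6000 x 4000), held as a 4-channel 32-bit float buffer, and treats the "MB" in the log as MiB. None of those figures come from the log except the 498 MB MAX MEM ALLOC and the 1.2 GB guess above.

```python
MB = 1024 * 1024


def buffer_mb(width, height, channels=4, bytes_per_channel=4):
    """Size in MB of one full-size image buffer (default: 4 x float32)."""
    return width * height * channels * bytes_per_channel / MB


max_alloc_mb = 498          # MAX MEM ALLOC reported for device 0
free_vram_mb = 1200         # the rough 1.2 GB guess from this post
width, height = 6000, 4000  # ASSUMPTION: hypothetical 24 MP image

size = buffer_mb(width, height)
print(f"one full-size buffer: {size:.1f} MB")
print("fits in a single allocation:", size <= max_alloc_mb)
print("full-size buffers that fit in free VRAM:", int(free_vram_mb // size))
```

A module usually needs at least an input and an output buffer at the same time, so under these assumptions the card is close to its limits before any temporary buffers are counted.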

This is true.
However, there were no such freezes with DT 5.2.
Do you think DT requires more resources starting with version 5.4, or is this due to some change in how the OpenCL headroom is handled?
Rolling back is no problem; I just need to know how to deal with this.

I would guess headroom, but it’s just a guess. I think we have recommended at least 4 GB of VRAM for some time.
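
As a rough illustration of what a headroom would cost on these cards, here is a small Python sketch. It assumes the `600` in the `cldevice_v5_nvidiacudaquadrok2000_id0` and `_id1` keys of the darktablerc listing is a per-device headroom in MB. That is my reading of the key and is not confirmed anywhere in this thread.

```python
def usable_mb(global_mem_mb, headroom_mb):
    """Memory left for darktable after reserving the headroom."""
    return max(global_mem_mb - headroom_mb, 0)


global_mem = {0: 1991, 1: 2000}  # GLOBAL MEM SIZE per device, from the log
headroom_mb = 600                # ASSUMPTION: meaning of the _id0/_id1 value

for dev_id, mem in global_mem.items():
    print(f"device {dev_id}: {usable_mb(mem, headroom_mb)} MB usable")
```

If that reading is right, roughly 1.4 GB per card would remain, and a different headroom default between versions would change the figure directly.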

Do you have one GPU or more?

Multiple GPUs.
2x NVIDIA CUDA Quadro K2000

Two identical cards? What happens if you disable one?

I’m getting the same error on a new GPU (5060 Ti with 8 GB VRAM), but with no noticeable freezes. I raised an issue: OpenCL "could not allocate oversize buffer on device" · Issue #20050 · darktable-org/darktable · GitHub

This exact error message was changed in a very recent PR (Fix highlights opposed OpenCL by jenshannoschwalm · Pull Request #19788 · darktable-org/darktable · GitHub)

Feedback on my ticket is that the error message is not a problem and can be ignored.

I will report back when I install DT 5.2.
Good that you have reported that. I was hesitant, as my cards are too old and my setup is, to say the least, peculiar. It’s surprising that yours throws this error too. As your GPU is much more powerful, the freeze may go unnoticed.

Why? To test whether it’s a problem with the dual-head setup?
Then the answer is: most likely, no. DT utilizes multiple GPUs just fine. Of course, this may be an issue with the new versions, but it has never been one before, so I doubt it.

See the comments in the issue I linked above. The message is not an “error”.

Well, my main concern is not just the line in stdout.
I don’t care whether it’s there or not.
My problem is the freezes that occur when this line appears.
I have plucked up the courage and submitted my own GitHub issue to the DT devs.
It is important to understand the nature of the issue before taking action: buying new GPUs, downgrading the DT version, or changing the configuration.
It’s easy to say “go buy a new GPU”, but as long as the root cause is unknown, the issue may remain after the money and time have been spent.

I was just trying to start troubleshooting. I see your GitHub Issue but no log. Please provide a log using -d perf and -d opencl when a freeze happens.

I don’t recall any significant changes to the OpenCL code in dt from 5.2 to 5.4.

I will provide the log on GitHub. Thanks!

Some things I would like to add or correct:

  1. Whenever you want to investigate performance, you definitely want to add -d pipe. It gives you information about fallbacks to CPU and other things that are happening, such as internals about scaling or extra code being processed. If someone wants me to investigate, that is a requirement :)
  2. There have been quite a few changes to the whole demosaic code section in addition to capture sharpen: we now have internal tiling, take care of the details threshold while tiling, and more. For cards with >= 4 GB there will be a significant performance gain if we have to tile, as the new code is much faster, but we definitely need graphics memory :)
  3. This new demosaicer code might overestimate memory requirements. To check such situations I definitely need -d pipe, so that if there are problems in that code I can check and possibly fix them.
  4. For very small graphics cards there are definitely more problems. The OS needs more CL memory than before. Neither multiple cards nor setting resources to “large” helps at all. So we have fallbacks, either through the OpenCL code possibly returning kernel errors at runtime or through the module code checking for resources.
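
For collecting such a log, here is a minimal Python sketch that starts darktable with the three debug flags mentioned in this thread (`-d opencl`, `-d perf`, `-d pipe`) and copies the output to a file while still showing it in the terminal. The helper names and the log file name are made up for illustration. It also assumes the `darktable` binary is on the PATH.

```python
import subprocess


def build_debug_command(flags=("opencl", "perf", "pipe"), binary="darktable"):
    """Build e.g. ['darktable', '-d', 'opencl', '-d', 'perf', '-d', 'pipe']."""
    cmd = [binary]
    for flag in flags:
        cmd += ["-d", flag]
    return cmd


def capture_log(log_path, flags=("opencl", "perf", "pipe")):
    """Run darktable until it exits, echoing its output and saving it."""
    with open(log_path, "w", encoding="utf-8") as log:
        proc = subprocess.Popen(
            build_debug_command(flags),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep stdout and stderr in one stream
            text=True,
        )
        for line in proc.stdout:
            print(line, end="")
            log.write(line)
        return proc.wait()


# Show the command that would be run; call capture_log("dt-freeze.log")
# to actually launch darktable and record the session.
print(" ".join(build_debug_command()))
```

Reproduce the freeze, quit darktable, and attach the resulting file to the issue.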

If you open an issue on GitHub, you will have to provide exact information; that place is not a generic discussion/support forum :)

There is only one piece of advice I can give: get another card :) 8 GB will be perfect with almost no tiling; the minimum is 4 GB.