How to enable OpenCL under Ubuntu 24.04 on Framework 16 (AMD Radeon™ RX 7700S)

I’m running Ubuntu 24.04.1 LTS on a Framework Laptop 16 (AMD Ryzen™ 7040 Series with dGPU (AMD Radeon™ RX 7700S)).

I’m completely new to OpenCL, and currently $ clinfo --list gives me empty output. I’d like to get OpenCL running so I can use it with darktable (deb installation). I understand that there are (at least) three ways to use OpenCL:

So what is the recommended way to enable OpenCL? Is there a good tutorial you can suggest as a starting point? Thanks for any support!

Try ROCm first. Check online how to enable it on Ubuntu. On Fedora it is easy: sudo dnf install rocm-opencl

Thanks for the recommendation, @g-man! I followed the official installation manual, which worked fine.

If I understand it correctly, darktable-cltest recognized both the iGPU (as device 1: gfx1103) and the dGPU (as device 0: gfx1102):

darktable-cltest output
$ darktable-cltest 
darktable 4.8.1
Copyright (C) 2012-2024 Johannes Hanika and other contributors.

Compile options:
  Bit depth              -> 64 bit
  Debug                  -> DISABLED
  SSE2 optimizations     -> ENABLED
  OpenMP                 -> ENABLED
  OpenCL                 -> ENABLED
  Lua                    -> ENABLED  - API version 9.3.0
  Colord                 -> ENABLED
  gPhoto2                -> ENABLED
  GMIC                   -> ENABLED  - Compressed LUTs are supported
  GraphicsMagick         -> ENABLED
  ImageMagick            -> DISABLED
  libavif                -> DISABLED
  libheif                -> ENABLED
  libjxl                 -> ENABLED
  OpenJPEG               -> ENABLED
  OpenEXR                -> ENABLED
  WebP                   -> ENABLED

See https://www.darktable.org/resources/ for detailed documentation.
See https://github.com/darktable-org/darktable/issues/new/choose to report bugs.

     0,0226 [dt_get_sysresource_level] switched to 2 as `large'
     0,0226   total mem:       88298MB
     0,0226   mipmap cache:    11037MB
     0,0226   available mem:   60360MB
     0,0227   singlebuff:      1379MB
     0.0251 [dt_dlopencl_init] could not find default opencl runtime library 'libOpenCL'
     0.0252 [dt_dlopencl_init] could not find default opencl runtime library 'libOpenCL.so'
     0.0252 [opencl_init] opencl library 'libOpenCL.so.1' found on your system and loaded, preference 'default path'
     2.1670 [opencl_init] found 1 platform
[opencl_init] found 2 devices

[dt_opencl_device_init]
   DEVICE:                   0: 'gfx1102'
   CONF KEY:                 cldevice_v5_amdacceleratedparallelprocessinggfx1102
   PLATFORM, VENDOR & ID:    AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc., ID=4098
   CANONICAL NAME:           amdacceleratedparallelprocessinggfx1102
   DRIVER VERSION:           3635.0 (HSA1.1,LC)
   DEVICE VERSION:           OpenCL 2.0 
   DEVICE_TYPE:              GPU, dedicated mem
   GLOBAL MEM SIZE:          8176 MB
   MAX MEM ALLOC:            6950 MB
   MAX IMAGE SIZE:           16384 x 16384
   MAX WORK GROUP SIZE:      256
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 1024 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   USE HEADROOM:             600Mb
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH & HEIGHT    16x16
   CHECK EVENT HANDLES:      128
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   /usr/share/darktable/kernels
   KERNEL DIRECTORY:         /home/simon/.cache/darktable/cached_v3_kernels_for_AMDAcceleratedParallelProcessinggfx1102_36350HSA11LC
   CL COMPILER OPTION:       -cl-fast-relaxed-math
   CL COMPILER COMMAND:      -w -cl-fast-relaxed-math  -DAMD=1 -I"/usr/share/darktable/kernels"
   KERNEL LOADING TIME:       0.0171 sec

[dt_opencl_device_init]
   DEVICE:                   1: 'gfx1103'
   CONF KEY:                 cldevice_v5_amdacceleratedparallelprocessinggfx1103
   PLATFORM, VENDOR & ID:    AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc., ID=4098
   CANONICAL NAME:           amdacceleratedparallelprocessinggfx1103
   DRIVER VERSION:           3635.0 (HSA1.1,LC)
   DEVICE VERSION:           OpenCL 2.0 
   DEVICE_TYPE:              GPU, unified mem
   GLOBAL MEM SIZE:          44149 MB
   MAX MEM ALLOC:            37527 MB
   MAX IMAGE SIZE:           16384 x 16384
   MAX WORK GROUP SIZE:      256
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 1024 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   USE HEADROOM:             600Mb
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH & HEIGHT    16x16
   CHECK EVENT HANDLES:      128
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   /usr/share/darktable/kernels
   KERNEL DIRECTORY:         /home/simon/.cache/darktable/cached_v3_kernels_for_AMDAcceleratedParallelProcessinggfx1103_36350HSA11LC
   CL COMPILER OPTION:       -cl-fast-relaxed-math
   CL COMPILER COMMAND:      -w -cl-fast-relaxed-math  -DAMD=1 -I"/usr/share/darktable/kernels"
   KERNEL LOADING TIME:       0.0151 sec
[opencl_init] OpenCL successfully initialized. internal numbers and names of available devices:
[opencl_init]		0	'AMD Accelerated Parallel Processing gfx1102'
[opencl_init]		1	'AMD Accelerated Parallel Processing gfx1103'
     2.5902 [opencl_init] FINALLY: opencl PREFERENCE=ON is AVAILABLE and ENABLED.
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	1	0	0	1
[dt_opencl_update_priorities]		1	-1	1	1	-1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	0	0	0	0
[opencl_synchronization_timeout] synchronization timeout set to 200
   UNIFIED MEM SIZE:         22075 MB reserved for 'amdacceleratedparallelprocessinggfx1103'
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	1	0	0	1
[dt_opencl_update_priorities]		1	-1	1	1	-1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	0	0	0	0
[opencl_synchronization_timeout] synchronization timeout set to 200

Do I have to tell darktable somehow which GPU to prefer (couldn’t find an option in the settings)?

If you edit darktablerc, you can control which GPU to use.

But with your system setup, I recommend that you disable the iGPU in darktablerc and set the OpenCL scheduling profile to very fast GPU instead of default.

If I understand the manual correctly, the darktablerc line to disable the iGPU and use the dGPU for everything should read:
opencl_device_priority=0/0/0/0/0,0
Correct?
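
For anyone reading along, here is a minimal Python sketch of how an opencl_device_priority string seems to map onto the priority table that darktable-cltest prints. It is a simplified model for illustration only: resolve_priorities is a made-up helper, not darktable code. It assumes numeric device IDs, treats '*' as "all remaining devices", treats '!N' as "exclude device N", and ignores the '+' prefix. The pipe names are taken from the header row of the table in the log above.

```python
PIPES = ["image", "preview", "export", "thumbs", "preview2"]

def resolve_priorities(spec, device_count):
    """Simplified model: turn a priority string into per-pipe device orderings."""
    result = {}
    for pipe, field in zip(PIPES, spec.split("/")):
        excluded = set()
        order = []
        for token in field.split(","):
            token = token.strip().lstrip("+")  # '+' prefix is ignored in this sketch
            if token.startswith("!"):
                excluded.add(int(token[1:]))   # '!N' excludes device N
            elif token == "*":
                # wildcard: every device not yet listed and not excluded
                order += [d for d in range(device_count)
                          if d not in excluded and d not in order]
            elif token:
                dev = int(token)
                if dev not in excluded and dev not in order:
                    order.append(dev)
        # pad with -1 ("no device") like the table in the log
        order += [-1] * (device_count - len(order))
        result[pipe] = order
    return result

default = resolve_priorities("*/!0,*/*/*/!0,*", 2)
dgpu_only = resolve_priorities("0/0/0/0/0,0", 2)
print(default)
print(dgpu_only)
```

With the default string and two devices, this model reproduces the columns of the table printed in the log (for example preview: 1, then -1). Under the same model, 0/0/0/0/0,0 leaves device 0 as the only candidate for every pipe.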

I’m wondering whether the AMD Ryzen™ 9 7940HS is really so slow that using only the dGPU (and waiting for it) is faster than using CPU + dGPU in parallel.

darktable processes most modules in sequence, since the output of one module feeds the next, so there is not much parallelism within a single processing pipe. Telling it to use the dGPU normally yields faster processing because a GPU has more processing units.

The area where parallelism can help is the preview versus the full pipe: the two can execute in parallel. But an iGPU uses the same memory and processing resources as the CPU. Something has to wait if you funnel too many things through the CPU.
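
To illustrate the idea (not darktable's actual implementation), here is a toy Python sketch. Each pipe applies its modules strictly one after another, because every module needs the previous module's output, while two independent pipes (a full image and a smaller preview) can run side by side. The module functions and pixel values are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical stand-ins for processing modules
def exposure(pixels):
    return [p * 2 for p in pixels]

def offset(pixels):
    return [p + 1 for p in pixels]

MODULES = [exposure, offset]

def run_pipe(pixels):
    # Sequential by nature: each module consumes the previous module's output
    return reduce(lambda data, module: module(data), MODULES, pixels)

# The full pipe and the preview pipe do not depend on each other,
# so they can be submitted concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    full_future = pool.submit(run_pipe, [1, 2, 3, 4])
    preview_future = pool.submit(run_pipe, [1, 3])
    full_result = full_future.result()
    preview_result = preview_future.result()

print(full_result, preview_result)
```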

You can use -d perf to evaluate how your system behaves when it exports or when you do a change in the darkroom (ideally at the start of the pipe).

I might be mistaken, but as I understand it, with the above configuration (very fast GPU & forbidding the iGPU) the dGPU is the only device left for darktable to use. So I was wondering whether a configuration similar to
opencl_device_priority=0/*/0/*/*,* (to allow offloading to the iGPU for everything except the center image and export pipelines) and/or not using the very fast GPU profile (to allow offloading to the CPU) wouldn’t make more sense.

Memory shouldn’t be an issue here: I have 96 GB of it and, if I recall correctly, 8 GB assigned to the iGPU in the BIOS. The iGPU and CPU using the same processing sounds counterintuitive to me (what would the iGPU consist of then?), but I’m no expert in that field. 🙂

Just test it and see what works for you.

So I tried 10 different combinations of opencl_device_priority and opencl_scheduling_profile (default, very fast GPU), and the results were very close to each other: differences were normally <0.01 secs on process_image and ~0.1 secs on process_export.
The only two exceptions were when I forced use of the iGPU (priority: 1/1/1/1/1) with profile default (export time more than tripled, image processing time more than doubled) and when I used */!0,*/*/*/!0,* with profile default (image processing time more than doubled).

Note: Probably the numbers are not trustworthy anyway as I did not

  • repeat the runs
  • use the second preview window
  • pay attention to cache
  • process and export more than one image each run
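
A rough way to address the first point (repeating the runs) is a small timing harness. The Python sketch below runs any command several times and reports the median, minimum and maximum wall-clock time. time_command and summarize are made-up helper names, and the demo command is just a no-op Python call standing in for a real export command. It does nothing about caches, so the first run may still differ from later ones.

```python
import statistics
import subprocess
import sys
import time

def time_command(cmd, runs=5):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations

def summarize(durations):
    """Condense a list of durations into a few robust numbers."""
    return {
        "runs": len(durations),
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }

# Placeholder command: replace with the export command you want to measure.
demo = summarize(time_command([sys.executable, "-c", "pass"], runs=3))
print(demo)
```

Comparing medians over several runs per configuration should make it easier to tell real differences from noise than a single measurement does.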

Use parameter h to disable each GPU in turn. You will get more accurate results this way.

https://docs.darktable.org/usermanual/4.6/en/special-topics/mem-performance/#device-specific-opencl-configuration
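
As a hedged sketch of what that looks like in practice: the device line in darktablerc is a space-separated list of values, and parameter h is the eighth one. The Python helpers below (parse_cldevice and set_disabled are made-up names) split such a line into named fields and flip the disable flag. The first seven field names are my reading of the matching entries in the darktable-cltest output (AVOID ATOMICS, MICRO NAP, and so on); the last three are left as plain letters because their meaning should be checked in the manual page linked above.

```python
FIELDS = [
    "avoid_atomics",    # a
    "micro_nap",        # b
    "pinned_memory",    # c
    "roundup_width",    # d
    "roundup_height",   # e
    "event_handles",    # f
    "async_pixelpipe",  # g
    "disable_device",   # h
    "i", "j", "k",      # see the manual for these; not interpreted here
]

def parse_cldevice(line):
    """Split 'key=v1 v2 ...' into the key and a dict of named string values."""
    key, _, value = line.strip().partition("=")
    return key, dict(zip(FIELDS, value.split()))

def set_disabled(line, disabled):
    """Return the same config line with parameter h set to 1 or 0."""
    key, params = parse_cldevice(line)
    params["disable_device"] = "1" if disabled else "0"
    # Values stay strings so the original number formatting is preserved
    return key + "=" + " ".join(params[name] for name in FIELDS if name in params)

line = "cldevice_v5_amdacceleratedparallelprocessinggfx1103=0 250 0 16 16 128 0 0 0.000 0.000 0.250"
disabled_line = set_disabled(line, True)
print(disabled_line)
```

Since darktable may rewrite darktablerc itself, it is probably safest to edit the file only while darktable is closed.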

I tried some more; the numbers stay very similar whether or not I allow the iGPU for additional tasks, and they still fluctuate a little. I have now settled on

cldevice_v5_amdacceleratedparallelprocessinggfx1102=0 0 0 16 16 1024 1 0 0.000 0.000 0.250
cldevice_v5_amdacceleratedparallelprocessinggfx1102_building=-cl-fast-relaxed-math
cldevice_v5_amdacceleratedparallelprocessinggfx1102_id0=0
cldevice_v5_amdacceleratedparallelprocessinggfx1103=0 250 0 16 16 128 0 0 0.000 0.000 0.250
cldevice_v5_amdacceleratedparallelprocessinggfx1103_building=-cl-fast-relaxed-math
cldevice_v5_amdacceleratedparallelprocessinggfx1103_id0=600
cldevice_v5_amdacceleratedparallelprocessinggfx1103_id1=600
opencl_device_priority=+0/*/+0/*/+0
opencl_scheduling_profile=very fast GPU

and decided to spend my time on image editing rather than on further performance tuning. 🙂