Status of OpenCL on M1 Macs

I realize the community is mostly Linux users, but I’m hoping someone with a better understanding of Macs, the M1, and GPUs can clear up the status of OpenCL for darktable on an M1 Mac. Thanks in advance for any help. If there is a real benefit to compiling it myself, I would investigate further.

I just started using darktable a few months back, unfortunately shortly after replacing my old damaged MacBook with a new M1 MacBook.

Based on some research, it appears that with the creation of Metal, Apple has abandoned OpenCL and no longer provides the driver/interface to its GPUs that an OpenCL app needs in order to leverage them.

For the M1, Apple has provided Rosetta 2, a translator that converts Intel binaries to M1/ARM binaries, but I’m pretty sure it does not fill the OpenCL gap described above.

Anyway, the default prebuilt binaries seem to work fine on my M1 MacBook via Rosetta 2 translation. I assumed the M1 CPU was fast enough to be OK without the benefit of using the GPU.

So reading this bug report I am now confused (https://github.com/darktable-org/darktable/issues/12234). The cli output from darktable -d opencl seems to indicate that dt does have and recognize OpenCL? I have spent time trying to understand but am still puzzled.

I have found a whole bunch of files on my M1 at /System/Library/Frameworks/OpenCL.framework/Versions/Current/OpenCL
which, even if they are not the “driver”, would seem to indicate Apple is still supporting OpenCL?

My debug output is somewhat different. It seems to indicate OpenCL is present but not working???

/Applications/darktable.app/Contents/MacOS/darktable -d opencl

(process:5497): GLib-GObject-CRITICAL **: 21:28:44.445: g_object_set: assertion ‘G_IS_OBJECT (object)’ failed
[dt_pthread_create] info: bumping pthread’s stacksize from 524288 to 2097152
[dt_pthread_create] info: bumping pthread’s stacksize from 524288 to 2097152
[dt_pthread_create] info: bumping pthread’s stacksize from 524288 to 2097152
[dt_pthread_create] info: bumping pthread’s stacksize from 524288 to 2097152
[dt_pthread_create] info: bumping pthread’s stacksize from 524288 to 2097152
[dt_pthread_create] info: bumping pthread’s stacksize from 524288 to 2097152
[dt_pthread_create] info: bumping pthread’s stacksize from 524288 to 2097152
[dt_pthread_create] info: bumping pthread’s stacksize from 524288 to 2097152
[dt_pthread_create] info: bumping pthread’s stacksize from 524288 to 2097152
[opencl_init] opencl related configuration options:
[opencl_init] opencl: OFF
[opencl_init] opencl_scheduling_profile: ‘default’
[opencl_init] opencl_library: ‘default path’
[opencl_init] opencl_device_priority: ‘/!0,///!0,*’
[opencl_init] opencl_mandatory_timeout: 400
[opencl_init] opencl_synch_cache: active module
[opencl_init] opencl library ‘/System/Library/Frameworks/OpenCL.framework/Versions/Current/OpenCL’ found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 2 devices

[dt_opencl_device_init]
DEVICE: 0: ‘Apple M1 Max’
CANONICAL NAME: applem1max
PLATFORM NAME & VENDOR: Apple, Apple
DRIVER VERSION: 1.1
DEVICE VERSION: OpenCL 1.2
DEVICE_TYPE: CPU
GLOBAL MEM SIZE: 65536 MB
MAX MEM ALLOC: 16384 MB
MAX IMAGE SIZE: 8192 x 8192
MAX WORK GROUP SIZE: 1024
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1 1 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
MEMORY TUNING: NO
FORCED HEADROOM: 400
AVOID ATOMICS: NO
MICRO NAP: 1000
ROUNDUP WIDTH: 16
ROUNDUP HEIGHT: 16
CHECK EVENT HANDLES: 128
PERFORMANCE: 0.000000 (CPU 0.080407)
DEFAULT DEVICE: NO
*** marked as disabled ***

[dt_opencl_device_init]
DEVICE: 1: ‘Apple M1 Max’
CANONICAL NAME: applem1max
PLATFORM NAME & VENDOR: Apple, Apple
DRIVER VERSION: 1.2 1.0
DEVICE VERSION: OpenCL 1.2
DEVICE_TYPE: GPU
GLOBAL MEM SIZE: 49152 MB
MAX MEM ALLOC: 9216 MB
MAX IMAGE SIZE: 16384 x 16384
MAX WORK GROUP SIZE: 256
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 256 256 256 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
MEMORY TUNING: NO
FORCED HEADROOM: 400
AVOID ATOMICS: NO
MICRO NAP: 1000
ROUNDUP WIDTH: 16
ROUNDUP HEIGHT: 16
CHECK EVENT HANDLES: 128
PERFORMANCE: 0.000000 (CPU 0.080407)
DEFAULT DEVICE: NO
*** marked as disabled ***
[opencl_init] no suitable devices found.
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.

(darktable:5497): GLib-GObject-WARNING **: 21:28:44.863: invalid cast from ‘GtkMenuBar’ to ‘GtkWindow’

(darktable:5497): Gtk-CRITICAL **: 21:28:44.863: gtk_window_add_accel_group: assertion ‘GTK_IS_WINDOW (window)’ failed
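
For anyone comparing logs like the one above, the per-device blocks can be summarized with a short script. This is only a sketch. It assumes the block layout seen in this particular output (a `DEVICE:` line, a `DEVICE_TYPE:` line, and an optional `*** marked as disabled ***` line), which may differ between darktable versions. The function name `summarize_devices` is made up for illustration.

```python
def summarize_devices(log_text):
    """Return (name, device_type, disabled) tuples from `-d opencl` output.

    Assumes the layout seen in the log above; it is not an official format.
    """
    devices = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("DEVICE:"):
            # e.g. "DEVICE: 1: ‘Apple M1 Max’" -> the name follows the second colon
            name = line.split(":", 2)[2].strip().strip("‘’'\"")
            current = {"name": name, "type": None, "disabled": False}
            devices.append(current)
        elif current is not None and line.startswith("DEVICE_TYPE:"):
            current["type"] = line.split(":", 1)[1].strip()
        elif current is not None and "marked as disabled" in line:
            current["disabled"] = True
    return [(d["name"], d["type"], d["disabled"]) for d in devices]
```

Fed the log above, this should report two devices named Apple M1 Max, one CPU and one GPU, both flagged as disabled.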

Perhaps the software-based implementation has very bad performance:

PERFORMANCE: 0.000000 (CPU 0.080407)

My OpenCL performance measurement on an NVidia 1060 reports:

PERFORMANCE:              1.209715

Thanks for your input. Less than 10% of that is a significant difference. I think I remember reading somewhere that dt runs a test and, if the CPU is faster, just uses the CPU.

It seems the performance tests done by dt aren’t usable on the M1. So you might override dt’s estimate and explicitly enable OpenCL yourself in your ~/.config/darktable/darktablerc to see whether that improves performance.

See: darktable 4.0 user manual - memory & performance tuning

h. disable device
0 = enable device; 1 = disable device
If darktable detects a malfunctioning device it will automatically mark it as such by setting this parameter to 1. If you have a device that reports a lot of errors you can manually disable it by setting this field to 1.

and

opencl=TRUE

Can you find a cldevice_ line in your darktablerc? Mine looks like:

cldevice_v4_nvidiageforcegtx10606gb=0 250 0 64 64 1024 1 0 0.019865

The last number is the performance indicator; the one before that is the disabled flag (0 means enabled).

You may want to try editing that line, making sure the disabled flag is 0, and maybe fake a performance indicator just to check out what happens if you force OpenCL.
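
To make the edit described above less error-prone, here is a minimal sketch that reads a cldevice_ line and rewrites its last two fields. It relies only on what is stated in this thread: the last number is the performance indicator and the one before it is the disabled flag. The meaning of the other fields is not assumed, so they are passed through untouched. The function names `parse_cldevice` and `force_enable` are hypothetical.

```python
def parse_cldevice(line):
    """Split a `cldevice_...=...` line into its key and the two known fields."""
    key, _, value = line.partition("=")
    fields = value.split()
    return {
        "key": key.strip(),
        "disabled": fields[-2] != "0",    # second-to-last field: 0 means enabled
        "performance": float(fields[-1]), # last field: performance indicator
        "other_fields": fields[:-2],      # layout not described in the thread
    }


def force_enable(line, fake_performance):
    """Return the line with the disabled flag set to 0 and a faked indicator."""
    key, _, value = line.partition("=")
    fields = value.split()
    fields[-2] = "0"
    fields[-1] = str(fake_performance)
    return key.strip() + "=" + " ".join(fields)
```

If the indicator is a timing, as the tcpu / tgpu formula further down suggests, a smaller faked value would make the device look faster. That is an inference from the thread, not something checked against the darktable source.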

The CPU also has a benchmark indicator, mine is:

dt_cpubenchmark=0.024030983448028564

The performance gain is calculated as tcpu / tgpu (so, for me, 0.024030983448028564/0.019865 = 1.209715, as shown above). You seem to be using a somewhat older version, which directly reports the two figures.

https://github.com/darktable-org/darktable/blob/2b2425086db4e6fa735dc21912fc8f910a425add/src/common/opencl.c#L321-L328
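
As a quick check of the arithmetic above, the ratio can be reproduced in a few lines. This is just a sketch of the tcpu / tgpu formula as described in this thread, not a copy of the darktable code. The function name `opencl_gain` is made up, and returning None for a zero or negative GPU figure is my own choice for the example.

```python
def opencl_gain(t_cpu, t_gpu):
    """Performance gain as described above: CPU benchmark divided by GPU benchmark."""
    if t_gpu <= 0:
        # No usable GPU figure; what darktable does in this case is not assumed here.
        return None
    return t_cpu / t_gpu


# Figures quoted in the post above (dt_cpubenchmark and the cldevice_ indicator).
gain = opencl_gain(0.024030983448028564, 0.019865)
print(f"gain = {gain:.6f}")
```

A gain above 1 would mean the GPU benchmark finished faster than the CPU benchmark.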

You may also need to set this in your config file:

opencl_disable_drivers_blacklist=true
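
If you would rather script these changes than hand-edit the file, here is a rough sketch that updates key=value lines in the text of a darktablerc. It works on a string, so nothing is written to disk unless you do that yourself. The helper name `set_conf_values` is hypothetical, and the only keys used are the ones mentioned in this thread. It is probably safest to edit the file while darktable is not running, and to keep a backup copy.

```python
def set_conf_values(rc_text, updates):
    """Return rc_text with each key in `updates` set to its new value.

    Existing `key=value` lines are replaced in place; missing keys are appended.
    """
    remaining = dict(updates)
    out = []
    for line in rc_text.splitlines():
        key = line.split("=", 1)[0]
        if "=" in line and key in remaining:
            out.append(key + "=" + remaining.pop(key))
        else:
            out.append(line)
    # Keys not found in the file are added at the end.
    out.extend(k + "=" + v for k, v in remaining.items())
    return "\n".join(out) + "\n"


example = "opencl=FALSE\nopencl_library=\n"
print(set_conf_values(example, {
    "opencl": "TRUE",
    "opencl_disable_drivers_blacklist": "true",
}))
```

To apply it for real, read ~/.config/darktable/darktablerc into a string, pass it through the function, and write the result back.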

I have only used dt 3.8 and 4.0 and have not built it myself yet. The fact that opencl_library is blank seems like a bad sign. Not sure what an Intel Mac would have?

opencl=FALSE
opencl_async_pixelpipe=false
opencl_avoid_atomics=false
opencl_checksum=1382641197
opencl_device_priority=/!0,///!0,*
opencl_library=
opencl_mandatory_timeout=400
opencl_memory_headroom=400
opencl_memory_requirement=768
opencl_micro_nap=1000
opencl_number_event_handles=25
opencl_scheduling_profile=default
opencl_size_roundup=16
opencl_synch_cache=active module
opencl_tuning_mode=nothing
opencl_use_cpu_devices=false
opencl_use_pinned_memory=false

Those are conf values from 3.8

Look for the device specific conf data as described in the manual.

No, that’s completely normal:

OpenCL runtime library is normally detected automatically by darktable. if your OpenCL runtime is at an unusual place and cannot be detected, enter the full pathname here. leave empty for default behavior.