macOS OpenCL changes in darktable 4.5.0

Hi! I’ve been using the macOS development builds for the 4.5.0 release provided by @MStraeten. (Thanks for providing builds!) I’m currently on the most recent darktable-4.5.0+893~gac2b7321d4_arm64 build.

I’ve noticed a considerable slowdown on export running the ‘local contrast’ preset of the ‘diffuse and sharpen’ module compared to the current 4.4.2 release. This appears to only occur on export. When I’m in the darkroom view, I can toggle the module, and see the effect in the preview (even at 100% zoom).

To test this, I created as minimal an edit as I could, and then the same stack with the addition of the ‘local contrast’ preset for ‘diffuse and sharpen’.

Enabling the “use all device memory” option results in a successful export, but with an execution time around the same as without openCL.

Curious if there are any other debugging steps I should take, or if there are configuration options for the 4.4.2->4.5.0 upgrade I need to modify. Any help is greatly appreciated. Thanks!

System Info

  • MacBook Air, M2 2022
  • 8GB Memory
  • macOS Ventura 13.5.1 (22G90)
  • darktable 4.5.0 --version: darktable 4.5.0+893~gac2b7321d4-dirty
  • darktable 4.4.2 --version: this is darktable-cli 4.4.2

Results

╔═════════╤════════╤═════════╤═════╤═══════════════╤═════════════════╗
║ version │ opencl │ all mem │ d/s │ d/s time (s)  │ total time (s)  ║
╠═════════╪════════╪═════════╪═════╪═══════════════╪═════════════════╣
║ 4.4.2   │ y      │ n       │ y   │ 18.215        │ 19.92           ║
╟─────────┼────────┼─────────┼─────┼───────────────┼─────────────────╢
║ 4.4.2   │ n      │ n       │ y   │ 93.379        │ 94.84           ║
╟─────────┼────────┼─────────┼─────┼───────────────┴─────────────────╢
║ 4.5.0   │ y      │ n       │ y   │ manually terminated after ~8min ║
╟─────────┼────────┼─────────┼─────┼───────────────┬─────────────────╢
║ 4.5.0   │ n      │ n       │ y   │ 90.943        │ 92.40           ║
╠═════════╪════════╪═════════╪═════╪═══════════════╪═════════════════╣
║ 4.5.0   │ y      │ y       │ y   │ 91.396        │ 93.18           ║
╟─────────┼────────┼─────────┼─────┼───────────────┼─────────────────╢
║ 4.5.0   │ n      │ y       │ y   │ 91.380        │ 92.85           ║
╠═════════╪════════╪═════════╪═════╪═══════════════╪═════════════════╣
║ 4.4.2   │ y      │ n       │ n   │ n/a           │ 2.69            ║
╟─────────┼────────┼─────────┼─────┼───────────────┼─────────────────╢
║ 4.4.2   │ n      │ n       │ n   │ n/a           │ 1.30            ║
╟─────────┼────────┼─────────┼─────┼───────────────┼─────────────────╢
║ 4.5.0   │ y      │ y       │ n   │ n/a           │ 2.00            ║
╟─────────┼────────┼─────────┼─────┼───────────────┼─────────────────╢
║ 4.5.0   │ n      │ y       │ n   │ n/a           │ 1.47            ║
╟─────────┼────────┼─────────┼─────┼───────────────┼─────────────────╢
║ 4.5.0   │ y      │ n       │ n   │ n/a           │ 2.05            ║
╟─────────┼────────┼─────────┼─────┼───────────────┼─────────────────╢
║ 4.5.0   │ n      │ n       │ n   │ n/a           │ 1.32            ║
╚═════════╧════════╧═════════╧═════╧═══════════════╧═════════════════╝

I used commands similar to the following:

/Applications/darktable-4.5.0+893.app/Contents/MacOS/darktable-cli \
     Moonrise.ARW \
     Moonrise_localcontrast.ARW.xmp \
     ./output.jpg \
     --core \
     --disable-opencl \
     -d perf > darktable-4.5.0-locont-disabled.txt

For the different rows I toggled --disable-opencl, the memory option in the UI, and the input XMP file. All timings above are from the output files.
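The toggling described above can be scripted. Here is a minimal Python sketch that only builds and prints the command lines, assuming the same app bundle path and file names as in the command above. `build_command` and `build_matrix` are hypothetical helper names, and the "use all device memory" option is left out because it was set in the UI.

```python
import shlex

# Path and file names copied from the command above; adjust to your install.
DT_CLI = "/Applications/darktable-4.5.0+893.app/Contents/MacOS/darktable-cli"


def build_command(raw, xmp, out, use_opencl):
    """Return the darktable-cli argument list for one export run."""
    cmd = [DT_CLI, raw, xmp, out, "--core"]
    if not use_opencl:
        cmd.append("--disable-opencl")
    cmd += ["-d", "perf"]
    return cmd


def build_matrix(raw, xmps):
    """One command per (sidecar, OpenCL on/off) combination."""
    return [
        build_command(raw, xmp, "./output.jpg", use_opencl)
        for xmp in xmps
        for use_opencl in (True, False)
    ]


for cmd in build_matrix(
    "Moonrise.ARW",
    ["Moonrise_localcontrast.ARW.xmp", "Moonrise_none.ARW.xmp"],
):
    # Print only. To actually export, pass the list to subprocess.run(...)
    # and redirect stdout to a log file.
    print(shlex.join(cmd))
```

Building argument lists rather than one long string avoids quoting problems if a file name contains spaces.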

Files

Tested using the image from this PlayRaw. Licensed by @akgt94 as Creative Commons, By-Attribution-Non Commercial, Share-Alike

Moonrise_localcontrast.ARW.xmp (6.5 KB)
Moonrise_none.ARW.xmp (6.0 KB)
DSC08889.ARW (23.4 MB)

Have you tried deleting the openCL kernels and letting darktable regenerate them?

Also a log file with -d opencl would be helpful.

If your times are about the same with and without openCL, then you’re probably not using openCL at all.
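To put numbers on that, here is a minimal Python sketch using the ‘d/s time’ values from the results table in the first post. `speedup` is a hypothetical helper, and reading a ratio near 1.0 as "OpenCL not in use" is a rule of thumb, not something darktable reports.

```python
def speedup(cpu_seconds, opencl_seconds):
    """How many times faster the OpenCL run is than the CPU-only run."""
    return cpu_seconds / opencl_seconds


# 'd/s time' values in seconds from the results table: (CPU only, OpenCL)
runs = {
    "4.4.2": (93.379, 18.215),
    "4.5.0, all mem": (91.380, 91.396),
}

for label, (cpu, gpu) in runs.items():
    ratio = speedup(cpu, gpu)
    # A ratio near 1.0 suggests the module ran on the CPU path.
    print(f"{label}: {ratio:.2f}x")
```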

Compared to 4.4.2, there are some changes in how much memory is reserved for OpenCL.

Please have a look in your ~/.config/darktable/darktablerc for these lines:

cldevice_v5_appleapplem1max=0 250 0 16 16 128 0 0 0.000 0.000 0.500
cldevice_v5_appleapplem1max_building=-cl-fast-relaxed-math
cldevice_v5_appleapplem1max_id0=600

By default OpenCL is configured to use just 0.25 of the available memory, which doesn’t make sense on Apple hardware since the system controls how much memory can be used by the CPU or GPU.
Since you only have 8GB, maybe set the last value of the first cldevice_v5_… line to 0.5.
Then you can also set the kernel build option: -cl-fast-relaxed-math

And of course check that opencl=TRUE is set.
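The edit to the first cldevice_v5_… line can be sketched in Python as below. This assumes the device line has 11 whitespace-separated fields, as in the example above, and that the last one is the memory fraction; the other fields are passed through untouched and not interpreted. `set_memory_fraction` is a hypothetical helper, not part of darktable, and the sample line is the one above with the last value set back to the 0.25 default.

```python
def set_memory_fraction(rc_line, fraction):
    """Return rc_line with the last field of a cldevice_v5_* device line replaced."""
    key, sep, value = rc_line.strip().partition("=")
    fields = value.split()
    # ASSUMPTION: the device tuning line has exactly 11 fields. This also
    # rejects the related *_building and *_id0 keys, which hold one value.
    if not sep or not key.startswith("cldevice_v5_") or len(fields) != 11:
        raise ValueError(f"not a device tuning line: {rc_line!r}")
    fields[-1] = f"{fraction:.3f}"
    return key + "=" + " ".join(fields)


line = "cldevice_v5_appleapplem1max=0 250 0 16 16 128 0 0 0.000 0.000 0.250"
print(set_memory_fraction(line, 0.5))
```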

Does darktable-cltest work on Mac?

Thanks for the tip @paperdigits, I didn’t know about deleting cached OpenCL kernels. I had a whole response typed up with results from retesting and log files.

However, @MStraeten’s suggestion worked perfectly! I modified the cldevice_v5_appleapplem1max and added the relaxed math option in my rc file as suggested. This resulted in a ‘d&s’ runtime of 28.7s, much faster than the >10 minutes I was seeing before.

Compared to my 4.4.2 log file, it looks like it’s still using less memory overall, so I’ll probably tweak the values to see what works best.

  • 4.4.2: [dt_opencl_check_tuning] use 3459MB (tunemem=OFF, pinning=OFF) on device Apple Apple M2 id=0
  • 4.5.0 (default RC file): [dt_opencl_check_tuning] use 1126MB (headroom=OFF, pinning=OFF) on device Apple Apple M2 id=0
  • 4.5.0 (modified RC file): [dt_opencl_check_tuning] use 2526MB (headroom=OFF, pinning=OFF) on device Apple Apple M2 id=0
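For comparing several log files, the memory figure can be pulled out of these lines with a small regular expression. This is a minimal Python sketch using the three lines quoted above; `tuning_mb` is a hypothetical helper, and the pattern only assumes the `use <N>MB` wording seen in these logs.

```python
import re

TUNING_RE = re.compile(r"\[dt_opencl_check_tuning\] use (\d+)MB")


def tuning_mb(log_line):
    """Return the memory figure in MB from a dt_opencl_check_tuning line, or None."""
    match = TUNING_RE.search(log_line)
    return int(match.group(1)) if match else None


logs = {
    "4.4.2": "[dt_opencl_check_tuning] use 3459MB (tunemem=OFF, pinning=OFF) on device Apple Apple M2 id=0",
    "4.5.0 default": "[dt_opencl_check_tuning] use 1126MB (headroom=OFF, pinning=OFF) on device Apple Apple M2 id=0",
    "4.5.0 modified": "[dt_opencl_check_tuning] use 2526MB (headroom=OFF, pinning=OFF) on device Apple Apple M2 id=0",
}

baseline = tuning_mb(logs["4.4.2"])
for label, line in logs.items():
    mb = tuning_mb(line)
    print(f"{label}: {mb} MB ({mb / baseline:.0%} of 4.4.2)")
```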

Thanks for the help!

I guess you would suggest to have this as default for apples? Are you sure about the -cl-fast-relaxed-math option?

That’s the setting I’ve used for a long, long time, and I’ve never had issues with it.
Not sure if that should be the default; it’s just based on my own observations and occasional performance measurements.

Just a short question. If I change the darktablerc file from .250 to .500 as suggested by @MStraeten and restart dt, the change reverts to .250. Is this how it should be? I am on an Intel iMac, macOS 14, and nightly builds.

Nope. There is a plausibility test allowing values between “close to nothing” and 0.5; if a value outside that range is found, it’s reset to 0.25 (the default).
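The behaviour described here can be sketched in Python as follows. This is not darktable's code: the lower bound of 0.05 is an assumed stand-in for “close to nothing”, treating both bounds as inclusive is also an assumption, and `sanitize_fraction` is a hypothetical name.

```python
DEFAULT_FRACTION = 0.25  # default named in this thread
MAX_FRACTION = 0.5       # upper bound named in this thread
MIN_FRACTION = 0.05      # ASSUMPTION: stand-in for "close to nothing"


def sanitize_fraction(value):
    """Keep an in-range value; reset anything else to the default."""
    # ASSUMPTION: both bounds are inclusive.
    if MIN_FRACTION <= value <= MAX_FRACTION:
        return value
    return DEFAULT_FRACTION


for candidate in (0.5, 0.3, 0.75, 0.0):
    print(candidate, "->", sanitize_fraction(candidate))
```

Under these assumptions a stored 0.5 would be kept, which matches the answer that a revert to .250 is not the intended behaviour.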

I thought about this setting again; I’m not sure what the best default for unified memory is. On machines with <16GB RAM it will be faster with 0.5 due to less tiling, but we might run into trouble with memory resources on 8GB systems. Personally I would prefer being on the safe side …

There is no official vendor ID for Apple. Could you “Apple owners” check what you have (by adding a printf to opencl.c around line 1625)?

We might want to use that, after confirmation from other sources, for specific compiler settings?

line 1625:

dt_print(DT_DEBUG_OPENCL,
             "[opencl_get_vendor_by_id] vendor id `%d'!\n", id);

[dt_opencl_device_init]
   DEVICE:                   0: 'Apple M1 Max'
   PLATFORM NAME & VENDOR:   Apple, Apple
   CANONICAL NAME:           appleapplem1max
   DRIVER VERSION:           1.2 1.0
   DEVICE VERSION:           OpenCL 1.2
   DEVICE_TYPE:              GPU, unified mem
     0.5104 [opencl_get_vendor_by_id] vendor id `16940800'!

I will add an arm64 build later containing that ID debug message so owners of different M1 variants can check whether the vendor ID is constant.
I also found that number for an M1 Pro in a Photoshop issue report, so first assumption: it is :wink:

Here’s a build with that debug message:
darktable-4.5.0+959~gc5ba04684a_arm64_opencldebug.dmg

If you’re on 4.4.2 or older, use the following steps to avoid messing up your darktable configuration:

  1. Rename your ~/.config/darktable directory to something different so this build can’t mess things up.
  2. Open the dmg (it’s unsigned, so you may need to open it via the context menu).
  3. Run /Volumes/darktable/darktable.app/Contents/MacOS/darktable-cltest from a terminal.
  4. Delete the generated ~/.config/darktable and rename the directory from step 1 back.

If you see a different number, please report it.
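Steps 1 and 4 can be sketched in Python as below, for anyone who prefers not to rename directories by hand. `stash_config` and `restore_config` are hypothetical helpers and the `.bak` suffix is an arbitrary choice. Steps 2 and 3 (opening the dmg and running darktable-cltest) stay manual.

```python
import shutil
from pathlib import Path


def stash_config(config_dir, suffix=".bak"):
    """Step 1: move the config directory aside and return the backup path."""
    config_dir = Path(config_dir)
    backup = config_dir.with_name(config_dir.name + suffix)
    if backup.exists():
        raise FileExistsError(f"{backup} already exists")
    config_dir.rename(backup)
    return backup


def restore_config(config_dir, backup):
    """Step 4: delete the directory the test build generated, then restore."""
    config_dir = Path(config_dir)
    if config_dir.exists():
        shutil.rmtree(config_dir)
    Path(backup).rename(config_dir)


# Usage (left commented out so nothing is touched by accident):
# cfg = Path.home() / ".config" / "darktable"
# backup = stash_config(cfg)
# ... run darktable-cltest from the mounted dmg ...
# restore_config(cfg, backup)
```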

[dt_opencl_device_init]
   DEVICE:                   0: 'Apple M2'
   PLATFORM NAME & VENDOR:   Apple, Apple
   CANONICAL NAME:           appleapplem2
   DRIVER VERSION:           1.2 1.0
   DEVICE VERSION:           OpenCL 1.2
   DEVICE_TYPE:              GPU, unified mem
     0.0780 [opencl_get_vendor_by_id] vendor id `16940800'!

Same vendor ID value reported for me.

Here’s a base-model M1:

[dt_opencl_device_init]
   DEVICE:                   0: 'Apple M1'
   PLATFORM NAME & VENDOR:   Apple, Apple
   CANONICAL NAME:           appleapplem1
   DRIVER VERSION:           1.2 1.0
   DEVICE VERSION:           OpenCL 1.2
   DEVICE_TYPE:              GPU, unified mem
     0.0483 [opencl_get_vendor_by_id] vendor id `16940800'!

Same number as well.
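The three reports so far can be checked against each other with a short Python sketch. The log lines are copied from the posts above; `vendor_id` is a hypothetical helper, and the pattern assumes the exact debug message format of that test build.

```python
import re

VENDOR_RE = re.compile(r"\[opencl_get_vendor_by_id\] vendor id `(\d+)'")


def vendor_id(log_line):
    """Return the vendor id from the debug line as an int, or None."""
    match = VENDOR_RE.search(log_line)
    return int(match.group(1)) if match else None


reports = {
    "Apple M1 Max": "     0.5104 [opencl_get_vendor_by_id] vendor id `16940800'!",
    "Apple M2": "     0.0780 [opencl_get_vendor_by_id] vendor id `16940800'!",
    "Apple M1": "     0.0483 [opencl_get_vendor_by_id] vendor id `16940800'!",
}

ids = {device: vendor_id(line) for device, line in reports.items()}
for device, vid in ids.items():
    # Also show the hex form, which is how vendor ids are often written.
    print(f"{device}: {vid} ({vid:#x})")
print("constant across reports:", len(set(ids.values())) == 1)
```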