Black image with OpenCL in macOS M2

Hi!

I have a MacBook Air M2 and I’m trying to enable OpenCL. For some images the preview is almost entirely black (not completely, but mostly).

Info about my setup:

  • MacBook Air M2, 8 GB
  • macOS Ventura 13.5.2
  • RAWs from Fujifilm X-T5
  • darktable 4.4.2 installed using brew

Some observations:

  • It depends on the zoom level used in the preview window. “Fit” always causes the problem, and so does any smaller zoom at which the whole image is rendered. If the zoom is high enough that I need to pan, the problem disappears.
  • I don’t think it is associated with any specific module: I tried enabling and disabling each one, with no change.
  • If I run darktable with -d perf the problem disappears!

Any ideas for extra troubleshooting or investigation I can try?

Thanks!

Logs from -d opencl

     0.0000 [dt_init] SSE2 is unavailable, some functions will be noticeably slower.
     0.0742 [dt_get_sysresource_level] switched to 1 as `default'
     0.0742   total mem:       8192MB
     0.0742   mipmap cache:    1024MB
     0.0742   available mem:   4096MB
     0.0742   singlebuff:      64MB
     0.0742   OpenCL tune mem: OFF
     0.0742   OpenCL pinned:   OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
[opencl_init] opencl library '/System/Library/Frameworks/OpenCL.framework/Versions/Current/OpenCL' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 1 device

[dt_opencl_device_init]
   DEVICE:                   0: 'Apple M2'
   PLATFORM NAME & VENDOR:   Apple, Apple
   CANONICAL NAME:           appleapplem2
   DRIVER VERSION:           1.2 1.0
   DEVICE VERSION:           OpenCL 1.2
   DEVICE_TYPE:              GPU
   GLOBAL MEM SIZE:          5461 MB
   MAX MEM ALLOC:            1024 MB
   MAX IMAGE SIZE:           16384 x 16384
   MAX WORK GROUP SIZE:      256
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 256 256 256 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   MEMORY TUNING:            NO
   FORCED HEADROOM:          400
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH:            16
   ROUNDUP HEIGHT:           16
   CHECK EVENT HANDLES:      128
   PERFORMANCE:              0.000
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   /Applications/darktable.app/Contents/Resources/share/darktable/kernels
   KERNEL DIRECTORY:         /Users/kazuo/.cache/darktable/cached_v1_kernels_for_AppleAppleM2_1210
   CL COMPILER OPTION:
   KERNEL LOADING TIME:       0.0358 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init]           0       'Apple Apple M2'
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       -1      0       0       -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       0       0       0       0
[opencl_synchronization_timeout] synchronization timeout set to 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       -1      0       0       -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       0       0       0       0
[opencl_synchronization_timeout] synchronization timeout set to 200
     6.6270 [dt_opencl_check_tuning] use 3459MB (tunemem=OFF, pinning=OFF) on device `Apple Apple M2' id=0
     7.3665 [pixelpipe_process_CL]       [full]         colorout               (   0/   0)  868x1302 scale=0.1680 --> (   0/   0)  868x1302 scale=0.1680 cl input data to host
 [opencl_summary_statistics] device 'Apple Apple M2' (0): 110 out of 110 events were successful and 0 events lost. max event=109

Logs from -d perf

     0.0000 [dt_init] SSE2 is unavailable, some functions will be noticeably slower.
     2.7436 [dt_dev_load_raw] loading the image. took 0.152 secs (0.635 CPU)
     3.1409 [histogram] took 0.000 secs (0.000 CPU) scope draw
     3.1628 [dt_dev_process_image_job] loading image. took 0.000 secs (0.000 CPU)
     3.1860 [histogram] took 0.000 secs (0.000 CPU) scope draw
     3.1935 [dev_pixelpipe] took 0.000 secs (0.000 CPU) initing base buffer [full]
     3.2181 [dev_pixelpipe] took 0.025 secs (0.037 CPU) [full] processed `rawprepare' on GPU, blended on GPU
     3.2220 [histogram] took 0.000 secs (0.000 CPU) scope draw
     3.2306 [dev_pixelpipe] took 0.012 secs (0.019 CPU) [full] processed `temperature' on GPU, blended on GPU
     3.2511 [histogram] took 0.000 secs (0.000 CPU) scope draw
     3.2675 [dev_pixelpipe] took 0.037 secs (0.021 CPU) [full] processed `highlights' on GPU, blended on GPU
     3.2679 [histogram] took 0.000 secs (0.000 CPU) scope draw
     3.2847 [histogram] took 0.000 secs (0.000 CPU) scope draw
     3.2911 [resample_cl] plan 0.000 secs (0.000 CPU) resample 0.000 secs (0.000 CPU)
     3.3013 [histogram] took 0.000 secs (0.000 CPU) scope draw
     3.3188 [histogram] took 0.000 secs (0.000 CPU) scope draw
     3.3340 [histogram] took 0.000 secs (0.000 CPU) scope draw
     3.3507 [histogram] took 0.000 secs (0.000 CPU) scope draw
     3.7186 [histogram] took 0.000 secs (0.000 CPU) scope draw
     3.7574 [histogram] took 0.001 secs (0.000 CPU) scope draw
     4.1983 [dev_pixelpipe] took 0.931 secs (0.070 CPU) [full] processed `demosaic' on GPU with tiling, blended on CPU
     4.2307 [dev_pixelpipe] took 0.032 secs (0.172 CPU) [full] processed `lens' on CPU, blended on CPU
     4.2343 [dev_pixelpipe] took 0.004 secs (0.000 CPU) [full] processed `flip' on GPU, blended on GPU
     4.2359 [dev_pixelpipe] took 0.002 secs (0.000 CPU) [full] processed `exposure' on GPU, blended on GPU
     4.2425 [dev_pixelpipe] took 0.007 secs (0.019 CPU) [full] processed `toneequal' on CPU, blended on CPU
     4.2477 [dev_pixelpipe] took 0.005 secs (0.000 CPU) [full] processed `colorin' on GPU, blended on GPU
     4.2485 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.000 secs (0.000 GPU) [channelmixerrgb]
     4.2515 [dev_pixelpipe] took 0.004 secs (0.001 CPU) [full] processed `channelmixerrgb' on GPU, blended on GPU
     4.2521 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0.000 secs (0.000 GPU) [colorchecker]
     4.2564 [dev_pixelpipe] took 0.005 secs (0.001 CPU) [full] processed `colorchecker' on GPU, blended on GPU
     4.2570 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.000 secs (0.000 GPU) [filmicrgb]
     4.2625 [dev_pixelpipe] took 0.006 secs (0.001 CPU) [full] processed `filmicrgb' on GPU, blended on GPU
     4.2632 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0.000 secs (0.000 GPU) [tonecurve]
     4.2668 [dev_pixelpipe] took 0.004 secs (0.001 CPU) [full] processed `tonecurve' on GPU, blended on GPU
     4.2703 [dev_pixelpipe] took 0.003 secs (0.000 CPU) [full] processed `colorout' on GPU, blended on GPU
     4.2726 [dev_pixelpipe] took 0.002 secs (0.003 CPU) [full] processed `gamma' on CPU, blended on CPU
     4.2730 [dev_process_image] pixel pipeline took 1.080 secs (0.347 CPU) processing `_DSF1982.RAF'
     4.2790 [histogram] took 0.000 secs (0.000 CPU) scope draw
     4.3046 [histogram] took 0.000 secs (0.001 CPU) scope draw
     4.3766 [dev_pixelpipe] took 0.000 secs (0.000 CPU) initing base buffer [preview]
     4.3774 [dev_pixelpipe] took 0.001 secs (0.004 CPU) [preview] processed `rawprepare' on CPU, blended on CPU
     4.3778 [dev_pixelpipe] took 0.000 secs (0.001 CPU) [preview] processed `temperature' on CPU, blended on CPU
     4.3793 [dev_pixelpipe] took 0.001 secs (0.004 CPU) [preview] processed `highlights' on CPU, blended on CPU
     4.4052 [dev_pixelpipe] took 0.026 secs (0.100 CPU) [preview] processed `demosaic' on CPU, blended on CPU
     4.4347 [dev_pixelpipe] took 0.029 secs (0.165 CPU) [preview] processed `lens' on CPU, blended on CPU
     4.4373 [dev_pixelpipe] took 0.003 secs (0.008 CPU) [preview] processed `flip' on CPU, blended on CPU
     4.4388 [dev_pixelpipe] took 0.002 secs (0.002 CPU) [preview] processed `exposure' on CPU, blended on CPU
     4.4443 [dev_pixelpipe] took 0.005 secs (0.021 CPU) [preview] processed `toneequal' on CPU, blended on CPU
     4.4457 [dev_pixelpipe] took 0.001 secs (0.006 CPU) [preview] processed `colorin' on CPU, blended on CPU
     4.4466 [dt_ioppr_transform_image_colorspace] IOP_CS_LAB-->IOP_CS_RGB took 0.001 secs (0.005 CPU) [channelmixerrgb]
     4.4595 [dev_pixelpipe] took 0.014 secs (0.084 CPU) [preview] processed `channelmixerrgb' on CPU, blended on CPU
     4.4604 [dt_ioppr_transform_image_colorspace] IOP_CS_RGB-->IOP_CS_LAB took 0.001 secs (0.005 CPU) [colorchecker]
     4.4744 [dev_pixelpipe] took 0.015 secs (0.087 CPU) [preview] processed `colorchecker' on CPU, blended on CPU
     4.4755 [dt_ioppr_transform_image_colorspace] IOP_CS_LAB-->IOP_CS_RGB took 0.001 secs (0.005 CPU) [filmicrgb]
     4.5168 [dev_pixelpipe] took 0.042 secs (0.270 CPU) [preview] processed `filmicrgb' on CPU, blended on CPU
     4.5178 [dt_ioppr_transform_image_colorspace] IOP_CS_RGB-->IOP_CS_LAB took 0.001 secs (0.005 CPU) [tonecurve]
     4.5184 histogram calculation 256 bins 1 -> -1 compensate 0 3 channels 1202432 pixels took 0.001 secs (0.004 CPU)
     4.5238 [dev_pixelpipe] took 0.007 secs (0.036 CPU) [preview] processed `tonecurve' on CPU, collected histogram on CPU, blended on CPU
     4.5255 [dev_pixelpipe] took 0.002 secs (0.008 CPU) [preview] processed `colorout' on CPU, blended on CPU
     4.5260 [dev_pixelpipe] took 0.001 secs (0.003 CPU) [preview] processed `gamma' on CPU, blended on CPU
     4.5309 [dt_ioppr_transform_image_colorspace_rgb] RGB-->RGB took 0.005 secs (0.024 CPU) [final histogram]
     4.5866 [histogram] took 0.061 secs (0.084 CPU) final waveform
     4.5870 [dev_process_preview] pixel pipeline processing took 0.233 secs (0.922 CPU)
     4.5954 [histogram] took 0.002 secs (0.002 CPU) scope draw
     4.8811 [histogram] took 0.007 secs (0.007 CPU) scope draw
     4.9084 [histogram] took 0.007 secs (0.007 CPU) scope draw
     8.1631 [histogram] took 0.002 secs (0.002 CPU) scope draw
    15.9761 [histogram] took 0.002 secs (0.002 CPU) scope draw
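
For anyone comparing the two pipes in a -d perf log like the one above, a small script can total the wall-clock time per pipe and per device. This is only a sketch: the regular expression is inferred from the log lines in this post, not from darktable's source, and `summarize` is a made-up helper name.

```python
import re
from collections import defaultdict

# Pattern inferred from the "-d perf" lines in this thread (an assumption,
# not an official log format specification).
LINE = re.compile(
    r"\[dev_pixelpipe\] took (?P<wall>[\d.]+) secs \((?P<cpu>[\d.]+) CPU\) "
    r"\[(?P<pipe>\w+)\] processed `(?P<module>\w+)' on (?P<device>GPU|CPU)"
)

def summarize(log_text):
    """Return {pipe: {device: total wall-clock seconds}} for module lines."""
    totals = defaultdict(lambda: defaultdict(float))
    for line in log_text.splitlines():
        m = LINE.search(line)
        if m:  # lines such as "initing base buffer" are skipped
            totals[m["pipe"]][m["device"]] += float(m["wall"])
    return {pipe: dict(devices) for pipe, devices in totals.items()}

# Three lines copied from the log above.
sample = """\
     4.1983 [dev_pixelpipe] took 0.931 secs (0.070 CPU) [full] processed `demosaic' on GPU with tiling, blended on CPU
     4.2307 [dev_pixelpipe] took 0.032 secs (0.172 CPU) [full] processed `lens' on CPU, blended on CPU
     4.4052 [dev_pixelpipe] took 0.026 secs (0.100 CPU) [preview] processed `demosaic' on CPU, blended on CPU
"""

for pipe, devices in summarize(sample).items():
    print(pipe, devices)
```

Run on the full log, this makes it easy to see that the [full] pipe runs mostly on the GPU while the [preview] pipe runs entirely on the CPU, which appears consistent with the -1 shown for preview in the device priorities.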

Looks like your max memory size is 1GB. Probably not enough to do any good. Turn off OpenCL.

I’m trying to enable OpenCL because the impact on some benchmarks on this machine looks promising.

For the image in my first message, with diffuse & sharpen and denoise (profiled) also enabled, the export time drops from 100s to 30s when OpenCL is enabled. At the very least, darktable feels much more responsive for editing with OpenCL enabled (if I also pass -d perf to avoid the black image).

$ darktable-cli _DSF1982.RAF _DSF1982.RAF.xmp test.jpeg --core --disable-opencl -d perf -d opencl
   101.4428 [dev_process_export] pixel pipeline processing took 100.428 secs (742.645 CPU)
   101.9918 [export_job] exported to `test.jpg'

$ darktable-cli _DSF1982.RAF _DSF1982.RAF.xmp test.jpeg --core -d perf -d opencl
...
    31.9067 [dev_process_export] pixel pipeline processing took 30.779 secs (7.002 CPU)
    32.5023 [export_job] exported to `test.jpg'
 [opencl_summary_statistics] device 'Apple Apple M2' (0): 1932 out of 1932 events were successful and 0 events lost. max event=1931
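
To put a number on the difference, the two `dev_process_export` lines above can be compared directly. A minimal sketch, assuming the line format shown in this post; `export_times` is a hypothetical helper, not part of darktable.

```python
import re

# Pattern inferred from the two export lines quoted above (an assumption).
EXPORT = re.compile(
    r"\[dev_process_export\] pixel pipeline processing took "
    r"(?P<wall>[\d.]+) secs \((?P<cpu>[\d.]+) CPU\)"
)

def export_times(line):
    """Return (wall_seconds, cpu_seconds) from a dev_process_export line."""
    m = EXPORT.search(line)
    if m is None:
        raise ValueError("not a dev_process_export line")
    return float(m["wall"]), float(m["cpu"])

cpu_only = "   101.4428 [dev_process_export] pixel pipeline processing took 100.428 secs (742.645 CPU)"
with_opencl = "    31.9067 [dev_process_export] pixel pipeline processing took 30.779 secs (7.002 CPU)"

wall_cpu, cpu_cpu = export_times(cpu_only)
wall_cl, cpu_cl = export_times(with_opencl)

speedup = wall_cpu / wall_cl
print(f"wall-clock speedup with OpenCL: {speedup:.2f}x")
print(f"CPU seconds: {cpu_cpu:.1f} without OpenCL, {cpu_cl:.1f} with OpenCL")
```

For these two runs that works out to a bit more than a 3x wall-clock speedup, with far less CPU time spent when OpenCL is on.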

Yes, of course OpenCL makes things better, but if it isn’t reliable, is it worth it? I guess that’s up to you.

But if you say “the image is black but turning off OpenCL fixes it” then you have your answer.

This is enough GPU memory.

This is interesting.

@g-man indeed interesting! Might hint at a NaN problem …
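
As a rough illustration of why a NaN would show up mainly when the image is scaled down, here is a toy sketch. It is not darktable's interpolator: `box_downscale` is a made-up stand-in that simply averages neighbouring samples, and the sample values are invented. Whether this is what happens here is still only a guess.

```python
import math

def box_downscale(row, factor):
    """Average each run of `factor` samples (crude stand-in for an interpolator)."""
    return [
        sum(row[i:i + factor]) / factor
        for i in range(0, len(row) - factor + 1, factor)
    ]

# One bad sample in an otherwise normal row of pixel values (invented data).
row = [0.2, 0.4, float("nan"), 0.6, 0.5, 0.5, 0.3, 0.7]

full_res = box_downscale(row, 1)   # no scaling: the NaN stays in one sample
quarter = box_downscale(row, 4)    # 4:1 scaling: the NaN poisons a whole output pixel

print(sum(math.isnan(v) for v in full_res), "of", len(full_res), "samples are NaN")
print(sum(math.isnan(v) for v in quarter), "of", len(quarter), "samples are NaN")
```

Any arithmetic that touches a NaN returns NaN, so the more input samples feed into each output pixel, the larger the share of the output a single bad value can take over.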

@Kazuo there were some scaling problems in the interpolator. You might try “bicubic” instead of the default; that should help on dt 4.4.

Also, if the issue happens again and again with the same image and the same applied modules, you should share the raw image file plus the XMP file that make the issue appear.