A short update on my findings so far: it seems that the OpenCL implementation (or the GPU driver?) on Mac interprets the specification quite conservatively and simply assigns the minimum admissible value of 512 MB (1/4th of the graphics card's 2 GB). The specification reads:
CL_DEVICE_MAX_MEM_ALLOC_SIZE
Max size of memory object allocation in bytes. The minimum value is max (1/4th of CL_DEVICE_GLOBAL_MEM_SIZE, 128 * 1024 * 1024) for devices that are not of type CL_DEVICE_TYPE_CUSTOM.
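To make the arithmetic explicit, here is a small Python sketch of that lower bound. The function name `min_max_mem_alloc_size` is my own invention (not part of any OpenCL API), and I'm assuming the card reports exactly 2 * 1024^3 bytes of global memory and that "1/4th" can be taken as integer division.

```python
MIB = 1024 * 1024


def min_max_mem_alloc_size(global_mem_size: int) -> int:
    """Smallest value the spec allows for CL_DEVICE_MAX_MEM_ALLOC_SIZE
    on devices that are not CL_DEVICE_TYPE_CUSTOM (hypothetical helper)."""
    # max(1/4th of CL_DEVICE_GLOBAL_MEM_SIZE, 128 * 1024 * 1024)
    return max(global_mem_size // 4, 128 * MIB)


# Assumption: the 2 GB card reports exactly 2 * 1024^3 bytes.
global_mem = 2 * 1024 * MIB
print(min_max_mem_alloc_size(global_mem) // MIB)  # 512
```

So 512 MB is exactly the floor the spec permits for a 2 GB device; an implementation is free to report more, which is presumably what happens on the Windows side.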
I don’t know whether this is the reason for the significantly worse performance compared to the Windows 10 partition on the same Mac, or whether Apple’s OpenCL implementation is just not well optimized.
BTW, darktable’s OpenCL compiler options on Mac show -DUNKNOWN for the vendor ID instead of -DAMD, and I’m not sure whether this has any effect. On Windows it’s correctly -DAMD.
The numerical vendor ID is not reported as 4098 (AMD) but as 16915456 (Apple?), which is unknown to darktable. Patching it to 4098 to force -DAMD does not improve the performance, though …
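As I understand the behaviour, the vendor define comes from a lookup on the numerical ID, with a fallback for anything unrecognized. The Python sketch below only illustrates that idea; it is not darktable's actual code, the names `VENDOR_DEFINES` and `vendor_define` are hypothetical, and the table contains only the one mapping I can confirm from the output above.

```python
# Illustrative table only, not darktable's real list of vendors.
VENDOR_DEFINES = {
    4098: "-DAMD",  # ID reported on Windows
}


def vendor_define(vendor_id: int) -> str:
    """Return the compiler define for a vendor ID, or -DUNKNOWN (hypothetical helper)."""
    return VENDOR_DEFINES.get(vendor_id, "-DUNKNOWN")


for vid in (4098, 16915456):
    print(vid, hex(vid), vendor_define(vid))
# 4098 0x1002 -DAMD
# 16915456 0x1021c00 -DUNKNOWN
```

Printing the IDs in hex also shows that the value reported on Mac is a different identifier altogether, not an off-by-something variant of 4098.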