The first conformant M1 GPU driver

https://rosenzweig.io/blog/first-conformant-m1-gpu-driver.html

I guess this is a milestone for those who want to run Linux on their Apple Silicon machines. :arrow_heading_up:


I have been happily using the Asahi Arch KDE Plasma remix on an M1 iMac for some time. I switched to the Fedora remix a few days ago. Everything works as far as I need/can tell, except there is no M1 GPU-specific OpenCL runtime yet… The Asahi installer script refuses to install on an M2 Pro.


Thanks for the feedback! So no OpenCL acceleration in darktable just yet…?

No, not yet.

I’d be really curious to know what performance you are getting in darktable on an M1 running Linux.
I’ll be even more curious once an OpenCL implementation arrives :slight_smile:

Darktable performs on M1 CPU in Asahi Linux exactly the same as in macOS. The entire KDE Plasma works very well too.


Thanks for the link back to the test page. I can’t wait for the graphics drivers in the Linux kernel to enable OpenCL support. I’d really like an M1 Mac running Linux to be my next low-power darktable machine :smiley:

Asahi now supports OpenCL. I ran the benchmark from Šarūnas's link above (for setubal.orf) with darktable 5.2.1 on my MacBook Pro (M1 Pro with 8 CPU and 14 GPU cores) under macOS 26 and under Fedora Asahi Remix (where I had to manually activate rusticl in the darktable settings). I only ran each test once, with no averaging over multiple runs.

Under macOS, the pipeline took 13.0s CPU and 6.4s CPU+GPU.
Under Fedora Asahi Remix, I got 15.0s CPU and 43.2s CPU+GPU.
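
To put those four timings in relative terms, here is a quick back-of-the-envelope sketch. The dictionary layout and the `gpu_speedup` helper are only illustrative names, and the numbers are the single-run values quoted above, so treat the ratios as rough.

```python
# Timings quoted above, in seconds (single runs, not averaged).
timings = {
    "macOS": {"cpu": 13.0, "cpu_gpu": 6.4},
    "Fedora Asahi Remix": {"cpu": 15.0, "cpu_gpu": 43.2},
}


def gpu_speedup(cpu_s, cpu_gpu_s):
    """How many times faster CPU+GPU is than CPU-only (below 1 means slower)."""
    return cpu_s / cpu_gpu_s


for system, t in timings.items():
    print(f"{system}: {gpu_speedup(t['cpu'], t['cpu_gpu']):.2f}x")
# macOS: 2.03x
# Fedora Asahi Remix: 0.35x
```

So enabling OpenCL roughly halves the time on macOS, while on Asahi the CPU+GPU run takes about 2.9 times as long as CPU-only.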

I acknowledge that rusticl is considered experimental, but I am still wondering about the poor CPU+GPU performance under Asahi. The test finishes with:
[opencl_summary_statistics] device 'rusticl Apple M1 Pro' id=0: 423 out of 423 events were successful and 0 events lost. max event=422
Did anyone else try running these tests under Asahi, or does anyone know what might be going on here?

I recently tried the same and got the same result as you. With rusticl on M1 + Asahi, processing on the GPU is much slower than CPU-only.


Is tiling due to available memory?

That’s unfortunate, thank you!
Just out of curiosity (and without understanding very much about GPU drivers and the darktable internals), I ran clpeak on macOS and Asahi Linux. Most numbers are very comparable, while single-precision compute is actually quite a bit higher on Asahi (e.g. float16: 4393.40 vs. 2257.56). Transfer bandwidth is a little lower on Asahi, though, and (most significantly) kernel launch latency is higher, at 18.70us vs. 1.97us.
No idea whether this is somehow related to the performance issues, just thought it would be interesting to some.
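
As a rough sanity check on whether launch latency alone could matter, one can multiply it by the number of OpenCL events from the darktable log earlier in the thread (423). This is only a sketch: it assumes each event pays exactly one kernel launch latency, which is a simplification, and `launch_overhead_ms` is just an illustrative helper name.

```python
# Rough estimate of time lost to kernel launch latency.
# Assumption: each of the 423 OpenCL events in the darktable log
# costs one launch latency as measured by clpeak.
EVENTS = 423
LATENCY_US = {"Asahi (rusticl)": 18.70, "macOS": 1.97}


def launch_overhead_ms(events, latency_us):
    """Total launch overhead in milliseconds for a given event count."""
    return events * latency_us / 1000.0


for name, latency in LATENCY_US.items():
    print(f"{name}: {launch_overhead_ms(EVENTS, latency):.2f} ms")
# Asahi (rusticl): 7.91 ms
# macOS: 0.83 ms
```

Under that assumption the difference is a few milliseconds, which is tiny next to a 43.2s pipeline, so launch latency by itself probably does not account for the gap.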

I am not sure what exactly you are asking here, but would available memory differ between the two OSes?

rusticl may report available memory differently than macOS does.

Run the same image with the same XMP using the -d perf flag to compare the performance of each module.

If you are interested in more detail, I am attaching the full output from my tests below. I am running darktable-cli setubal.orf setubal.orf.xmp test.jpg --core -d perf -d opencl

dt-bench-gpu-asahi.txt (11.8 KB)
dt-bench-gpu-macos.txt (10.5 KB)

Indeed, it is mostly the modules that use tiling that run slower under Asahi than under macOS. However, hazeremoval also runs significantly slower under Asahi. Another observation is that the KERNEL LOADING TIME is longer under Asahi. The biggest difference is in the line [opencl_profiling] spent 14.7599 seconds in [Read Image (from device to host)].
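
For anyone who wants to diff the two attached logs without reading them line by line, here is a small sketch that pulls out the [opencl_profiling] lines and ranks them by how much they differ. It only assumes the line shape quoted above (`spent <seconds> seconds in <name>`); the function names are my own, and other log lines are simply ignored.

```python
import re

# Matches lines shaped like the one quoted above:
# [opencl_profiling] spent 14.7599 seconds in [Read Image (from device to host)]
PROFILING_RE = re.compile(
    r"\[opencl_profiling\] spent\s+([0-9.]+) seconds in (.+?)\s*$"
)


def parse_profiling(text):
    """Map each profiled item to the total seconds reported for it."""
    totals = {}
    for line in text.splitlines():
        match = PROFILING_RE.search(line)
        if match:
            name = match.group(2).strip()
            totals[name] = totals.get(name, 0.0) + float(match.group(1))
    return totals


def biggest_differences(a, b, top=10):
    """Return (name, seconds_a, seconds_b) rows, largest absolute gap first."""
    rows = [(n, a.get(n, 0.0), b.get(n, 0.0)) for n in set(a) | set(b)]
    rows.sort(key=lambda row: abs(row[1] - row[2]), reverse=True)
    return rows[:top]


sample = "[opencl_profiling] spent 14.7599 seconds in [Read Image (from device to host)]"
print(parse_profiling(sample))
```

To compare the real runs, read dt-bench-gpu-asahi.txt and dt-bench-gpu-macos.txt with `open(...).read()`, pass each through `parse_profiling`, and hand both results to `biggest_differences`.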

UNIFIED MEM SIZE: 3897 MB reserved for

vs

UNIFIED MEM SIZE: 4096 MB reserved for

I don’t think that’s a significant difference in memory. Some of the modules take a significant amount of time with rusticl.