I’ve just spent several hours trying to get my Linux system back into a stable state. Something in recent Ubuntu 24.04 LTS kernel updates is completely incompatible with the open-source AMDGPU driver, which is currently the only way to get OpenCL to work on RDNA architecture graphics cards.
The symptom: the system starts to boot normally, then the monitor goes black and stays black. Logs showed that Linux is up, but the X Window server has crashed or hung at startup. The only way I could get back in was to hard reset and reboot into a different kernel. (I suppose I could have logged in remotely via SSH.)
The problem started with an update to HWE (hardware enablement) kernel 6.11.0-29-generic. I tried a recent standard kernel (6.8.0-63-generic) and got the same results. I was only able to get the system stable by booting into the previous kernel 6.11.0-26-generic, and uninstalling all of the AMDGPU software, including the ROCr OpenCL implementation.
If you have enabled overclocking with any driver, either the default Mesa or AMDGPU, you will get the same results. I had been running the card with a decent undervolt, but I had to completely disable overclocking in the LACT app.
The open source alternative to ROCr OpenCL, Rusticl, is not yet ready for prime time, unfortunately.
I have hopes that AMD will remedy this situation soon. They’re due for a new driver release “any day now™”.
But in the meantime, I can’t take advantage of this speedy video card to accelerate darktable and the like. This is frustrating.
I couldn’t figure out where to file a bug report. AMD has a page for filing issues against ROCm, but I couldn’t find one for AMDGPU. And technically it’s not a bug in the Linux kernel or in Ubuntu, because the only thing affected is AMDGPU. If someone can point me to a page for filing issues against AMDGPU, I will report it.
This might be related to Ubuntu’s kernel backports rather than an upstream issue. It wouldn’t surprise me if it’s the latter, but it might be worth checking whether this also occurs when using a repo for the latest kernel version.
Hope someone investigates. You may have to follow up to draw people’s attention.
Some issues still exist. I have recently “upgraded” to an AMD 9060 XT GPU.
Under Tumbleweed I was able to get Steam running, DT worked well with Rusticl, and LM Studio ran using ROCm. But I could not get DaVinci Resolve to run in any way (tried native, davincibox, …).
Now I have Ubuntu installed, and again Steam and LM Studio work. DT 4.6.1 (installed with apt) detects OpenCL nicely, but the 5.4 snap install does not detect anything.
I think the problem is related to the snap install, but I am not sure. clinfo produces plenty of output, and darktable-cltest shows results for 4.6.1.
If anyone can advise on how to proceed, thanks in advance.