Beware! Ubuntu 24.04 LTS kernel updates break AMDGPU open source driver

I’ve just spent several hours trying to get my Linux system back into a stable state. Something in the recent Ubuntu 24.04 LTS kernel updates is completely incompatible with the open source AMDGPU driver, which is currently the only way to get OpenCL working on RDNA architecture graphics cards.

Hardware: Ryzen 9 5950X, ASRock X570M Pro motherboard, Sapphire Pulse RX 7800 XT graphics card.

The symptom: the system starts to boot OK, then the monitor goes black and stays black. The logs showed that Linux was up, but the X window server had crashed or hung at startup. The only way I could get back in was to hard reset and reboot into a different kernel. (I suppose I could have logged in remotely via SSH.)

The problem started with an update to HWE (hardware enablement) kernel 6.11.0-29-generic. I tried a recent standard kernel (6.8.0-63-generic) and got the same results. I was only able to get the system stable by booting into the previous kernel 6.11.0-26-generic, and uninstalling all of the AMDGPU software, including the ROCr OpenCL implementation.
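For anyone who wants to check quickly whether they are running one of the kernels mentioned here, a minimal Python sketch follows. The version lists are just the three releases named in this post, from one machine, so treat them as an observation and not a compatibility table. Note that 6.11.0-26-generic was only stable here after the AMDGPU software was uninstalled. The function name `classify_kernel` and the label strings are made up for illustration.

```python
import platform

# Kernel releases taken from the report above; this is one user's
# observation on one machine, not an authoritative compatibility list.
REPORTED_BAD = {"6.11.0-29-generic", "6.8.0-63-generic"}
REPORTED_GOOD = {"6.11.0-26-generic"}


def classify_kernel(release: str) -> str:
    """Return 'reported-bad', 'reported-good', or 'unknown' for a release string."""
    if release in REPORTED_BAD:
        return "reported-bad"
    if release in REPORTED_GOOD:
        return "reported-good"
    return "unknown"


if __name__ == "__main__":
    # platform.release() gives the running kernel release, as `uname -r` does.
    running = platform.release()
    print(f"{running}: {classify_kernel(running)}")
```

Any release not named in this post comes back as "unknown", which says nothing either way about whether it is affected.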

If you have enabled overclocking with any driver, either the default Mesa or AMDGPU, you will get the same results. I had been running the card with a decent undervolt, but I had to completely disable overclocking in the LACT app.

The open source alternative to ROCr OpenCL, Rusticl, is not yet ready for prime time, unfortunately.

I have hopes that AMD will remedy this situation soon. They’re due for a new driver release “any day now™”.

But in the meantime, I can’t take advantage of this speedy video card to accelerate darktable and the like. This is frustrating.

Do you have a bug report for this issue?

I couldn’t figure out where to file a bug report. AMD has a page for filing issues against ROCm, but I couldn’t find one for AMDGPU. And technically it’s not a bug in the Linux kernel, nor Ubuntu, because the only thing affected is AMDGPU. If someone can point me to a page for filing issues against AMDGPU, I will report it.

Possibly here:

Thanks, filed an issue there.

I see a response from LACT.

This might be related to Ubuntu’s kernel backports rather than an upstream issue. It wouldn’t surprise me if it’s the latter, but it’s worth checking whether this also occurs when using a repo for the latest kernel version.

Hope someone investigates. You may have to follow up to draw people’s attention.