darktable does not respond and cannot be terminated

Hi,
For a few weeks now I have been having problems with my darktable installation.

While I am developing images, the fan suddenly spins up, the darktable process permanently takes about 80-90% CPU time, and darktable stops responding.

Starting with version 3.2.1 under Arch Linux, on a notebook with 32 GB RAM and an Intel UHD / Nvidia RTX 2060 hybrid GPU, darktable now hangs regularly during different actions, unfortunately not always at a processing step where I could see an obvious reason.

In the meantime I have also tried git builds repeatedly and am currently on git version 3.5.xxxx.

It mostly happens when I change the local contrast, use masks, or work in the denoise (profiled) module.

When the problem occurs, I can’t close darktable either. If I click on “Close”, it takes a moment and I am asked if I want to force quit darktable. If I do that, the darktable window closes.

Nevertheless the computer is still under load… Checking “top” shows that darktable is still running and using 80-90% CPU time.

With “ps aux” I can’t display the complete process list. The output gets stuck at the line where darktable should be listed. I have to abort “ps aux” with CTRL-C…

I can’t kill the process by its PID either. There is no error, but the process remains.

In the end only a reboot helps. Even a shutdown doesn’t work properly, so I have to power the computer off without a clean shutdown.

I would like to find out what is causing this, but I don’t know how to proceed. I’m not a developer and I don’t know the tools that are out there to figure this out.

However, I don’t want to reinstall my laptop, which otherwise runs well.

Can you walk me through, step by step, what I can do to figure this out?

I am writing here because I only have problems with darktable. All other applications are running perfectly.

Many greetings,
mabunix

I’ve had this problem too with OpenCL on an Intel GPU (via Neo, not Beignet) lately. It was fine, even from git, for months. Sometime in the past week or two, this issue started happening. It’s usually fine for working on several photos in one go, but then it hits the freeze issue.

Since I was working on around 150 photos, getting them ready for printing, as a present to my S.O., I finally disabled OpenCL just to have stability, sacrificing speed. Oh well.

It’s so bad I’d actually prefer it if darktable would just crash in these instances. It’s better than a stuck process I cannot kill (with any command-line utility) and then having to reboot (and then lose work as if it just crashed). :slightly_frowning_face:
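One thing that might help narrow this down the next time it hangs: instead of `ps aux`, which gets stuck, read the process state straight from `/proc/<pid>/stat`, which in my understanding usually still responds. This is only a minimal sketch, assuming Linux with procfs; the function names are made up for illustration. A state of `D` means uninterruptible sleep, which typically points at something blocked inside the kernel or a driver, and would explain why even `kill -9` has no visible effect.

```python
from pathlib import Path

# Short descriptions of the most common Linux process state letters.
STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually blocked in the kernel or a driver)",
    "Z": "zombie",
    "T": "stopped",
}


def parse_stat_state(stat_line):
    """Return (command name, state letter) from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST closing parenthesis.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return comm, state


def describe_process(pid):
    """Read /proc/<pid>/stat and describe the state of that process."""
    stat_line = Path(f"/proc/{pid}/stat").read_text()
    comm, state = parse_stat_state(stat_line)
    return f"{comm} (pid {pid}): {state} = {STATE_NAMES.get(state, 'other')}"
```

Calling `print(describe_process(12345))` with the PID shown by `top` prints the state letter. If it reports `R` while the process ignores signals, my guess would be that it is spinning inside kernel or driver code rather than in darktable itself, but that is speculation.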

I tried debugging it in so many different ways, but there’s:

  • no error in dmesg or journalctl
  • no error when running darktable from the command line, even with darktable -d all
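Since a hard reboot can lose whatever was still buffered in the terminal, one option is to run darktable through a small wrapper that writes every debug line to disk immediately. The sketch below is an assumption-laden example, not a tested recipe: it uses darktable's `-d opencl` and `-d perf` debug switches and the `--disable-opencl` option, while the function names and the log path are hypothetical.

```python
import os
import subprocess
import sys
import time


def build_command(disable_opencl=False):
    """Build a darktable command line with OpenCL and timing debug output."""
    cmd = ["darktable", "-d", "opencl", "-d", "perf"]
    if disable_opencl:
        # Run the same session without OpenCL for comparison.
        cmd.append("--disable-opencl")
    return cmd


def run_and_log(cmd, log_path):
    """Run cmd and append each output line, timestamped, to log_path.

    Every line is flushed and fsync'ed so that it should survive a
    forced power-off right after a hang.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            errors="replace",
        )
        for line in proc.stdout:
            log.write(f"{time.strftime('%H:%M:%S')} {line}")
            log.flush()
            os.fsync(log.fileno())
        return proc.wait()
```

Something like `run_and_log(build_command(), "darktable-debug.log")` would then leave the last OpenCL-related lines before the freeze in the log file, which could at least show which module was being processed.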

I meant to file a bug, but I just don’t know where the problem is nor do I know how to cause it.

However, after disabling OpenCL, everything’s fine. (I wanted to test this before reporting the bug, so I could at least narrow it down a little.)

So I guess it’s some new OpenCL code that runs amok on our GPUs? Since you’re using an Nvidia GPU and I’m using an Intel one, it doesn’t seem to be tied to the chip, but rather to the OpenCL code somewhere in darktable.

FWIW: I’m on Fedora 33 with an “Intel HD Graphics 620 (rev 02)” using OpenCL with darktable built from git when this happens. darktable 3.2.1 worked fine for me before switching over to nightlies (mainly for the color calibration, but also for all the other new stuff in darktable).


Thanks for your message. Then I am fortunately not alone. I use OpenCL with Nvidia and Intel.

I will now uninstall OpenCL for Intel completely and then test again with OpenCL for Nvidia only.

Interestingly though, a friend has the identical laptop, uses the same Linux distribution and has the same OpenCL settings.

He does not have these problems with darktable.

That sounds like a driver bug.
It might be interesting to compare the driver versions for the GPUs; it could be a bug in a specific version.
This link might also be useful? (Note the difference between supported and reported OpenCL versions in the second inset.)
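To compare driver versions between two machines, one could save the output of `clinfo` on each and diff just the relevant fields. Here is a rough sketch, assuming the usual `clinfo` layout where each device section has a `Device Name` line followed later by a `Driver Version` line; the function names are hypothetical.

```python
import re

# Matches lines such as "  Device Name      <value>" or
# "  Driver Version   <value>" (label, two or more spaces, value).
FIELD = re.compile(r"^\s*(Device Name|Driver Version)\s{2,}(.+?)\s*$")


def driver_versions(clinfo_text):
    """Map each device name in clinfo output to its reported driver version."""
    versions = {}
    current_device = None
    for line in clinfo_text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "Device Name":
            current_device = value
        elif current_device is not None:
            versions[current_device] = value
    return versions


def differing_drivers(mine, other):
    """Return {device: (my version, other version)} where the two differ."""
    a = driver_versions(mine)
    b = driver_versions(other)
    return {
        dev: (a[dev], b[dev])
        for dev in a.keys() & b.keys()
        if a[dev] != b[dev]
    }
```

Feeding it the contents of two saved `clinfo.txt` files gives an empty dict if both machines report the same driver for the same device, which would make a driver-version-specific bug less likely.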

I have removed Intel OpenCL. The problem is still there.

I have attached the output of hashcat, clinfo and darktable-cltest. It looks OK to me so far, but that doesn’t mean much: all I can tell is that OpenCL is present and in use.

clinfo.txt (7.5 KB) darktable-cltest.txt (43.0 KB) hashcat.txt (806 Bytes)

Here’s my system info…

$ rpm -qa | grep -i intel
xorg-x11-drv-intel-2.99.917-48.20200205.fc33.x86_64
intel-gmmlib-20.3.2-1.fc33.x86_64
intel-gmmlib-devel-20.3.2-1.fc33.x86_64
intel-igc-core-1.0.5585-1.fc33.x86_64
intel-igc-opencl-1.0.5585-1.fc33.x86_64
intel-opencl-20.47.18513-1.fc33.x86_64
intel-level-zero-gpu-1.0.18513-1.fc33.x86_64

$ rpm -qa | grep -i opencl
opencl-utils-1-12.svn16.fc33.x86_64
opencl-headers-3.0-2.20200512gitd082d42.fc33.noarch
opencl-utils-devel-1-12.svn16.fc33.x86_64
opencl-filesystem-1.0-12.fc33.noarch
intel-igc-opencl-1.0.5585-1.fc33.x86_64
intel-opencl-20.47.18513-1.fc33.x86_64
wine-opencl-6.0-0.2rc2.fc33.x86_64
wine-opencl-6.0-0.2rc2.fc33.i686
mesa-libOpenCL-20.2.4-2.fc33.x86_64

clinfo.txt (12.0 KB)
darktable-cltest.txt (43.5 KB)
hashcat.txt (1.2 KB)

Intel Neo OpenCL was working wonderfully with darktable for the past year on my system — until a week or two ago.

Do the darktable memory settings also apply to the GPU? Or just the main system? I had some custom parameters in there (that I just removed). It’s hard to test this as it is triggered so randomly.

I’m using the OBS nightly build of darktable on Fedora 33. I didn’t have an issue with the Fedora-provided darktable or ones that I compiled myself. Perhaps that’s part of the problem? I don’t know.

(Although I didn’t seem to have a problem using darktable with OpenCL with the nightlies until the past couple of weeks. However, my sessions weren’t quite as long until recently. Perhaps there’s some kind of memory leak affecting the GPU? And then my computer eventually passes that threshold and it’s not handled well (somewhere in darktable and/or Intel Neo OpenCL), so darktable hangs and a reboot becomes necessary? Just completely guessing here though.)

One can always try the flatpak version, which works well on my systems.