Why is Darktable editing not real-time? (What would need to be done?)

well yes, the whole story is that there is also the “preview” pipeline which is displayed as the overview in the top left corner. it is always processed in addition to the center view, but at reduced resolution. this is sometimes needed as context for the full pipeline as well, in case the latter only processes a region (thread sync nightmare).

and yeah, large gaussian blurs or similar require a lot of context. dt can deal with that by requesting larger regions of interest from the input, i.e. padding the buffer up some. all of which is of course slow.
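
to illustrate, here is a standalone sketch of that padding (not dt’s actual api: modules do this in their modify_roi_in callbacks, but the types and names below are simplified):

  #include <stdio.h>

  /* a blur with radius r needs r extra input pixels on every side
     of the region it is asked to produce */
  typedef struct roi_t { int x, y, width, height; } roi_t;

  /* grow the requested output region by the blur radius,
     clamped to the full image bounds */
  static roi_t pad_roi_for_blur(roi_t roi_out, int radius, int img_w, int img_h)
  {
    roi_t roi_in = roi_out;
    roi_in.x -= radius;
    roi_in.y -= radius;
    roi_in.width  += 2 * radius;
    roi_in.height += 2 * radius;
    /* clamp to the image borders */
    if(roi_in.x < 0) { roi_in.width  += roi_in.x; roi_in.x = 0; }
    if(roi_in.y < 0) { roi_in.height += roi_in.y; roi_in.y = 0; }
    if(roi_in.x + roi_in.width  > img_w) roi_in.width  = img_w - roi_in.x;
    if(roi_in.y + roi_in.height > img_h) roi_in.height = img_h - roi_in.y;
    return roi_in;
  }

  int main(void)
  {
    /* a 512x512 tile of a 24 MP image, blurred with radius 32 */
    roi_t in = pad_roi_for_blur((roi_t){ 100, 100, 512, 512 }, 32, 6000, 4000);
    printf("input region: %d,%d %dx%d\n", in.x, in.y, in.width, in.height);
    return 0;
  }

a 32 px blur thus turns a 512x512 request into a 576x576 read; chain a few such modules and the extra input (and the extra processing it implies) adds up.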

(the good news is that the abovementioned GPU code path is faster even when processing the image in full res all pixels all the time, so it will not have such surprises)

Also check if OpenCL is enabled in the preferences. This gives dt a speed boost as well.
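
In the config file this corresponds to a single line in ~/.config/darktable/darktablerc (the key name is taken from my own install and may differ between versions):

opencl=true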

OK, I’ve done some performance analysis:

TL;DR: Setting OpenCL to “default” is a safe bet. And things didn’t go as planned.

Now, details:

I ran a benchmark on two computers: One Windows PC with an i5-4690K with an Nvidia GTX 1080 and 32 GB of memory, and one Linux PC with an i7-8809G with an AMD Radeon RX Vega M GH and 32 GB of memory. These should have roughly similar single-thread performance, but the Linux PC supports Hyperthreading and the Windows PC does not.

The benchmark used a 24 MP Fuji file with almost no adjustments, only

  • raw black/white point
  • white balance
  • highlight reconstruction
  • demosaic
  • orientation
  • exposure
  • input color profile
  • base curve
  • output color profile

I captured the output of darktable -d perf while moving the exposure slider by one tick (one mouse-wheel tick), repeated a few times until the timings settled. The timings given below are typical values, usually within plus or minus 10%.

Here are the various OpenCL modes:

  • Windows/OpenCL Multiple GPUs: 1.192 s
  • Linux/OpenCL Multiple GPUs: 0.287 s
  • Windows/OpenCL Very Fast GPU: 1.130 s
  • Linux/OpenCL Very Fast GPU: 0.343 s
  • Windows/OpenCL Default: 0.591 s
  • Linux/OpenCL Default: 0.275 s
  • Windows/OpenCL deactivated: 0.416 s
  • Linux/OpenCL deactivated: 0.288 s

So, clearly, the Windows machine is just much slower. The OpenCL profile has practically no influence on the Linux/AMD box, but a huge one on the Windows/Nvidia one. If in doubt, the OpenCL default profile is probably fastest.

Next, I repeated the experiment without the Base Curve, but with Filmic, Color Balance, Contrast Equalizer, and Lens Correction activated in addition. That’s a more typical scenario for me:

  • Windows/OpenCL Multiple GPUs: 0.648 s
  • Linux/OpenCL Multiple GPUs: 0.570 s
  • Windows/OpenCL Very Fast GPU: 0.684 s
  • Linux/OpenCL Very Fast GPU: 0.578 s
  • Windows/OpenCL Default: 0.680 s
  • Linux/OpenCL Default: 0.456 s
  • Windows/OpenCL deactivated: 0.912 s
  • Linux/OpenCL deactivated: 0.703 s

Which is bad news for the benchmark, as apparently deactivating and reactivating OpenCL on Windows had doubled my performance figures… I even went back to just the Base Curve to verify, and indeed my performance had doubled. Now the Windows box behaved roughly like the Linux box, just a bit slower. With more modules, OpenCL does indeed make a difference (and the default profile is still a safe bet).

Let’s do it one more time, but this time add denoise (profiled), because that’s supposed to be a worst-case scenario:

  • Windows/OpenCL Multiple GPUs: 2.224 s
  • Linux/OpenCL Multiple GPUs: 0.582 s
  • Windows/OpenCL Very Fast GPU: 2.478 s
  • Linux/OpenCL Very Fast GPU: 0.497 s
  • Windows/OpenCL Default: 1.527 s
  • Linux/OpenCL Default: 0.482 s
  • Windows/OpenCL deactivated: 0.939 s
  • Linux/OpenCL deactivated: 0.690 s

This time, the Windows box benefits massively from OpenCL (at least with the default profile), and the Linux box still doesn’t much care one way or the other.


Going back to the original question about Darktable’s performance not being real-time: it seems acceptably fast on the Linux box, and the bulk of my problem was actually some intermittent problem on Windows.

darktable does not have many Windows developers. All the “core” developers run Linux. More Windows devs are welcome, especially those with performance-tuning experience 🙂


If the processor has an additional OpenCL-enabled GPU on board (you can check this by running darktable-cltest), then tweaking the OpenCL settings can be useful: it is not a given that the more capable GPU takes over the more demanding tasks.

Not a problem. I’m normally on Linux as well, and quite happy with Darktable.

I merely started my Windows box yesterday in order to have a look at Capture One and Lightroom and see what that’s all about. (And whenever I have to develop on Windows, I don’t like it. It’s nice enough for video games and such, but software development is a nightmare.)

Apart from the performance, though, there’s not much these programs offer over Darktable, I find. Lightroom in particular seems to lack a surprising number of features I’m accustomed to. Capture One has a very slick UI. But I really can’t tell them apart in their output.

I do a comparison like this every year or so. I usually come very close to wanting to buy Capture One initially. And then I see the price point and re-evaluate if a bit of smoothness is really worth that much money–and simultaneously realize that I’d have to switch to Windows if I truly wanted to use it. And then I play around for a while, and get to know the tools… and go back to Darktable, satisfied that it’s the best tool for my way of working. That’s how I deal with gear envy.


Why? Developing for RT on Windows works just fine for example…

Just to clarify, what I don’t like on Windows is software development. RAW development works just fine.

I just like working with a Unix shell. I actually do have to develop on Windows for some of my work, but I’m simply accustomed to being able to automate simple stuff with shell and Python scripts. That’s possible on Windows, too, of course, especially now with WSL and tools like GOW. But it’s more cumbersome to manage, and simply less fun for my tastes.

The “proper” workaround is using a big IDE of course, but again, that’s just not my preferred way of working. That said, some of Windows’ APIs are actually pretty nice to work with, and their API docs are (often) miles ahead compared to Linux’s or macOS’s.

nice, thanks for all these detailed timings! i really had no idea about windows, it seems there is a surprising amount of variance there. and i agree… these quarter to half second timings are workable somehow but clearly not great or fun to use.


@bastibe Is it not faster with OpenCL deactivated on Windows? It shows the fastest time there when using denoise (profiled), so OpenCL is actually not helping with the denoise module?

@priort yes, odd, isn’t it? And that’s with a NVidia GTX 1080, which is not a slow GPU at all. In general, OpenCL is really only a modest performance improvement.

Looking into the timings in more detail, I noticed that half the rendering time is spent on updating the preview. As far as I understand, that’s the little thumbnail in the top left. If we could speed up darktable by a factor of two by not showing that thumbnail (or updating it less often), that would be a tremendously worthy tradeoff in my book.

Does anyone know if that’s indeed how it works?

There are some settings in the preferences that affect this… maybe try some of those.


Should be more than fast enough. I guess darktable isn’t using the Nvidia GPU.
It would be interesting to see how the device priorities are set; see the usermanual: https://www.darktable.org/usermanual/en/darktable_and_opencl_multiple_devices.html

@pk5dark Thank you! My Windows machine was indeed using an integrated Intel graphics card that I didn’t know I had.
I’ve not figured out the exact details yet, but at least some operations are now much faster on the Windows machine.


I just compiled a version of Darktable without the little preview window in the top left. And it is indeed about twice as fast in all GUI operations.

(Add a return; immediately after entering dt_dev_process_preview_job in src/develop/develop.c. This was surprisingly easy to figure out.)
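
In code terms, the hack is just this (a sketch: the early return is the only actual change, and the signature shown here is approximate):

  /* src/develop/develop.c: diagnostic hack, not a real fix */
  void dt_dev_process_preview_job(dt_develop_t *dev)
  {
    return; /* bail out immediately, so the preview pipeline never runs */
    /* ... original body, now unreachable ... */
  }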

Still not as fast as Capture One, but getting there. Most operations now take less than 200 ms.

Well, simply disabling the preview does come with a few downsides, such as no longer seeing a blurry picture when zooming, before the sharp render shows up. Not really a long-term solution.

But as it turns out, there was already an option in the code for using a smaller preview. It had a few bugs if you actually enabled it, though. I fixed those bugs, and here is the performance number now:

  • Linux/OpenCL Default: 0.150 s

A bit more than twice as fast as before. (Although some 30% or so were gained by compiling on my own machine, either because the current git version is faster than 3.0.0, or because local compilation sped it up.)

It used to be that the preview took about as long as the image. Now the preview takes a tiny fraction of the time, leaving the image to finish much faster.
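
(Back-of-the-envelope, assuming cost scales with pixel count: a preview rendered at, say, a quarter of the linear resolution touches only a sixteenth of the pixels, so its cost all but disappears next to the full render.)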

I’ll open an issue on GitHub and see if I accidentally broke something here, or if that’s actually useful for someone.


I’m running Windows 10 and the lag for the tone curve (same code as the base curve, I think) is near enough to real-time. I’m not running OpenCL and my CPU is a 6-core i7 that is a few years old.

Recently there has been a lot of discussion and work on GitHub around UI speed-ups… so maybe I just got lucky and picked versions that had benefited. I’ve just built a fresh version tonight and it seems fine…

Try with the following settings in ~/.config/darktable/darktablerc

opencl_async_pixelpipe=true
opencl_device_priority=*/!0,*/*/*
opencl_mandatory_timeout=250
opencl_scheduling_profile=very fast GPU

The “very fast GPU” profile seems to contradict my performance timings above. The other options don’t seem to make a difference (on my Linux machine).

What are these settings supposed to do? Was this advice intended for the Windows issue, or as general performance advice?

First of all, determine which device is your faster GPU. Then set opencl_device_priority to prioritize that GPU, e.g. opencl_device_priority=1,*/… together with opencl_scheduling_profile=default.
I found that (at least on my Mac) the other scheduling profiles don’t respect the prioritization.
Btw: there is plenty of OpenCL information in the usermanual, see “OpenCL scheduling profile” in the darktable usermanual.
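
As far as I understand the usermanual, the opencl_device_priority string has four ‘/’-separated fields, one per pipeline type (center view / preview / export / thumbnail). Each field is a comma-separated list of device numbers, where * matches any device and ! excludes one. So the default,

opencl_device_priority=*/!0,*/*/*

means: any device for the center view; for the preview, prefer anything but device 0 (falling back to any device); and any device for export and thumbnails.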

I am currently having a real slowdown running on Windows… I may try to format a drive and install Linux. In the darkroom, hitting the spacebar to advance takes around 7 seconds for the image to display; it seems to be recalculating something each time. It happens when I advance, and even if I go right back it is still slow, even with only a couple of modules. I have deleted my database and library but the issue persists. I know there have been a lot of changes to the database and history stack, so maybe it’s a side effect…