OK, I’ve done some performance analysis:
TL;DR: Setting OpenCL to “default” is a safe bet. And things didn’t go as planned.
Now, details:
I ran a benchmark on two computers: One Windows PC with an i5-4690K with an Nvidia GTX 1080 and 32 GB of memory, and one Linux PC with an i7-8809G with an AMD Radeon RX Vega M GH and 32 GB of memory. These should have roughly similar single-thread performance, but the Linux PC supports Hyperthreading and the Windows PC does not.
The benchmark used a 24 MP Fuji file with almost no adjustments, only
- raw black/white point
- white balance
- highlight reconstruction
- demosaic
- orientation
- exposure
- input color profile
- base curve
- output color profile
And I captured the output of darktable -d perf while moving the exposure slider one tick (one mouse-wheel tick), repeated a few times until the timings settled. The timings given are typical values, usually within plus or minus 10%.
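For anyone who wants to repeat this: here is a rough sketch of how I'd distill "typical timings" out of a -d perf log, taking the median so a single outlier run doesn't skew the number. The log-line format in the sample is an assumption based on my own logs; the exact wording varies between darktable versions, so adjust the regex to match yours.

```python
import re
import statistics

# Pull per-run pixelpipe totals out of a `darktable -d perf` log.
# The line format below is assumed/illustrative, not guaranteed.
PIPE_RE = re.compile(r"pixel pipeline processing took (\d+\.\d+) secs")

def typical_timing(log_text: str) -> float:
    """Median pixelpipe time over all runs found in the log."""
    runs = [float(m.group(1)) for m in PIPE_RE.finditer(log_text)]
    if not runs:
        raise ValueError("no pixelpipe timings found in log")
    return statistics.median(runs)

sample = """
[dev_process_image] pixel pipeline processing took 0.591 secs (1.203 CPU)
[dev_process_image] pixel pipeline processing took 0.575 secs (1.190 CPU)
[dev_process_image] pixel pipeline processing took 0.610 secs (1.250 CPU)
"""
print(typical_timing(sample))  # 0.591
```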
Here are the various OpenCL modes:
- Windows/OpenCL Multiple GPUs: 1.192 s
- Linux/OpenCL Multiple GPUs: 0.287 s
- Windows/OpenCL Very Fast GPU: 1.130 s
- Linux/OpenCL Very Fast GPU: 0.343 s
- Windows/OpenCL Default: 0.591 s
- Linux/OpenCL Default: 0.275 s
- Windows/OpenCL deactivated: 0.416 s
- Linux/OpenCL deactivated: 0.288 s
So, clearly, the Windows machine is just much slower. The OpenCL profile has essentially no influence on the Linux/AMD box, but a huge one on the Windows/Nvidia box. If in doubt, the OpenCL default profile is probably fastest.
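To make "huge influence" concrete, here is the trivial arithmetic on the table above: each OpenCL mode expressed as a speedup over OpenCL deactivated (values above 1.0 mean OpenCL helps). The numbers are the measured times from this run; nothing else is assumed.

```python
# Measured times (seconds) from the minimal-pipeline run above.
times = {
    "windows": {"multiple gpus": 1.192, "very fast gpu": 1.130,
                "default": 0.591, "deactivated": 0.416},
    "linux":   {"multiple gpus": 0.287, "very fast gpu": 0.343,
                "default": 0.275, "deactivated": 0.288},
}

def speedup_vs_off(box):
    """Speedup of each OpenCL mode relative to OpenCL deactivated."""
    off = times[box]["deactivated"]
    return {mode: round(off / t, 2)
            for mode, t in times[box].items() if mode != "deactivated"}

print(speedup_vs_off("windows"))  # "default" lands around 0.70x, i.e. slower than no OpenCL
print(speedup_vs_off("linux"))    # everything close to 1.0x, i.e. it hardly matters
```

With this light pipeline, OpenCL on the Windows box was actually a net loss in every mode, which foreshadows the oddity described below.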
Next, I repeated the experiment without the Base Curve, but with Filmic, Color Balance, Contrast Equalizer, and Lens Correction additionally activated. That’s a more typical scenario for me:
- Windows/OpenCL Multiple GPUs: 0.648 s
- Linux/OpenCL Multiple GPUs: 0.570 s
- Windows/OpenCL Very Fast GPU: 0.684 s
- Linux/OpenCL Very Fast GPU: 0.578 s
- Windows/OpenCL Default: 0.680 s
- Linux/OpenCL Default: 0.456 s
- Windows/OpenCL deactivated: 0.912 s
- Linux/OpenCL deactivated: 0.703 s
That result is suspicious: apparently, just deactivating and reactivating OpenCL on Windows doubled my performance. I even went back to the Base-Curve-only pipeline to verify, and indeed the timings had halved. Now the Windows box behaved roughly like the Linux box, just a bit slower. With more modules active, OpenCL does indeed make a difference (and the default profile is still a safe bet).
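For completeness, the same ratio for this heavier pipeline, straight from the table above (OpenCL default vs. deactivated, per box):

```python
# Measured times (seconds) from the Filmic/Color Balance/Contrast
# Equalizer/Lens Correction run above.
default_s = {"windows": 0.680, "linux": 0.456}
off_s = {"windows": 0.912, "linux": 0.703}

# Speedup of the default OpenCL profile over OpenCL deactivated.
gain = {box: round(off_s[box] / default_s[box], 2) for box in default_s}
print(gain)  # roughly 1.34x on Windows, 1.54x on Linux
```

So unlike the minimal pipeline, both boxes now come out ahead with OpenCL enabled.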
Let’s do it one more time, but this time with profiled denoise added, because that’s supposed to be a worst-case scenario:
- Windows/OpenCL Multiple GPUs: 2.224 s
- Linux/OpenCL Multiple GPUs: 0.582 s
- Windows/OpenCL Very Fast GPU: 2.478 s
- Linux/OpenCL Very Fast GPU: 0.497 s
- Windows/OpenCL Default: 1.527 s
- Linux/OpenCL Default: 0.482 s
- Windows/OpenCL deactivated: 0.939 s
- Linux/OpenCL deactivated: 0.690 s
This time, the Windows box profits massively from OpenCL (at least on the default profile), and the Linux box still kind of doesn’t care one way or another.
Going back to the original question about darktable’s performance not being real-time: it seems acceptably fast on the Linux box, and the bulk of my problem was actually an intermittent issue on Windows.