Performance issues

Yes, by golly, you're right. I should have seen your Denoise numbers.

I ran it on my work laptop now, also Win10. The Denoise module seems to be the culprit, on my laptop at least:

67.277793 [dev_process_export] pixel pipeline processing took 31.775 secs (221.422 CPU)

Disabled tone EQ
175.243276 [dev_process_export] pixel pipeline processing took 29.271 secs (201.422 CPU)

Disabled Contr EQ
257.730763 [dev_process_export] pixel pipeline processing took 24.653 secs (167.328 CPU)

Disabled Denoise
312.363786 [dev_process_export] pixel pipeline processing took 5.653 secs (30.609 CPU)

Re-enabled denoise
516.443039 [dev_pixelpipe] took 27.292 secs (199.094 CPU) processed `denoise (profiled)’ on CPU, blended on CPU [export]
520.230905 [dev_process_export] pixel pipeline processing took 31.841 secs (227.516 CPU)
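The sequence above can be read as a simple subtraction: each run disabled one more module, so the drop in total export time between consecutive runs gives a rough estimate of that module's cost. A minimal Python sketch using the wall-clock figures from this post. It assumes the modules' costs are independent of each other, which is only an approximation, so treat the result as order-dependent and rough.

```python
# Wall-clock export times (seconds) from the runs above,
# listed in the order in which modules were disabled.
runs = [
    ("all modules enabled", 31.775),
    ("tone EQ disabled", 29.271),
    ("contrast EQ disabled", 24.653),
    ("denoise disabled", 5.653),
]

def marginal_costs(runs):
    """Return (label, seconds saved) per step, assuming independent module costs."""
    return [
        (label, round(prev_t - t, 3))
        for (_, prev_t), (label, t) in zip(runs, runs[1:])
    ]

for label, saved in marginal_costs(runs):
    print(f"{label}: saved {saved} s")
```

Disabling denoise alone saves about 19 of the roughly 25 remaining seconds, which matches the per-module denoise line from the re-enabled run being the largest entry.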


This is truly bizarre. I use Windows myself on a few-years-old Dell laptop with an Nvidia 1060, and also on a MacBook Pro with a Radeon card (to be clear, I'm running Windows on this MacBook too). Neither has any problem at all processing images with DT, whether or not I use tone equalizer.

Sorry, I don't quite understand the numbers posted here, but it seems like you are getting a few hours rather than a few seconds of processing time? There must be something really odd in Windows itself.
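On reading these numbers: in darktable's `-d perf` log lines, the first number appears to be a timestamp (seconds since darktable started), not a duration. The processing time is the "took X secs" part, and the value in parentheses appears to be CPU time summed over all threads. A small Python sketch that pulls those fields out of a line; the regular expression is written only against the lines quoted in this thread, and it also accepts a comma as decimal separator, as in the Linux logs further down.

```python
import re

# Matches lines such as
#   "67.277793 [dev_process_export] pixel pipeline processing took 31.775 secs (221.422 CPU)"
# and the per-module "[dev_pixelpipe] took ... processed `module' ..." lines.
# Numbers may use '.' or ',' as decimal separator depending on locale.
NUM = r"(\d+[.,]\d+)"
LINE_RE = re.compile(NUM + r" \[(\w+)\].*?took " + NUM + r" secs \(" + NUM + r" CPU\)")

def to_float(s):
    """Convert '3,783' or '3.783' to 3.783."""
    return float(s.replace(",", "."))

def parse_perf_line(line):
    """Return (timestamp, tag, wall_secs, cpu_secs), or None if the line doesn't match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    ts, tag, wall, cpu = m.groups()
    return to_float(ts), tag, to_float(wall), to_float(cpu)
```

Applied to the first export line above, this yields a timestamp of about 67 s and a processing time of about 32 s of wall-clock time, so the export took seconds, not hours.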

When running this test, does your CPU usage stay high for the entire process? What about memory usage? I suspect something on your Windows is holding DT back from using the CPU. Can you check the CPU temperature too? Laptops usually slow down when the CPU gets too hot.

Also, do you have problems with other applications that use heavy resources?

To compare, I took a 90 MB, 48 MP shot and tested it:

No Denoise:
1005.994083 [dev_process_export] pixel pipeline processing took 4.869 secs (33.891 CPU)

Profiled denoise for this image
1163.342722 [dev_process_export] pixel pipeline processing took 65.728 secs (480.953 CPU)

Changed to Wavelets Auto
1339.393436 [dev_process_export] pixel pipeline processing took 23.275 secs (144.469 CPU)

Copy over the Denoise from

So Wavelets better but

Yikes!

I never noticed this before. I'd be curious to see a comparison with an older DT version. Or is this just to be expected from Denoise Profiled?

Booting up my main PC with Linux now.

Sorry for the spam, running high on caffeine today.

I just ran it on my main PC with Manjaro & DT 3.2.1:
34,848216 [dev_process_export] pixel pipeline processing took 3,783 secs (18,349 CPU)

EDIT: I have now disabled OpenCL, and I get this on Linux with a Ryzen 3900X:
16,779625 [dev_process_export] pixel pipeline processing took 12,195 secs (262,122 CPU)
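One way to read the two Linux numbers: if the value in parentheses is CPU time summed over all threads, then dividing it by the wall-clock time gives the average number of threads kept busy. A quick sketch with the figures from this post (the 3900X has 12 cores / 24 threads):

```python
def avg_busy_threads(wall_secs, cpu_secs):
    """CPU time summed over threads divided by wall-clock time ~ average busy threads."""
    return cpu_secs / wall_secs

with_opencl = avg_busy_threads(3.783, 18.349)   # GPU does most of the heavy work
cpu_only = avg_busy_threads(12.195, 262.122)    # OpenCL disabled

print(f"OpenCL on:  {with_opencl:.1f} threads busy on average")
print(f"OpenCL off: {cpu_only:.1f} threads busy on average")
print(f"wall-time ratio, off vs. on: {12.195 / 3.783:.1f}x")
```

With OpenCL off, about 21 of the 24 hardware threads stay busy, which suggests the CPU path is not being held back by threading; it simply takes longer than the GPU path.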

I suspect that's comparable to OBE's CPU. So it's probably just that Denoise is much slower on the CPU?

Fast in GPU, very slow on CPU :frowning:

But now, looking at OBE's original log:
58,275698 [dev_pixelpipe] took 18,082 secs (2,172 CPU) processed `denoise (non-local means)’ on GPU, blended on GPU [export]

Also, tone EQ is slow for him:
39,631271 [dev_pixelpipe] took 21,260 secs (74,266 CPU) processed `tone equalizer’ on CPU, blended on CPU [export]

While I see:
6,981280 [dev_pixelpipe] took 0,698 secs (12,117 CPU) processed `tone equalizer’ on CPU, blended on CPU [export]

Are we just looking at simple hardware performance, and CPU vs GPU on denoise?

The only things that still look off to me are the massive CPU vs. GPU difference in Denoise, and the tone EQ for OBE (back to checking your PC for CPU throttling or bottlenecks?).

OK, I need to get back to work now :slight_smile:

I'd think that 6:1 CPU hardware differences are surprising, but not impossible. For the same operation, on dated hardware (but a different darktable version), I get:

kofa (Intel Core2 Duo @ 2.33 GHz with 4 GB RAM): 41.817558 [dev_pixelpipe] took 0.300 secs (0.421 CPU) processed `tone equalizer’ on CPU, blended on CPU [export]

A 30x speed-up due to software optimisation is also surprising, but can happen. I’ll try 3.2.1 in the evening, if I have time.

Also of note: both @obe and I are running notebooks, while it seems both @kofa and @Tore_Valberg are running desktops.

Maybe it's throttling after all? But then why does it only affect tone equalizer? And why a 6x or 30x speed-up? Even when my CPU throttles, clock speed drops by at most 40%, and the number of running threads is still 8.
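The throttling argument can be made concrete: if clock speed drops by a fraction r while the thread count stays the same, a CPU-bound module should slow down by at most about 1/(1 - r). A small sketch, assuming run time scales inversely with clock speed (memory-bound code would be affected even less):

```python
def max_slowdown_from_throttling(clock_reduction):
    """Upper bound on slowdown, assuming run time scales as 1/clock speed."""
    if not 0 <= clock_reduction < 1:
        raise ValueError("clock_reduction must be in [0, 1)")
    return 1.0 / (1.0 - clock_reduction)

bound = max_slowdown_from_throttling(0.40)  # worst case mentioned above: 40% lower clock
for observed in (6, 30):
    verdict = "explainable" if observed <= bound else "needs another cause"
    print(f"observed {observed}x vs. throttling bound {bound:.2f}x -> {verdict}")
```

A 40% clock drop accounts for at most a ~1.7x slowdown, far short of 6x or 30x.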

darktable 3.2.1 from Ubuntu repo, with default config:

153.188349 [dev] took 0.000 secs (0.000 CPU) to load the image.
153.388066 [export] creating pixelpipe took 0.188 secs (0.192 CPU)
153.388284 [pixelpipe_process] [export] using device 0
153.462397 [dev_pixelpipe] took 0.072 secs (0.044 CPU) initing base buffer [export]
153.495356 [dev_pixelpipe] took 0.033 secs (0.027 CPU) processed `raw black/white point' on GPU, blended on GPU [export]
153.499319 [dev_pixelpipe] took 0.004 secs (0.003 CPU) processed `white balance' on GPU, blended on GPU [export]
153.505446 [dev_pixelpipe] took 0.006 secs (0.005 CPU) processed `highlight reconstruction' on GPU, blended on GPU [export]
158.154059 [dev_pixelpipe] took 4.648 secs (6.323 CPU) processed `demosaic' on CPU with tiling, blended on CPU [export]
160.856374 [dev_pixelpipe] took 2.700 secs (2.411 CPU) processed `lens correction' on GPU, blended on GPU [export]
160.869648 [dev_pixelpipe] took 0.013 secs (0.010 CPU) processed `exposure' on GPU, blended on GPU [export]
178.273601 [dev_pixelpipe] took 17.403 secs (30.859 CPU) processed `tone equalizer' on CPU, blended on CPU [export]
178.476608 [dev_pixelpipe] took 0.202 secs (0.182 CPU) processed `input color profile' on GPU, blended on GPU [export]
181.268073 [dev_pixelpipe] took 2.791 secs (2.551 CPU) processed `denoise (non-local means)' on GPU, blended on GPU [export]
181.725964 [dev_pixelpipe] took 0.457 secs (0.332 CPU) processed `contrast equalizer' on GPU, blended on GPU [export]
181.914487 [dev_pixelpipe] took 0.188 secs (0.056 CPU) processed `local contrast' on GPU, blended on GPU [export]
181.940378 [dev_pixelpipe] took 0.025 secs (0.016 CPU) processed `output color profile' on GPU, blended on GPU [export]
182.546787 [dev_pixelpipe] took 0.606 secs (0.863 CPU) processed `display encoding' on CPU, blended on CPU [export]

All the problematic modules now perform much closer to the others' measurements:

Now: 178.273601 [dev_pixelpipe] took 17.403 secs (30.859 CPU) processed `tone equalizer’ on CPU, blended on CPU [export]

Previously: 41.817558 [dev_pixelpipe] took 0.300 secs (0.421 CPU) processed `tone equalizer’ on CPU, blended on CPU [export]

Now: 181.268073 [dev_pixelpipe] took 2.791 secs (2.551 CPU) processed `denoise (non-local means)’ on GPU, blended on GPU [export]

Previously: 41.882531 [dev_pixelpipe] took 0.045 secs (0.022 CPU) processed `denoise (non-local means)’ on GPU, blended on GPU [export]

Now: 181.725964 [dev_pixelpipe] took 0.457 secs (0.332 CPU) processed `contrast equalizer’ on GPU, blended on GPU [export]

Previously: 41.908088 [dev_pixelpipe] took 0.026 secs (0.000 CPU) processed `contrast equalizer’ on GPU, blended on GPU [export]

So either the developers have sped things up 30-50x, or my previous measurement was in error (I don't think that self-compiled vs. generic binary from the distro would cause such huge differences, especially with the GPU codepath).

I’ll re-do the master-branch measurement later. Now time to read a bedtime story. :slight_smile:

So, with the current master version, I got measurements similar to those with 3.2.1.
I wonder what happened yesterday, I did not invent those numbers, and did not run on a supercomputer. :slight_smile:

63.431303 [dev_pixelpipe] took 8.919 secs (15.673 CPU) processed `tone equalizer’ on CPU, blended on CPU [export]

66.434902 [dev_pixelpipe] took 2.810 secs (2.602 CPU) processed `denoise (non-local means)’ on GPU, blended on GPU [export]

66.887540 [dev_pixelpipe] took 0.453 secs (0.391 CPU) processed `contrast equalizer’ on GPU, blended on GPU [export]

Oh, I have made a fool of myself. :smiley: Sorry about that.
It turns out I had accidentally left my export module set to full-HD output, 1920x1080. I didn't know, but it seems that darktable does not process the image at full resolution in this case: downscaling occurs early in the pipe, with some modules' size-related parameters being reduced accordingly.
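To see why an accidental 1920x1080 limit makes such a difference, compare the number of pixels going through the pipe. A sketch assuming the export scales the image to fit inside the box while keeping its aspect ratio; the 6000x4000 size below is a hypothetical example, not the size of the raw file discussed in this thread.

```python
def fit_in_box(width, height, max_w, max_h):
    """Scale (width, height) down to fit inside max_w x max_h, keeping aspect ratio."""
    scale = min(max_w / width, max_h / height, 1.0)  # never upscale
    return round(width * scale), round(height * scale), scale

full_w, full_h = 6000, 4000  # hypothetical 24 MP image
out_w, out_h, scale = fit_in_box(full_w, full_h, 1920, 1080)
pixel_ratio = (full_w * full_h) / (out_w * out_h)
print(f"{out_w}x{out_h}, scale {scale:.2f}, {pixel_ratio:.1f}x fewer pixels")
```

The pixel ratio alone gives roughly an order of magnitude. Modules whose radius or window size is also scaled down, as described above, could gain more than the pixel ratio, which would be consistent with the 30-50x gaps measured earlier.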


There is a setting for high quality exports (processing at full resolution and downscale at the end).

That’s good to know! Don’t worry, errors happen :slight_smile:
Mystery solved then? I guess it’s just that ‘denoise (non-local means)’ and ‘tone equalizer’ are very heavy on the CPU, and scale badly with the size of the image.

Great news. Thank you.

As I understand your post, you set max size 1920x1080 in the export selected menu under global options. This setting runs very, very fast on my PC, and since I export all my photos to Google Photos, I suppose 1920x1080 is OK.

The result: processing took 3,172 secs (7,078 CPU), and no events were lost.
So no atrous problem in this export.

I installed darktable on my old HP Pavilion Slimline and exported the image with default settings. This also ran slowly, of course, and I got this error: "Tiling failed for module atrous. Output may be garbled."

Apparently, the default DT memory settings should be adjusted to fit the most recent development.
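For context on the tiling message: when a module's buffers don't fit into the memory darktable is allowed to use, it processes the image in tiles. The sketch below only illustrates that idea with a deliberately simple, hypothetical model (4 float32 channels per pixel, a per-module buffer factor, equal tiles, no overlap). darktable's real tiling code also accounts for overlap between tiles and module-specific requirements, so the numbers are illustrative only.

```python
import math

BYTES_PER_PIXEL = 4 * 4  # assumption: 4 channels of 32-bit float per pixel

def tiles_needed(width, height, buffer_factor, memory_budget_bytes):
    """Hypothetical model: number of equal tiles so one tile's buffers fit the budget."""
    needed = width * height * BYTES_PER_PIXEL * buffer_factor
    return max(1, math.ceil(needed / memory_budget_bytes))

# Illustrative values only: a 48 MP image, a module needing ~5 full-size
# buffers, and a 1 GiB memory budget.
print(tiles_needed(8000, 6000, 5, 1 << 30))
```

In this model, a smaller memory budget (as on an older machine) means more tiles, and with it more overlap bookkeeping and more chances for tiling to fail or slow down.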

Do you mean the default export setting: max size 0x0?

Hi @Tore_Valberg

Thank you for your help and input! :+1::+1::+1: Nothing much to be gained from further tuning and investigations?

Always good to check and clean out a bit. By the way, are you Norwegian?

Danish! :grinning:

No, sorry, I meant high quality resampling.

Hi @anon41087856

I just tested the new 3.4 release editing this photo (the raw file can be found earlier in this thread):

The tone equalizer performance has changed from 21,260 secs (74,266 CPU) to 4,323 secs (15,375 CPU)!

Great :+1::+1::+1: Thank you very much!
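Put as ratios, the tone equalizer numbers above amount to roughly a 5x speed-up on this image, in both wall-clock and CPU time. A quick check in Python (the log used a comma as decimal separator, so 21,260 secs means 21.26 s):

```python
before_wall, before_cpu = 21.260, 74.266  # earlier measurement in this thread
after_wall, after_cpu = 4.323, 15.375     # darktable 3.4

print(f"wall-clock speed-up: {before_wall / after_wall:.2f}x")
print(f"CPU-time speed-up:   {before_cpu / after_cpu:.2f}x")
```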


It's @rawfiner you should thank, not me :wink:


Hi @rawfiner

Thank you for the great performance improvement of tone equalizer in dt 3.4! :grinning::grinning:
