Performance issues

I thought the same, but then I ran his image on my computer and…

0.652404 [dev] took 0.190 secs (0.202 CPU) to load the image.
0.667288 [export] creating pixelpipe took 0.011 secs (0.055 CPU)
0.693795 [dev_pixelpipe] took 0.026 secs (0.054 CPU) initing base buffer [export]
0.713126 [dev_pixelpipe] took 0.019 secs (0.073 CPU) processed `raw black/white point' on CPU, blended on CPU [export]
0.730959 [dev_pixelpipe] took 0.018 secs (0.070 CPU) processed `white balance' on CPU, blended on CPU [export]
0.791050 [dev_pixelpipe] took 0.060 secs (0.446 CPU) processed `highlight reconstruction' on CPU, blended on CPU [export]
1.332629 [dev_pixelpipe] took 0.541 secs (3.466 CPU) processed `demosaic' on CPU, blended on CPU [export]
3.193797 [dev_pixelpipe] took 1.861 secs (12.567 CPU) processed `lens correction' on CPU, blended on CPU [export]
3.252554 [dev_pixelpipe] took 0.059 secs (0.398 CPU) processed `exposure' on CPU, blended on CPU [export]
9.034613 [dev_pixelpipe] took 5.782 secs (40.261 CPU) processed `tone equalizer' on CPU, blended on CPU [export]
9.094097 [dev_pixelpipe] took 0.059 secs (0.458 CPU) processed `input color profile' on CPU, blended on CPU [export]
32.446871 [dev_pixelpipe] took 23.353 secs (175.435 CPU) processed `denoise (non-local means)' on CPU, blended on CPU [export]
38.913880 [dev_pixelpipe] took 6.466 secs (42.651 CPU) processed `contrast equalizer' on CPU, blended on CPU [export]
40.045200 [dev_pixelpipe] took 1.131 secs (7.446 CPU) processed `local contrast' on CPU, blended on CPU [export]
41.577268 [dev_pixelpipe] took 1.532 secs (11.920 CPU) processed `output color profile' on CPU, blended on CPU [export]
41.652031 [dev_pixelpipe] took 0.075 secs (0.594 CPU) processed `display encoding' on CPU, blended on CPU [export]
41.652139 [dev_process_export] pixel pipeline processing took 40.985 secs (295.866 CPU)
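For anyone comparing logs like this (the kind printed by `darktable -d perf`), here is a minimal Python sketch that pulls the per-module timings out of the text and lists the slowest modules first. The regular expression is only inferred from the lines pasted in this thread, and the names `parse_perf_log` and `slowest` are made up for illustration. It also accepts decimal commas, since some of the logs here were printed that way.

```python
import re
from typing import NamedTuple

# Pattern inferred from the log lines in this thread (an assumption, not an
# official format description). Accepts "5.782" as well as "5,782".
LINE_RE = re.compile(
    r"\[dev_pixelpipe\] took (?P<wall>\d+[.,]\d+) secs "
    r"\((?P<cpu>\d+[.,]\d+) CPU\) processed "
    r"[`'‘’](?P<module>.+?)[`'‘’] on (?P<device>CPU|GPU)"
)


class ModuleTiming(NamedTuple):
    module: str
    device: str
    wall_secs: float
    cpu_secs: float


def _to_float(text):
    """Convert '5.782' or '5,782' to a float."""
    return float(text.replace(",", "."))


def parse_perf_log(text):
    """Return one ModuleTiming per 'processed ... on CPU/GPU' line."""
    timings = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            timings.append(
                ModuleTiming(
                    match["module"],
                    match["device"],
                    _to_float(match["wall"]),
                    _to_float(match["cpu"]),
                )
            )
    return timings


def slowest(timings, n=3):
    """Return the n modules with the largest wall-clock time."""
    return sorted(timings, key=lambda t: t.wall_secs, reverse=True)[:n]


# Three lines copied from the log above.
sample = """\
3.193797 [dev_pixelpipe] took 1.861 secs (12.567 CPU) processed `lens correction' on CPU, blended on CPU [export]
9.034613 [dev_pixelpipe] took 5.782 secs (40.261 CPU) processed `tone equalizer' on CPU, blended on CPU [export]
32.446871 [dev_pixelpipe] took 23.353 secs (175.435 CPU) processed `denoise (non-local means)' on CPU, blended on CPU [export]
"""

for t in slowest(parse_perf_log(sample)):
    print(f"{t.wall_secs:8.3f} s  {t.module} ({t.device})")
```

Sorting by wall-clock time rather than CPU time is a deliberate choice here: it is the number you actually wait for, while the CPU figure grows with the number of threads.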

This is without OpenCL because I have the GPU disabled at the moment, but still I think it points to something else going on here:

guille2306 (Intel i7-4720HQ @ 3.600GHz with 16 GB RAM): 9.034613 [dev_pixelpipe] took 5.782 secs (40.261 CPU) processed `tone equalizer' on CPU, blended on CPU [export]

This is on Linux and, I would say, more or less in line with the expected difference between the i7-4510U and the i7-4720HQ with multi-threading enabled. I still don’t understand why such a simple edit is so slow, but that’s two completely different systems stumbling on the same image.

Maybe the outliers here are @kofa’s results? @kofa: I see you’re on a development version of darktable. Can you downgrade and test with 3.2.1?

It’s not really that scary, and the HP laptop BIOS is rather “dumbed down”, so it won’t let you disable both graphics cards etc. It looks like your laptop doesn’t have the option to disable the iGPU, though, but I’d still boot into the BIOS and check.

Performance problems can be rather elusive. People get used to how their computer behaves, and things like browsing, email, video etc. will probably work just fine, as they are not resource-intensive. But when running a heavy game, a heavy graphics task or benchmarks, you will notice it.

Check your Event Viewer, Device Manager, Task Manager etc. Also look for any forms of power saving, which can be pathological on laptops. You may be facing CPU clockdowns etc. I’ve seen that before.

I also personally never go more than a couple of years without a complete wipe/format and Windows reinstall. Windows gets clogged up over time.

Yes, by golly, you’re right. I should have seen your Denoise numbers.

Ran it on my work laptop now, also Win10. The Denoise module seems to be the culprit, on my laptop at least.

67.277793 [dev_process_export] pixel pipeline processing took 31.775 secs (221.422 CPU)

Disabled tone EQ
175.243276 [dev_process_export] pixel pipeline processing took 29.271 secs (201.422 CPU)

Disabled Contr EQ
257.730763 [dev_process_export] pixel pipeline processing took 24.653 secs (167.328 CPU)

Disabled Denoise
312.363786 [dev_process_export] pixel pipeline processing took 5.653 secs (30.609 CPU)

Re-enabled denoise
516.443039 [dev_pixelpipe] took 27.292 secs (199.094 CPU) processed `denoise (profiled)' on CPU, blended on CPU [export]
520.230905 [dev_process_export] pixel pipeline processing took 31.841 secs (227.516 CPU)

This is truly bizarre. I use Windows myself on a few-years-old Dell laptop with an Nvidia 1060, and also on a MacBook Pro with a Radeon card (just to be clear, I’m running Windows on this MacBook too). Both have no problem at all processing images with DT, with or without tone equalizer.

Sorry, I don’t quite understand the numbers posted here, but it seems like you are getting a few hours rather than a few seconds of processing time? There must be something really odd in Windows itself.

When running this test, does your CPU usage go up for the entire process? What about memory usage? I suspect something in your Windows installation is keeping DT from using the CPU. Can you check the CPU temperature too? Laptops usually slow down when the CPU is too hot.

Also, do you have problems with other applications that use heavy resources?

To compare, I took a 90 MB, 48 MP shot and tested:

No Denoise:
1005.994083 [dev_process_export] pixel pipeline processing took 4.869 secs (33.891 CPU)

Profiled denoise for this image
1163.342722 [dev_process_export] pixel pipeline processing took 65.728 secs (480.953 CPU)

Changed to Wavelets Auto
1339.393436 [dev_process_export] pixel pipeline processing took 23.275 secs (144.469 CPU)

Copy over the Denoise from

So Wavelets better but

Yikes!

Never noticed this before. I’d be curious to see a comparison with older DT. Or is this just expected from Denoise (profiled)?

Booting up my main PC with Linux now.

Sorry for the spam, running high on caffeine today.

I just ran it now on my main PC with Manjaro & DT 3.2.1
34,848216 [dev_process_export] pixel pipeline processing took 3,783 secs (18,349 CPU)

EDIT: now with OpenCL disabled, I get this on Linux with a Ryzen 3900X:
16,779625 [dev_process_export] pixel pipeline processing took 12,195 secs (262,122 CPU)

I suspect that’s comparable to OBE’s CPU. So it’s probably just that Denoise is much slower on the CPU?

Fast on GPU, very slow on CPU :frowning:

But now looking at OBE’s original log:
58,275698 [dev_pixelpipe] took 18,082 secs (2,172 CPU) processed `denoise (non-local means)' on GPU, blended on GPU [export]

Also, tone EQ is slow for him:
39,631271 [dev_pixelpipe] took 21,260 secs (74,266 CPU) processed `tone equalizer' on CPU, blended on CPU [export]

While I see:
6,981280 [dev_pixelpipe] took 0,698 secs (12,117 CPU) processed `tone equalizer' on CPU, blended on CPU [export]

Are we just looking at simple hardware performance, and CPU vs GPU on denoise?

The only thing that’s off for me now is the massive difference in Denoise, CPU vs GPU, and the tone EQ for OBE. (Back to checking your PC for CPU throttling or bottlenecks?)

OK, I need to get back to work now :slight_smile:

I’d think that 6:1 CPU hardware differences are surprising, but not impossible. For the same operation, on dated hardware (but different darktable version) I get:

kofa (Intel Core2 Duo @ 2.33 GHz with 4 GB RAM): 41.817558 [dev_pixelpipe] took 0.300 secs (0.421 CPU) processed `tone equalizer' on CPU, blended on CPU [export]

A 30x speed-up due to software optimisation is also surprising, but can happen. I’ll try 3.2.1 in the evening, if I have time.

Also of note is that both @obe and I are running notebooks, and it seems both @kofa and @Tore_Valberg are running desktops.

Maybe it’s throttling after all? But then why does it only affect tone equalizer? And a 6x or 30x speed-up? Even when my CPU throttles, clock speed only drops by 40% at most, and the number of threads running is still 8.
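One rough way to look at this from the logs alone: dividing the CPU seconds by the wall-clock seconds gives an estimate of how many threads were busy on average. This is only a sketch, and it assumes the "CPU" figure in the log is CPU time summed over all threads. The function name is made up, and the numbers are the tone equalizer timings quoted in this thread.

```python
def effective_threads(wall_secs, cpu_secs):
    """Estimate the average number of busy threads as CPU time / wall time."""
    if wall_secs <= 0:
        raise ValueError("wall_secs must be positive")
    return cpu_secs / wall_secs


# (wall seconds, CPU seconds) for `tone equalizer', copied from the thread.
measurements = {
    "i7-4720HQ laptop": (5.782, 40.261),
    "obe's laptop": (21.260, 74.266),
    "Ryzen 3900X desktop": (0.698, 12.117),
}

for label, (wall, cpu) in measurements.items():
    print(f"{label}: ~{effective_threads(wall, cpu):.1f} threads busy")
```

With these inputs the ratio comes out at roughly 7, 3.5 and 17 threads respectively, so on top of any clock-speed difference the slow machine also seems to keep fewer threads busy. That is a hint rather than proof of throttling.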

darktable 3.2.1 from Ubuntu repo, with default config:

153.188349 [dev] took 0.000 secs (0.000 CPU) to load the image.
153.388066 [export] creating pixelpipe took 0.188 secs (0.192 CPU)
153.388284 [pixelpipe_process] [export] using device 0
153.462397 [dev_pixelpipe] took 0.072 secs (0.044 CPU) initing base buffer [export]
153.495356 [dev_pixelpipe] took 0.033 secs (0.027 CPU) processed `raw black/white point' on GPU, blended on GPU [export]
153.499319 [dev_pixelpipe] took 0.004 secs (0.003 CPU) processed `white balance' on GPU, blended on GPU [export]
153.505446 [dev_pixelpipe] took 0.006 secs (0.005 CPU) processed `highlight reconstruction' on GPU, blended on GPU [export]
158.154059 [dev_pixelpipe] took 4.648 secs (6.323 CPU) processed `demosaic' on CPU with tiling, blended on CPU [export]
160.856374 [dev_pixelpipe] took 2.700 secs (2.411 CPU) processed `lens correction' on GPU, blended on GPU [export]
160.869648 [dev_pixelpipe] took 0.013 secs (0.010 CPU) processed `exposure' on GPU, blended on GPU [export]
178.273601 [dev_pixelpipe] took **17.403 secs (30.859 CPU) processed `tone equalizer' on CPU**, blended on CPU [export]
178.476608 [dev_pixelpipe] took 0.202 secs (0.182 CPU) processed `input color profile' on GPU, blended on GPU [export]
181.268073 [dev_pixelpipe] took **2.791 secs (2.551 CPU) processed `denoise (non-local means)' on GPU**, blended on GPU [export]
181.725964 [dev_pixelpipe] took **0.457 secs (0.332 CPU) processed `contrast equalizer' on GPU**, blended on GPU [export]
181.914487 [dev_pixelpipe] took 0.188 secs (0.056 CPU) processed `local contrast' on GPU, blended on GPU [export]
181.940378 [dev_pixelpipe] took 0.025 secs (0.016 CPU) processed `output color profile' on GPU, blended on GPU [export]
182.546787 [dev_pixelpipe] took 0.606 secs (0.863 CPU) processed `display encoding' on CPU, blended on CPU [export]

Now all the problematic modules are performing much closer to the others’ measurements:

Now: 178.273601 [dev_pixelpipe] took 17.403 secs (30.859 CPU) processed `tone equalizer' on CPU, blended on CPU [export]

Previously: 41.817558 [dev_pixelpipe] took 0.300 secs (0.421 CPU) processed `tone equalizer' on CPU, blended on CPU [export]

Now: 181.268073 [dev_pixelpipe] took 2.791 secs (2.551 CPU) processed `denoise (non-local means)' on GPU, blended on GPU [export]

Previously: 41.882531 [dev_pixelpipe] took 0.045 secs (0.022 CPU) processed `denoise (non-local means)' on GPU, blended on GPU [export]

Now: 181.725964 [dev_pixelpipe] took 0.457 secs (0.332 CPU) processed `contrast equalizer' on GPU, blended on GPU [export]

Previously: 41.908088 [dev_pixelpipe] took 0.026 secs (0.000 CPU) processed `contrast equalizer' on GPU, blended on GPU [export]

So, either the developers have sped things up 30-50x, or my previous measurement was in error (I don’t think that self-compiled vs. generic binary from the distro would cause such huge differences, especially with the GPU codepath).
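The per-module ratios can be worked out directly from the "Now" and "Previously" lines quoted above. A small Python sketch, with a made-up helper name:

```python
def ratio(previous_secs, now_secs):
    """How many times longer the 'now' run took than the 'previous' run."""
    return now_secs / previous_secs


# (previous wall seconds, now wall seconds), copied from the lines above.
runs = {
    "tone equalizer": (0.300, 17.403),
    "denoise (non-local means)": (0.045, 2.791),
    "contrast equalizer": (0.026, 0.457),
}

for module, (previous, now) in runs.items():
    print(f"{module}: {ratio(previous, now):.0f}x")
```

That gives roughly 58x, 62x and 18x, so the gap is, if anything, a bit wider than 30-50x for two of the three modules.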

I’ll re-do the master-branch measurement later. Now time to read a bedtime story. :slight_smile:

So, with the current master version, I got similar measurements as with 3.2.1.
I wonder what happened yesterday, I did not invent those numbers, and did not run on a supercomputer. :slight_smile:

63.431303 [dev_pixelpipe] took 8.919 secs (15.673 CPU) processed `tone equalizer' on CPU, blended on CPU [export]

66.434902 [dev_pixelpipe] took 2.810 secs (2.602 CPU) processed `denoise (non-local means)' on GPU, blended on GPU [export]

66.887540 [dev_pixelpipe] took 0.453 secs (0.391 CPU) processed `contrast equalizer' on GPU, blended on GPU [export]

Oh, I have made a fool of myself. :smiley: Sorry about that.
Turns out I had accidentally left my export module on full-HD output, 1920x1080. I didn’t know this, but it seems that darktable does not process the image at full resolution in this case: downscaling occurs early in the pipe, with some modules’ size-related parameters being reduced accordingly.
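To get a feeling for how much less work a 1920x1080 export can be, here is a back-of-the-envelope Python sketch. Everything in it is an assumption for illustration: the 6000x4000 source size is hypothetical (not the image from this thread), the helper `fit_within` is made up, and it simply fits the image inside the box while keeping the aspect ratio, treating 0 as "no limit".

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit inside the box, keeping the aspect
    ratio. A limit of 0 means 'no limit'; the image is never enlarged."""
    scale = 1.0
    if max_width > 0:
        scale = min(scale, max_width / width)
    if max_height > 0:
        scale = min(scale, max_height / height)
    return round(width * scale), round(height * scale)


# Hypothetical 24 MP source image, NOT the raw file discussed in this thread.
full_w, full_h = 6000, 4000
small_w, small_h = fit_within(full_w, full_h, 1920, 1080)

pixel_ratio = (full_w * full_h) / (small_w * small_h)
print(f"{small_w}x{small_h}, about {pixel_ratio:.1f}x fewer pixels")
```

For this hypothetical image the downscaled pipe would handle about 14 times fewer pixels. Modules whose cost grows with their size-related parameters as well as with the pixel count could speed up by more than that, which would fit the much larger ratios measured here.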

There is a setting for high-quality exports (processing at full resolution and downscaling at the end).

That’s good to know! Don’t worry, errors happen :slight_smile:
Mystery solved then? I guess it’s just that ‘denoise (non-local means)’ and ‘tone equalizer’ are very heavy on the CPU, and scale badly with the size of the image.

Great news. Thank you.

As I understand your post, you set max size 1920x1080 in the export selected menu under global options. This setting runs very, very fast on my pc, and since I export all my photos to Google Photos, I suppose 1920x1080 is ok.

The result was: processing took 3,172 secs (7,078 CPU) and no events lost
So no atrous problem in this export.

I installed darktable on my old HP Pavilion Slimline and exported the image with default settings. This also ran slowly, of course, and I got this error: “Tiling failed for module atrous. Output may be garbled.”

Apparently, the default DT memory settings should be adjusted to fit the most recent development.

Do you mean the default export setting: max size 0x0?

Hi @Tore_Valberg

Thank you for your help and input…:+1::+1::+1: Nothing much to be gained from further tuning and investigations?

Always good to check and clean out a bit. By the way, Norsk?

Dansk…:grinning:!

No, sorry, I meant high quality resampling.

Hi @anon41087856

I just tested the new 3.4 release editing this photo (the raw file can be found earlier in this thread):

The tone equalizer performance has changed from:
21,260 secs (74,266 CPU)………to………4,323 secs (15,375 CPU) !!!

Great :+1::+1::+1:……thank you very much!
