Performance issues

I’d think that a 6:1 difference from CPU hardware alone is surprising, but not impossible. For the same operation, on dated hardware (but with a different darktable version), I get:

kofa (Intel Core2 Duo @ 2.33 GHz with 4 GB RAM): 41.817558 [dev_pixelpipe] took 0.300 secs (0.421 CPU) processed `tone equalizer' on CPU, blended on CPU [export]

A 30x speed-up due to software optimisation is also surprising, but can happen. I’ll try 3.2.1 in the evening, if I have time.

Also of note: both @obe and I are running notebooks, while it seems both @kofa and @Tore_Valberg are running desktops.

Maybe it’s throttling after all? But then why does it only affect tone equalizer? And would it explain a 6x or 30x difference? Even when my CPU throttles, the clock speed drops by 40% at most, and the number of threads running is still 8.
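To put a number on the throttling argument, here is a rough sketch. It assumes that runtime scales inversely with clock speed and that the thread count stays the same; both are simplifications, and the function name is made up for illustration.

```python
def max_slowdown_from_throttling(clock_reduction):
    """Upper bound on the slowdown, assuming runtime ~ 1 / clock speed."""
    return 1.0 / (1.0 - clock_reduction)

# Clock speed drops by at most 40% when throttling (figure from the post above).
bound = max_slowdown_from_throttling(0.40)
print(f"worst-case slowdown from throttling: {bound:.2f}x")

# Compare against the differences discussed in this thread.
for observed in (6, 30):
    explained = observed <= bound
    print(f"{observed}x explained by throttling alone: {explained}")
```

Under those assumptions, a 40% clock reduction accounts for well under a 2x slowdown, so it cannot explain the gap on its own.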

darktable 3.2.1 from Ubuntu repo, with default config:

153.188349 [dev] took 0.000 secs (0.000 CPU) to load the image.
153.388066 [export] creating pixelpipe took 0.188 secs (0.192 CPU)
153.388284 [pixelpipe_process] [export] using device 0
153.462397 [dev_pixelpipe] took 0.072 secs (0.044 CPU) initing base buffer [export]
153.495356 [dev_pixelpipe] took 0.033 secs (0.027 CPU) processed `raw black/white point' on GPU, blended on GPU [export]
153.499319 [dev_pixelpipe] took 0.004 secs (0.003 CPU) processed `white balance' on GPU, blended on GPU [export]
153.505446 [dev_pixelpipe] took 0.006 secs (0.005 CPU) processed `highlight reconstruction' on GPU, blended on GPU [export]
158.154059 [dev_pixelpipe] took 4.648 secs (6.323 CPU) processed `demosaic' on CPU with tiling, blended on CPU [export]
160.856374 [dev_pixelpipe] took 2.700 secs (2.411 CPU) processed `lens correction' on GPU, blended on GPU [export]
160.869648 [dev_pixelpipe] took 0.013 secs (0.010 CPU) processed `exposure' on GPU, blended on GPU [export]
178.273601 [dev_pixelpipe] took **17.403 secs (30.859 CPU) processed `tone equalizer' on CPU**, blended on CPU [export]
178.476608 [dev_pixelpipe] took 0.202 secs (0.182 CPU) processed `input color profile' on GPU, blended on GPU [export]
181.268073 [dev_pixelpipe] took **2.791 secs (2.551 CPU) processed `denoise (non-local means)' on GPU**, blended on GPU [export]
181.725964 [dev_pixelpipe] took **0.457 secs (0.332 CPU) processed `contrast equalizer' on GPU**, blended on GPU [export]
181.914487 [dev_pixelpipe] took 0.188 secs (0.056 CPU) processed `local contrast' on GPU, blended on GPU [export]
181.940378 [dev_pixelpipe] took 0.025 secs (0.016 CPU) processed `output color profile' on GPU, blended on GPU [export]
182.546787 [dev_pixelpipe] took 0.606 secs (0.863 CPU) processed `display encoding' on CPU, blended on CPU [export]
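For anyone who wants to compare such logs without reading them line by line, here is a minimal sketch that pulls the module name, device and timings out of lines in the format quoted above and lists the slowest modules. The regular expression is my own guess based on these lines only, not an official log grammar, and the helper names are hypothetical.

```python
import re

# Pattern derived from the log lines quoted in this thread (an assumption,
# not an official format description). Accepts ' or ’ as the closing quote.
PERF_LINE = re.compile(
    r"took (?P<wall>\d+\.\d+) secs \((?P<cpu>\d+\.\d+) CPU\) "
    r"processed `(?P<module>.+?)['’] on (?P<device>CPU|GPU)"
)

def parse_perf_log(text):
    """Return (module, device, wall_secs, cpu_secs) for each module line."""
    rows = []
    for line in text.splitlines():
        m = PERF_LINE.search(line)
        if m:
            rows.append((m["module"], m["device"],
                         float(m["wall"]), float(m["cpu"])))
    return rows

def slowest(rows, n=3):
    """Return the n rows with the largest wall-clock time."""
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]

# A few lines copied from the 3.2.1 log above.
LOG = """\
153.495356 [dev_pixelpipe] took 0.033 secs (0.027 CPU) processed `raw black/white point' on GPU, blended on GPU [export]
158.154059 [dev_pixelpipe] took 4.648 secs (6.323 CPU) processed `demosaic' on CPU with tiling, blended on CPU [export]
160.856374 [dev_pixelpipe] took 2.700 secs (2.411 CPU) processed `lens correction' on GPU, blended on GPU [export]
178.273601 [dev_pixelpipe] took 17.403 secs (30.859 CPU) processed `tone equalizer' on CPU, blended on CPU [export]
181.268073 [dev_pixelpipe] took 2.791 secs (2.551 CPU) processed `denoise (non-local means)' on GPU, blended on GPU [export]
"""

for module, device, wall, cpu in slowest(parse_perf_log(LOG)):
    print(f"{wall:7.3f} s  {device}  {module}")
```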

Now all the problematic modules are performing much closer to the others’ measurements.

Now: 178.273601 [dev_pixelpipe] took 17.403 secs (30.859 CPU) processed `tone equalizer' on CPU, blended on CPU [export]

Previously: 41.817558 [dev_pixelpipe] took 0.300 secs (0.421 CPU) processed `tone equalizer' on CPU, blended on CPU [export]

Now: 181.268073 [dev_pixelpipe] took 2.791 secs (2.551 CPU) processed `denoise (non-local means)' on GPU, blended on GPU [export]

Previously: 41.882531 [dev_pixelpipe] took 0.045 secs (0.022 CPU) processed `denoise (non-local means)' on GPU, blended on GPU [export]

Now: 181.725964 [dev_pixelpipe] took 0.457 secs (0.332 CPU) processed `contrast equalizer' on GPU, blended on GPU [export]

Previously: 41.908088 [dev_pixelpipe] took 0.026 secs (0.000 CPU) processed `contrast equalizer' on GPU, blended on GPU [export]

So, either the developers have sped things up 30-50x, or my previous measurement was in error (I don’t think that a self-compiled build vs. the generic binary from the distro would cause such huge differences, especially with the GPU code path).
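The ratios can be checked directly from the wall-clock times in the Now/Previously pairs. This is only a small sketch; the dictionary layout and function name are mine.

```python
# (now, previously) wall-clock seconds, copied from the pairs above.
timings = {
    "tone equalizer": (17.403, 0.300),
    "denoise (non-local means)": (2.791, 0.045),
    "contrast equalizer": (0.457, 0.026),
}

def ratio(now, previously):
    """How many times longer the 'now' run took than the 'previous' run."""
    return now / previously

ratios = {name: ratio(*pair) for name, pair in timings.items()}
for name, value in ratios.items():
    print(f"{name}: {value:.1f}x")
```

The factor differs quite a bit from module to module, which is worth keeping in mind when looking for a single cause.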

I’ll re-do the master-branch measurement later. Now time to read a bedtime story. :slight_smile:

So, with the current master version, I got measurements similar to those with 3.2.1.
I wonder what happened yesterday. I did not invent those numbers, and did not run on a supercomputer. :slight_smile:

63.431303 [dev_pixelpipe] took 8.919 secs (15.673 CPU) processed `tone equalizer' on CPU, blended on CPU [export]

66.434902 [dev_pixelpipe] took 2.810 secs (2.602 CPU) processed `denoise (non-local means)' on GPU, blended on GPU [export]

66.887540 [dev_pixelpipe] took 0.453 secs (0.391 CPU) processed `contrast equalizer' on GPU, blended on GPU [export]

Oh, I have made a fool of myself. :smiley: Sorry about that.
Turns out, I had left my export module accidentally on full-HD output, 1920x1080. I didn’t know, but it seems that darktable does not process the image at full resolution in this case: downscaling occurs early in the pipe, with some modules’ size-related parameters being appropriately reduced as well.
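To get a feel for how much less data the pipe handles in that case, here is a rough sketch. The 6000x4000 sensor size is purely hypothetical (the raw's real dimensions are not stated here), and the fit-inside-a-box scaling is a generic approach, not necessarily the exact rule darktable applies.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit the box, keeping the aspect ratio."""
    scale = min(max_width / width, max_height / height, 1.0)  # never upscale
    return round(width * scale), round(height * scale)

def pixel_ratio(full, reduced):
    """How many times more pixels the full-size image has."""
    return (full[0] * full[1]) / (reduced[0] * reduced[1])

full = (6000, 4000)                      # hypothetical 24 MP raw
reduced = fit_within(*full, 1920, 1080)  # export limit from the post above
print(reduced, f"{pixel_ratio(full, reduced):.1f}x fewer pixels")
```

With these assumed dimensions, the early downscale leaves the later modules with less than a tenth of the pixels, before even counting the reduced size-related parameters.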

There is a setting for high quality exports (processing at full resolution and downscaling at the end).

That’s good to know! Don’t worry, errors happen :slight_smile:
Mystery solved then? I guess it’s just that ‘denoise (non-local means)’ and ‘tone equalizer’ are very heavy on the CPU, and scale badly with the size of the image.

Great news. Thank you.

As I understand your post, you set max size 1920x1080 in the export selected menu under global options. This setting runs very, very fast on my pc, and since I export all my photos to Google Photos, I suppose 1920x1080 is ok.

The result was: processing took 3,172 secs (7,078 CPU) and no events lost
So no atrous problem in this export.

I installed darktable on my old HP Pavilion Slimline and exported the image with default settings. This also ran slowly, of course, and I got this error: “Tiling failed for module atrous. Output may be garbled.”

Apparently, the default DT memory settings should be adjusted to fit the most recent development.

Do you mean the default export setting: max size 0x0?

Hi’ @Tore_Valberg

Thank you for your help and input…….:+1::+1::+1:. Nothing much to be gained from further tuning and investigations?

Always good to check and clean out a bit. Btw, Norsk?

Dansk…:grinning:!

No, sorry, I meant high quality resampling.

Hi’ @anon41087856

I just tested the new 3.4 release editing this photo (the raw file can be found earlier in this thread):

The tone equalizer performance has changed from:
21,260 secs (74,266 CPU)………to………4,323 secs (15,375 CPU) !!!
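For reference, a small sketch that turns those decimal-comma figures into a speed-up factor and also into a CPU-to-wall ratio, which gives a rough idea of how many cores were kept busy. Reading CPU time divided by wall time as "effective parallelism" is my own simplification.

```python
def parse_secs(text):
    """Parse a decimal-comma number such as '21,260' into a float."""
    return float(text.replace(",", "."))

before_wall, before_cpu = parse_secs("21,260"), parse_secs("74,266")
after_wall, after_cpu = parse_secs("4,323"), parse_secs("15,375")

speedup = before_wall / after_wall
parallelism_before = before_cpu / before_wall  # CPU seconds per wall second
parallelism_after = after_cpu / after_wall

print(f"speed-up: {speedup:.2f}x")
print(f"CPU/wall before: {parallelism_before:.2f}, after: {parallelism_after:.2f}")
```

The CPU-to-wall ratio stays about the same in both runs, so the gain appears to come from doing less work in total rather than from using more threads.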

Great :+1::+1::+1:……thank you very much!

It’s @rawfiner you should thank, not me :wink:

Hi’ @rawfiner

Thank you for the great performance improvement of tone equalizer in dt 3.4…:grinning::grinning:

Thanks! :smiley:
I am glad it is useful :slight_smile:

@rawfiner, man you are too humble. Your work is just excellent.
