darktable 3.2.1 from the Ubuntu repo, with the default config:
153.188349 [dev] took 0.000 secs (0.000 CPU) to load the image.
153.388066 [export] creating pixelpipe took 0.188 secs (0.192 CPU)
153.388284 [pixelpipe_process] [export] using device 0
153.462397 [dev_pixelpipe] took 0.072 secs (0.044 CPU) initing base buffer [export]
153.495356 [dev_pixelpipe] took 0.033 secs (0.027 CPU) processed `raw black/white point' on GPU, blended on GPU [export]
153.499319 [dev_pixelpipe] took 0.004 secs (0.003 CPU) processed `white balance' on GPU, blended on GPU [export]
153.505446 [dev_pixelpipe] took 0.006 secs (0.005 CPU) processed `highlight reconstruction' on GPU, blended on GPU [export]
158.154059 [dev_pixelpipe] took 4.648 secs (6.323 CPU) processed `demosaic' on CPU with tiling, blended on CPU [export]
160.856374 [dev_pixelpipe] took 2.700 secs (2.411 CPU) processed `lens correction' on GPU, blended on GPU [export]
160.869648 [dev_pixelpipe] took 0.013 secs (0.010 CPU) processed `exposure' on GPU, blended on GPU [export]
178.273601 [dev_pixelpipe] took **17.403 secs (30.859 CPU) processed `tone equalizer' on CPU**, blended on CPU [export]
178.476608 [dev_pixelpipe] took 0.202 secs (0.182 CPU) processed `input color profile' on GPU, blended on GPU [export]
181.268073 [dev_pixelpipe] took **2.791 secs (2.551 CPU) processed `denoise (non-local means)' on GPU**, blended on GPU [export]
181.725964 [dev_pixelpipe] took **0.457 secs (0.332 CPU) processed `contrast equalizer' on GPU**, blended on GPU [export]
181.914487 [dev_pixelpipe] took 0.188 secs (0.056 CPU) processed `local contrast' on GPU, blended on GPU [export]
181.940378 [dev_pixelpipe] took 0.025 secs (0.016 CPU) processed `output color profile' on GPU, blended on GPU [export]
182.546787 [dev_pixelpipe] took 0.606 secs (0.863 CPU) processed `display encoding' on CPU, blended on CPU [export]
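For anyone who wants to slice a dump like the one above (these look like darktable's `-d perf` timing lines), here is a small Python sketch that pulls out the per-module wall-clock and CPU times and sorts the modules slowest-first. The regex is an assumption based on the exact line format shown here; it would need adjusting if the log format differs in other versions.

```python
import re

# Matches the timing lines shown above, e.g.:
#   ... took 4.648 secs (6.323 CPU) processed `demosaic' on CPU ...
# Captures: wall secs, CPU secs, module name, device (CPU/GPU).
line_re = re.compile(
    r"took ([\d.]+) secs \(([\d.]+) CPU\) processed `([^']+)' on (\w+)"
)

def parse_perf(log: str):
    """Return [(module, wall_secs, cpu_secs, device), ...], slowest first."""
    rows = [
        (module, float(wall), float(cpu), device)
        for wall, cpu, module, device in
        (m.groups() for m in line_re.finditer(log))
    ]
    return sorted(rows, key=lambda r: -r[1])
```

Feeding it the log above immediately puts `tone equalizer`, `demosaic`, and `denoise (non-local means)` at the top.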
Now compare the problematic modules against my previous (self-compiled master-branch) measurements:

Now: 178.273601 [dev_pixelpipe] took 17.403 secs (30.859 CPU) processed `tone equalizer' on CPU, blended on CPU [export]
Previously: 41.817558 [dev_pixelpipe] took 0.300 secs (0.421 CPU) processed `tone equalizer' on CPU, blended on CPU [export]
Now: 181.268073 [dev_pixelpipe] took 2.791 secs (2.551 CPU) processed `denoise (non-local means)' on GPU, blended on GPU [export]
Previously: 41.882531 [dev_pixelpipe] took 0.045 secs (0.022 CPU) processed `denoise (non-local means)' on GPU, blended on GPU [export]
Now: 181.725964 [dev_pixelpipe] took 0.457 secs (0.332 CPU) processed `contrast equalizer' on GPU, blended on GPU [export]
Previously: 41.908088 [dev_pixelpipe] took 0.026 secs (0.000 CPU) processed `contrast equalizer' on GPU, blended on GPU [export]
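As a quick sanity check, the slowdown factors can be computed directly from the paired measurements above (wall-clock seconds copied from the log lines; "now" is the Ubuntu 3.2.1 binary, "previously" the self-compiled run):

```python
# (now_secs, prev_secs) per module, taken from the log pairs above.
timings = {
    "tone equalizer": (17.403, 0.300),
    "denoise (non-local means)": (2.791, 0.045),
    "contrast equalizer": (0.457, 0.026),
}

for module, (now, prev) in timings.items():
    print(f"{module}: {now / prev:.0f}x slower")
# tone equalizer: 58x slower
# denoise (non-local means): 62x slower
# contrast equalizer: 18x slower
```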
So, either the developers have sped things up by 30-50x, or my previous measurement was in error (I don't think that a self-compiled build vs. the generic binary from the distro would cause such huge differences, especially on the GPU codepath).
I'll redo the master-branch measurement later. Now, time to read a bedtime story.