Performance regression DT 4.4?

Just upgraded my (Fedora 37) install to DT 4.4, which came out over the holidays. I quickly opened it to verify it “works” by opening some past edits, and all looks well. I also benchmarked it, as I do for any new kernel and/or DT version, using the Phoronix benchmarks (with the inspiring photos bench.SRW, masskrug.NEF and server_room.NEF). I have been doing this since DT 3.2.1 (using kernel 5.8.13).

Conclusion: DT 4.4 runs ~60% slower on the ‘bench’ image, but 6-13% faster on the other two benchmark images. Now, these benchmarks and images are not very representative of today’s processing (the images are 18-24 Mpx and all use display-referred edits), but a comparison using my own test image (42 Mpx, developed with Filmic) also shows close to 10% longer execution time (I have included this image as of DT 4.01 / kernel 6.0.11). All of this benchmarking is done on current Fedora versions (not necessarily the same over this time interval), without significant background activity, and takes averages over 20 runs after discarding the 2 slowest and fastest as outliers. The hardware has been the same throughout: i3-8100 with 8 or 16 GB RAM, using an SSD to load images from and RAM to write them to…
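For anyone who wants to reproduce the averaging, here is a minimal Python sketch of it. It assumes "discarding the 2 slowest and fastest" means dropping two runs at each end of the sorted timings; the function name `trimmed_mean` and the `trim` parameter are made up for illustration and are not part of any benchmark suite.

```python
from statistics import mean


def trimmed_mean(times, trim=2):
    """Average run times after dropping the `trim` fastest and `trim` slowest runs.

    ASSUMPTION: outliers are removed symmetrically, `trim` runs at each end.
    """
    if len(times) <= 2 * trim:
        raise ValueError("need more runs than are trimmed away")
    ordered = sorted(times)
    # Keep only the middle of the sorted list.
    kept = ordered[trim:len(ordered) - trim]
    return mean(kept)
```

With 20 runs and `trim=2`, the average is taken over the 16 runs in the middle, so a single run disturbed by background activity does not shift the result.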

Has anybody seen something similar or …

Hi @pindakoe,

No, on the contrary – up here the new version is much faster.
dt 3.2 (in August 2020) ran for 2.785/8.373 seconds
dt 4.4.1 (now) ran for 1.532/3.807 seconds
(with OpenCL/without).
Naturally, the Nvidia drivers were not the same, and neither were the Linux kernels.

Have fun!
Claes in Lund, Sweden

Indeed, on the contrary. In my testing 4.4 is faster both with CPU and GPU (OpenCL).

You can run something like this with 4.2 and 4.4 and see where (which modules) the difference is:

darktable-cli image.raw image.raw.xmp test.jpg --core -d perf
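To compare two such runs module by module, something like the following Python sketch could help. It wraps the same command and sums the reported time per module. Loud caveat: the log line format is an assumption (a line containing "took N secs" followed later by "processed" and the module name in quotes); check what your build actually prints and adjust the regex. The helper names `run_perf`, `module_times` and `compare` are hypothetical.

```python
import re
import subprocess
from collections import defaultdict

# ASSUMPTION: a perf line looks roughly like
#   ... took 0.012 secs (0.045 CPU) ... processed `exposure' on CPU ...
# Adjust this pattern if your darktable version prints something different.
LINE_RE = re.compile(r"took\s+([0-9.]+)\s+secs.*?processed\s+[`']([^`']+)'")


def run_perf(raw, xmp, out):
    """Run the export with perf logging and return everything it printed."""
    result = subprocess.run(
        ["darktable-cli", raw, xmp, out, "--core", "-d", "perf"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout + result.stderr


def module_times(log_text):
    """Sum the reported seconds per module name found in the log text."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            totals[match.group(2)] += float(match.group(1))
    return dict(totals)


def compare(old, new):
    """Return (module, old_secs, new_secs) rows, largest slowdown first."""
    names = set(old) | set(new)
    rows = [(n, old.get(n, 0.0), new.get(n, 0.0)) for n in names]
    return sorted(rows, key=lambda r: r[2] - r[1], reverse=True)
```

Run `run_perf` once with each version installed (it uses whichever `darktable-cli` is on the PATH), save the two outputs, and feed them through `module_times` and `compare`. The modules at the top of the list are where the newer version lost the most time.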


Do you know the build process of your benchmarked dt version? Release? Recent gcc?

Large parts of the code have been overhauled, removing SSE2-specific code and trusting compiler-optimized code instead (for very good reasons, i.e. benchmarking).

You should check that carefully. Also, F37 isn’t on the latest gcc; v12 has further improved perf.

DT has an internal benchmark now that can also be used for testing…

README.txt (5.6 KB)

Thanks for the responses. To clarify (and to respond to sugarbravo): my comparison was between 4.2.0 and 4.4.0. Comparing with 3.2.1 indeed shows a massive improvement: 3.2.1 took 66-72 s, whereas 4.4.0 takes 39 s. The values for the versions in between were 59, 37, 35, 34 and 34 (for 3.4, 3.6, 3.8, 4.0 and 4.2 respectively). @hannoschwalm: I will do some digging (and ask at Fedora) into whether build options have changed.

I will continue benchmarking as new kernel versions come out (typically once a week) and when 4.4.1 drops (4.4.0 only became available 2 weeks ago).

I don’t think it’s worth checking for kernel-related performance differences.

The data shows you are right, but I did not know that when I started. I was prompted mostly by the then-new Spectre/Meltdown vulnerabilities, which made me compare with and without the mitigations (short version: 1-2% difference at most). Since then I have run this out of habit, to have a larger baseline for changes such as major version upgrades of my distribution or DT.

Lastly: I probably wasn’t fully awake when typing the first post. I am on Fedora 38 (for both versions). In the meantime I have also verified that DT 4.4.0 was built with gcc 13.1.1 and DT 4.2.1 with gcc 13.0.1. The search continues (at a low priority, as I found another, much more annoying issue).

I am also developing on Fedora 38 and have absolutely no issues…