Hi, I am experiencing a performance regression with darktable with newer kernels. The LTS kernels 6.6.x are fine, but starting with the 6.7 series up to the most recent kernels I see a performance regression of almost 25 %.
I have raised this with the darktable developers, but they see it as a kernel issue rather than a darktable issue and are not willing to invest time in investigating it. The issue I created for this topic has been closed: Performance regression with atrous module and newer kernels (e.g. 6.7.12 or 6.10.6) compared to 6.6.47 · Issue #17397 · darktable-org/darktable · GitHub
In that issue it was suggested that I start a thread here in this forum. So here I am. I also want to raise this with the kernel developers, but before I do so I would like some confirmation that I am not alone with this performance regression.
Here are my findings in a nutshell.
I do a raw conversion on the command line with OpenCL disabled, so everything runs on the CPU. I then check the debug output for the time darktable spent in the pixel pipeline:
darktable-cli bench.SRW /tmp/test.jpg --core --disable-opencl -d perf -d opencl --configdir /tmp
The relevant output line looks like this:
4,2765 [dev_process_export] pixel pipeline processing took 3,811 secs (81,883 CPU)
When I run this benchmark with different kernels, I find that with kernel 6.7 and newer the pixel pipeline is roughly 25 % slower than with kernel 6.6 and older (I also tested with kernel 6.5).
The main contributor seems to be the module called "atrous". With newer kernels, most of the additional time is spent there:
with kernel 6.6.47:
4,0548 [dev_pixelpipe] took 0,635 secs (14,597 CPU) [export] processed 'atrous' on CPU, blended on CPU
...
4,2765 [dev_process_export] pixel pipeline processing took 3,811 secs (81,883 CPU)
with kernel 6.10.6:
4,9645 [dev_pixelpipe] took 1,489 secs (33,736 CPU) [export] processed 'atrous' on CPU, blended on CPU
...
5,2151 [dev_process_export] pixel pipeline processing took 4,773 secs (102,452 CPU)
This example shows that the atrous module accounts for most of the performance drop: it takes about 0.85 s longer, while the whole pipeline takes about 0.96 s longer. Overall conversion time goes from 3.8 s to 4.8 s on my PC. That is significant.
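For anyone who wants to compare their own runs, here is a minimal sketch of how the timings can be pulled out of the `-d perf` output and compared between two kernels. It assumes the log lines look exactly like the ones quoted above, including the decimal commas; the names `parse_perf`, `OLD_LOG` and `NEW_LOG` and the regular expression are illustrative and not part of darktable.

```python
import re

# Assumed line format, taken from the log excerpts quoted in this post:
#   "<ts> [dev_pixelpipe] took W secs (C CPU) [export] processed '<module>' on CPU, ..."
#   "<ts> [dev_process_export] pixel pipeline processing took W secs (C CPU)"
PERF_RE = re.compile(
    r"\[(?P<tag>dev_pixelpipe|dev_process_export)\].*?"
    r"took (?P<wall>\d+,\d+) secs \((?P<cpu>\d+,\d+) CPU\)"
    r"(?:.*?processed '(?P<module>[^']+)')?"
)


def parse_perf(log):
    """Map module name -> wall-clock seconds; the whole pipeline is stored as 'total'."""
    timings = {}
    for line in log.splitlines():
        m = PERF_RE.search(line)
        if m is None:
            continue  # not a timing line (for example the "..." placeholder)
        if m.group("module"):
            name = m.group("module")
        elif m.group("tag") == "dev_process_export":
            name = "total"
        else:
            continue
        # The logs use a decimal comma, so convert it before parsing the float.
        timings[name] = float(m.group("wall").replace(",", "."))
    return timings


# The two excerpts quoted above (kernel 6.6.47 and kernel 6.10.6).
OLD_LOG = """\
4,0548 [dev_pixelpipe] took 0,635 secs (14,597 CPU) [export] processed 'atrous' on CPU, blended on CPU
...
4,2765 [dev_process_export] pixel pipeline processing took 3,811 secs (81,883 CPU)
"""
NEW_LOG = """\
4,9645 [dev_pixelpipe] took 1,489 secs (33,736 CPU) [export] processed 'atrous' on CPU, blended on CPU
...
5,2151 [dev_process_export] pixel pipeline processing took 4,773 secs (102,452 CPU)
"""

old = parse_perf(OLD_LOG)
new = parse_perf(NEW_LOG)

total_delta = new["total"] - old["total"]      # extra seconds for the whole pipeline
atrous_delta = new["atrous"] - old["atrous"]   # extra seconds spent in atrous
slowdown = new["total"] / old["total"] - 1.0   # relative slowdown of the pipeline
atrous_share = atrous_delta / total_delta      # fraction of the slowdown due to atrous

print(f"pipeline slowdown: {slowdown:.1%}")
print(f"atrous share of the slowdown: {atrous_share:.1%}")
```

Saving the full output of each run to a file and passing its contents to `parse_perf` gives the same comparison for every module that appears in the log, which should make it easier to see whether other modules are affected on other machines.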
Does anybody else experience a similar performance drop with kernels 6.7+?