RT significantly slower on Windows than on Linux

Some possible causes (since NVidia drivers are rather unlikely to influence a CPU-bound application):

  • Maybe the Windows and the Linux build environment / scripts use different compilers, or at least different flags?
  • Windows may have more overhead, as it usually has some anti-virus running (Windows Defender is built in).
  • Self-compiled binaries (optimised for the user’s CPU) may also be faster than generic versions.

@kofa
Latest clockings:

D: darktable version
E: seconds with openCL
F: seconds without openCL
G: distro
H: nvidia driver version

[Image: table of clockings]

Have fun!
Claes in Lund, Sweden

I see no normal reason why it would be slower on Windows than on Linux. Maybe compiler differences or the parameters used, or differences in threading libraries, etc.

But most programs that calculate stuff (like an ffmpeg CPU encode, for instance) are not faster on Linux compared to macOS or Windows. Or the difference is at least within the margin of error.

Now, I do know that Windows is slower in how it handles process starts. Starting a process is more involved on Windows and takes longer. So things like big compiles, which basically spawn a gcc process that finishes quickly and then quickly need to spawn another, are way faster on Linux (or on a non-Windows OS, let’s say it like that :wink: ).
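For anyone who wants to check the process-start cost on their own machine, here is a minimal sketch. It assumes that launching the current Python interpreter with an empty program is a fair stand-in for a short-lived child process; the function name `mean_spawn_seconds` is made up for this example, and the number includes interpreter startup, not just OS process creation.

```python
import subprocess
import sys
import time


def mean_spawn_seconds(runs: int = 20) -> float:
    """Return the mean wall-clock time to start and finish a trivial child process."""
    # Assumption: the child is this same Python interpreter running an empty
    # program, so the figure includes interpreter startup as well as the
    # operating system's own process-creation cost.
    cmd = [sys.executable, "-c", "pass"]
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(cmd, check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    print(f"mean spawn time: {mean_spawn_seconds() * 1000:.1f} ms")
```

Running the same script on Windows and on Linux on the same hardware gives a rough idea of how much a spawn-heavy workload would be penalised.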

Could it be that RawTherapee spawns external processes when opening files, maybe exiftool or something to get some information? Because that could be an explanation for the longer times.

It has been quite some time since I last ran native Linux, so I can’t compare my modern Windows RawTherapee to what I used to feel back in the day… But it still feels snappy for me.

@Claes: There are too many factors changing at once.
Look at 4.1.0+391 on Manjaro vs 4.1.0+387 on Kubuntu, and compare the rows using the same NVidia version without OpenCL (so, fix the NVidia version, and for each fixed version, compare the relative performances of Linux + darktable versions):

NVidia 470: 4.392 vs 4.102, or 1.07:1 (caused by changing distro + dt version)
NVidia 515: 4.309 vs 4.095, or 1.05:1 (caused by changing distro + dt version)

Now, if you fix the distro and darktable version, and vary the NVidia driver version, you get:
Kubuntu + 4.1.0+387: 4.102 vs 4.075 (slowest/fastest runs): 1.007:1 (caused by NVidia version change)
Manjaro + 4.1.0+391: 4.392 vs 4.309: 1.019:1 (caused by NVidia version change)

To me, that looks like the Linux + dt version makes much more difference (5-7%) than the NVidia version (0.7-1.9%). Actually, a measurement difference in the range of 1% could easily be ‘noise’ (caused by other software running during the measurement, like cron jobs and the like).
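The ratios above can be re-derived with a few lines of Python. The timings are the ones quoted in this post; the labels and the helper name `percent_slower` are only for illustration.

```python
def percent_slower(slow: float, fast: float) -> float:
    """Return how much slower `slow` is than `fast`, in percent."""
    return (slow / fast - 1.0) * 100.0


# (slower run, faster run) in seconds without OpenCL, as quoted above.
pairs = {
    "NVidia 470, Manjaro 4.1.0+391 vs Kubuntu 4.1.0+387": (4.392, 4.102),
    "NVidia 515, Manjaro 4.1.0+391 vs Kubuntu 4.1.0+387": (4.309, 4.095),
    "Kubuntu 4.1.0+387, slowest vs fastest driver": (4.102, 4.075),
    "Manjaro 4.1.0+391, slowest vs fastest driver": (4.392, 4.309),
}

for label, (slow, fast) in pairs.items():
    print(f"{label}: {slow / fast:.3f}:1 ({percent_slower(slow, fast):.1f}% slower)")
```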

Plus, if you really want to benchmark, export a given set of pictures to produce longer runs and reduce measurement noise. Be sure to clear the darktable mipmap cache before each batch export, and use the same configuration, of course. And vary one parameter at a time.
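A repeatable measurement along those lines could be sketched like this. It is only a sketch: the command passed to `time_command` is a placeholder (here a do-nothing Python call), and you would substitute your own batch export command and clear the mipmap cache yourself before each run. The function names are made up for this example.

```python
import statistics
import subprocess
import sys
import time
from typing import Sequence


def time_command(cmd: Sequence[str], repeats: int = 5) -> list[float]:
    """Run `cmd` `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        # Assumption: any cache clearing is done by the caller or by the
        # command itself; this loop only measures elapsed time.
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: Sequence[float]) -> dict[str, float]:
    """Return the median, the extremes, and the run-to-run spread in percent."""
    fastest, slowest = min(durations), max(durations)
    return {
        "median": statistics.median(durations),
        "min": fastest,
        "max": slowest,
        "spread_percent": (slowest - fastest) / fastest * 100.0,
    }


if __name__ == "__main__":
    # Placeholder command; replace it with the real export invocation.
    runs = time_command([sys.executable, "-c", "pass"], repeats=5)
    print(summarize(runs))
```

If the spread between identical runs is already around 1%, differences of that size between two configurations cannot be told apart from noise.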

This issue may explain some of the performance difference. There is a problem with recent versions of the GCC compiler that we work around by disabling an optimization. That results in a performance drop in some places (capture sharpening, for example). The Windows build uses an up-to-date version of GCC (currently 12.2.0) to compile RawTherapee and is affected. The AppImage build process runs on Ubuntu 18.04 which has the much older GCC 7.5.0 and is not affected.

Apart from the issue Lawrence linked to, this might also be the reason why the file browser can be slower. Processing and adjusting sliders shouldn’t be affected by this.

I will compile on Arch in a minute. Arch should use the newest compiler.

Just built RT on Garuda. Slower than the AppImage but not nearly as slow as Windows. Jumping to the next raw takes approximately 1.85 sec.
It’s clear that it’s capture sharpening that slows down the process (the capture sharpening progress bar in the bottom left corner is slower).

Going to compile on Windows now.

RT compiled on Windows is as fast as when compiled on Arch/Garuda.

Good job with the investigation, Anna!

I see the devs have already done something:
Set -ffp-contract=off compiler flag for GCC >= 11 (#6384) by Lawrence37 · Pull Request #6583 · Beep6581/RawTherapee · GitHub
I guess that’s what I have tested now? So if I had tested it yesterday, it would have been slower…

edit: apparently it’s not merged yet
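For those wondering what the flag in that pull request title controls: -ffp-contract governs whether GCC may contract a multiply followed by an add into a single fused multiply-add, which rounds once instead of twice. The sketch below does not reproduce the RawTherapee problem; it only illustrates, using exact rational arithmetic as a stand-in for a hardware fused operation, that the two forms can give different results. The function names and input values are chosen for the example.

```python
from fractions import Fraction


def unfused(a: float, b: float, c: float) -> float:
    """Multiply, round to double, then add and round again (two roundings)."""
    return a * b + c


def fused(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once, as a fused multiply-add would."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the product 1 - 2**-60 rounds to exactly 1.0 in double precision.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print("unfused:", unfused(a, b, c))
print("fused:  ", fused(a, b, c))
```

The unfused form loses the tiny remainder when the product is rounded, while the fused form keeps it, so code that is sensitive to such differences can behave differently depending on the setting.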

@Lawrence37’s fork seems to be a little faster than dev on Windows, maybe even faster than the AppImage. Jumping to the next raw takes 1.5-1.6 secs, sometimes maybe 1.4.
Going to compile on Arch…

By the way, there is a mistake in the compiling instructions for Windows: it should be git clone https://, not git clone git://

Just compiled @Lawrence37’s version; it seems to be as fast as the AppImage.

@Lawrence37 @Thanatomanic I’ve also tested the Windows binary from GitHub; it seems to be fixed now.

I am a Windows user.
I kept using version 5.8 for a long time, then (when it was clear that version 5.9 was not coming soon) I installed version 5.8-3089-g274c99e9b just to test the new features.
BTW, that was April 2022.

The program has a lot of new features but, on my rather old computer, it is much slower and a lot less responsive.
Even simply holding the mouse button down on a + or - button next to a slider does not change the values smoothly; the numbers stop periodically, as if the program needs to rest.
This happens even if I do not use any of the new features, only the ones that were available in version 5.8, and this looks strange to me.

Isn’t this just a fix for a certain GCC version, regardless of whether it’s Windows, Linux or macOS? There is nothing platform-specific in here, or am I misreading something?

The issue was also present if RT was compiled on newer Linux systems with a newer GCC. I think some new feature in the new GCC was simply switched off,
but I don’t understand the exact technical details.
I mean, RT was faster on Linux because the AppImage is/was compiled on a really old Linux.

I’ve found that compiling on Windows yields a noticeable improvement over the binary distribution.

Likely due to the optimizations native to my CPU, but at the very least it’s a placebo-level improvement :slight_smile:

The same goes in my experience. I’ve even found that some stuff works smoother and quicker when running on Arch Linux via Virtualbox hosted on macOS than it does by running on macOS directly (don’t ask me how that’s even possible; I just put it down to Linux being better :wink:).

I think what we Windows users are seeing is that the development and pre-release builds of RT that are available on GitHub are being compiled with “generic” features that will/should run on nearly any version of Intel processor. But a generic version doesn’t enable the optimizations available in more recent versions of the processors (Skylake, Haswell, Alder Lake, etc.), so the result is much slower and jerkier performance.