RT significantly slower on Windows than on Linux

In the link you provided, people are debating gaming performance.

RawTherapee, to the best of my knowledge, does not use the graphics card for accelerating computations.

RawTherapee doesn’t use OpenCL though :wink:

1 Like

That has been my experience too. Everything is slower on Windows. (Disclaimer: I guess gaming is supposed to be faster on Windows, but I’m not a gamer.)

1 Like

I can say with certainty that I am slower on Windows. :wink:

I am a Windows user and only dabble in Linux VMs. My experience with RT is that it’s certainly not terrible on Windows in terms of speed. However, if Anna is describing a significant effect, I am tempted to install a dual boot environment just to test things on exactly the same hardware.

@betazoid which versions of RT do you use, or do you compile it yourself (native) on both systems?
Edit: I see now that you use the GitHub-compiled builds. It would be interesting to see what difference a native build would make.

I use the builds that can be downloaded from GitHub, the latest version of the dev branch, not self-compiled: the AppImage on Linux and the .exe installer on Windows, respectively.

I can compile, but I don't think I have compiled RT on Windows yet. Maybe tomorrow.

Some possible causes (since NVidia drivers are rather unlikely to influence a CPU-bound application):

  • Maybe the Windows and the Linux build environment / scripts use different compilers, or at least different flags?
  • Windows may have more overhead, as it usually has some anti-virus running (Windows Defender is built in).
  • Self-compiled binaries (optimised for the user’s CPU) may also be faster than generic versions (see the sketch below).
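
A rough way to compare the two build environments (just a sketch; it assumes GCC is the compiler on both sides, as it is for the official Windows and AppImage builds):

```bash
# Compare compiler versions first (Windows build machine vs AppImage build environment).
gcc --version

# Show which instruction set a native (self-compiled) build would actually target;
# the generic downloads have to stick to a baseline x86-64 target instead.
gcc -march=native -Q --help=target | grep march
```

That only tells you about the toolchain itself; the actual optimisation flags come from RawTherapee's build scripts, which would be a separate thing to diff.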

@kofa
Latest clockings:

D: darktable version
E: seconds with OpenCL
F: seconds without OpenCL
G: distro
H: NVidia driver version

[clockings — linked table of results]

Have fun!
Claes in Lund, Sweden

I see no normal reason why it would be slower on Windows vs Linux. Maybe compiler differences or parameters used, or differences in threading libraries, etc…

But most programs that calculate stuff (like an ffmpeg CPU encode, for instance) are not faster on Linux compared to macOS or Windows. Or it's at least within the margin of error.
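
If anyone wants to sanity-check that on identical hardware, a CPU-only encode is easy to time reproducibly (a sketch only; it assumes ffmpeg with libx264 is installed, and test.mp4 is just a placeholder for any local file):

```bash
# Pure CPU encode, output discarded; -benchmark makes ffmpeg print utime/rtime at the end.
ffmpeg -benchmark -i test.mp4 -c:v libx264 -preset medium -f null - 2>&1 | tail -n 3
```

Run the same command on Windows and on Linux on the same box and compare the reported times.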

Now, I do know that Windows is slower in how it handles process starts. Starting a process is more involved on Windows and takes longer. So things like big compiles - which basically spawn a gcc process that finishes quickly and then immediately need to spawn another - are way faster on Linux (or non-Windows OSes, let's say it like that :wink: ).

Could it be that RawTherapee spawns external processes when opening files, maybe exiftool or something to get some information? Because that could be an explanation for the longer times.
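
On Linux that is easy to check (a sketch; it assumes the packaged rawtherapee binary is on the PATH):

```bash
# Log every program RawTherapee launches while you browse a few raws;
# an external exiftool/dcraw call would show up as an execve from a child process.
strace -f -e trace=execve -o rt_exec.log rawtherapee
grep execve rt_exec.log | grep -v rawtherapee
```

If nothing but the main binary turns up, process-start overhead is probably not the explanation.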

Native Linux is quite some time ago for me, so I can't compare my modern Windows RawTherapee to 'what I used to feel back in the day'… But it still feels snappy to me.

@Claes: There are too many factors changing at once.
Look at 4.1.0+391 on Manjaro vs 4.1.0+387 on Kubuntu, and compare the rows using the same NVidia version without OpenCL (so, fix the NVidia version, and for each fixed version, compare the relative performances of Linux + darktable versions):

NVidia 470: 4.392 vs 4.102, or 1.07:1 (caused by changing distro + dt version)
NVidia 515: 4.309 vs 4.095, or 1.05:1 (caused by changing distro + dt version)

Now, if you fix the distro and darktable version, and vary the NVidia driver version, you get:
Kubuntu + 4.1.0+387: 4.102 vs 4.075 (slowest/fastest runs): 1.007:1 (caused by NVidia version change)
Manjaro + 4.1.0+391: 4.392 vs 4.309: 1.019:1 (caused by NVidia version change)

To me, that looks like the Linux + dt version makes much more of a difference (5-7%) than the NVidia version (0.7-1.9%). Actually, a measurement difference in the range of 1% could easily be ‘noise’ (caused by other software running during the measurement, like cron jobs and the like).

Plus, if you really want to benchmark, export a given set of pictures to produce longer runs and reduce measurement noise. Be sure to clear the darktable mipmap cache before each batch export, and use the same configuration, of course. And vary one parameter at a time.
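
Something along these lines would do (only a sketch; it assumes darktable-cli is on the PATH, the test raws sit in ./testset, and the cache is in the default ~/.cache/darktable location - adjust paths and options to your setup):

```bash
# Start each run from a cold thumbnail/mipmap cache so the runs are comparable.
rm -rf ~/.cache/darktable/mipmaps*

# Time a CPU-only export of the whole set (drop the --conf part to test with OpenCL).
time darktable-cli ./testset ./out --core --conf opencl=FALSE
```

Repeat each configuration a few times and, as said, change only one thing (driver, distro, dt version) between series.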

This issue may explain some of the performance difference. There is a problem with recent versions of the GCC compiler that we work around by disabling an optimization. That results in a performance drop in some places (capture sharpening, for example). The Windows build uses an up-to-date version of GCC (currently 12.2.0) to compile RawTherapee and is affected. The AppImage build process runs on Ubuntu 18.04 which has the much older GCC 7.5.0 and is not affected.
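
The optimisation in question is floating-point contraction: fusing a * b + c into a single FMA instruction. A toy example shows the effect (a sketch only; it uses a plain GCC invocation with an FMA-capable -march, not RawTherapee's real build flags):

```bash
cat > fma_demo.c <<'EOF'
double mac(double a, double b, double c) { return a * b + c; }
EOF

# Default contraction: GCC emits a single vfmadd instruction.
gcc -O2 -march=haswell -S -o - fma_demo.c | grep -i fmadd

# With the workaround flag: separate multiply and add, which costs speed in hot loops
# such as capture sharpening.
gcc -O2 -march=haswell -ffp-contract=off -S -o - fma_demo.c | grep -i fmadd || echo "no FMA emitted"
```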

2 Likes

Apart from the issue Lawrence linked to, this also might be a reason why the file browser can be slower. Processing and adjusting sliders shouldn’t be affected by this.

I will compile on Arch in a minute. Arch should use the newest compiler.

Just built RT on Garuda. Slower than the AppImage but not nearly as slow as Windows. Jumping to the next raw takes approximately 1.85 sec.
It’s clear that capture sharpening is what slows down the process (the capture sharpening progress bar in the bottom-left corner moves noticeably more slowly).

Going to compile on Windows now.

RT compiled on Windows is as fast as the one compiled on Arch/Garuda.

1 Like

Good job with the investigation, Anna!

1 Like

I see the devs have already done something:
Set -ffp-contract=off compiler flag for GCC >= 11 (#6384) by Lawrence37 · Pull Request #6583 · Beep6581/RawTherapee · GitHub - I guess that’s what I have just tested? So if I had tested it yesterday, it would have been slower…

edit: apparently it’s not merged yet

@Lawrence37 's fork seems to be a little faster than dev on Windows, maybe even faster than the AppImage: jumping to the next raw takes 1.5-1.6 secs, sometimes maybe 1.4.
Going to compile on Arch…

By the way, there is a mistake in the compiling instructions for Windows: it should be git clone https:// not git clone git://
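
For reference, the form that works (repository path taken from the pull request linked above):

```bash
git clone https://github.com/Beep6581/RawTherapee.git
```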

Just compiled @Lawrence37 's version, seems to be as fast as the AppImage.