RawTherapee Windows Performance Testing Needed (Powershell)

Does the data collected here support it?

In this test we have exercised the -cli for some base figures. The question at hand concerns setting performance>threads in the GUI, which we have only roughly simulated by passing thread counts to the -cli. We've identified some next steps so far, namely testing the GUI performance with particular thread values. Following on from that would be verifying against Linux/Mac.
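As a rough illustration of how a thread count can be passed to the -cli and timed, here is a minimal sketch. The helper name `Measure-AverageMs` is hypothetical and not part of the test script; it simply averages `Measure-Command` over a few runs.

```powershell
# Average wall-clock time of a script block over several runs, in milliseconds.
# Measure-AverageMs is a hypothetical helper name used only for this sketch.
function Measure-AverageMs {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        [int] $Runs = 5
    )
    $total = 0.0
    for ($i = 0; $i -lt $Runs; $i++) {
        # Measure-Command returns a TimeSpan; TotalMilliseconds is a double.
        $total += (Measure-Command -Expression $Action).TotalMilliseconds
    }
    return $total / $Runs
}
```

Assuming the current directory holds the test build and the cloned sample files, usage could look like `$env:OMP_NUM_THREADS = 4` followed by `Measure-AverageMs { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\typewriter.CR2 }`.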

Hi @chaimav, I have the K version at home, and these stories about Raptor Lake (13th gen) melting down have me a bit worried. How is the thermal performance on your rig?

I have only had it for about 2 weeks. At the time I purchased it, I was under the impression only K versions were susceptible (I was wrong). It does appear, however, that the BIOS makes a big difference.

As for thermal performance, the highest I have seen it so far is 65C.


My K is definitely a hottie. I've had it shut down during intense renders, which I always assumed was due to protective measures and not due to frying transistors. The fan never lets up. Hopefully after the patch the thing is still OK.

🔥

Hi!

Been a long-time RawTherapee fan, and I love seeing mature open source continue to be developed and supported. Anyone who's been around open source for a while knows exactly what I mean haha.

My active workstation is a Haswell-EP (2697 v3, 128 GB at 2133 MHz in 4 channels, and the venerable old 1080 Ti). Happy to pull down the latest beta and give the tests all Haswell has to offer.

Also have a couple of Sandy/Ivy workstations collecting dust, so let me know if you need more data on the older architecture chips (all Xeons, fwiw: 1650 v1/v2, and I think there's a 2680 v1 in a tray somewhere). I can certainly get some Ivy data from my "backup" workstation (2697 v2). I would have to check and see if I have any Sandys mounted in other old cases, but if it's important, I can probably find time to drop one in.

One last thing: I couldn't tell if you wanted us to compile the beta. If so, I typically optimize the code for the specific architecture I'm running (Intel oneAPI, etc.) when applicable. Do you want the tests run with architecture-optimized code? Or are you getting data from general release code to see where code may be causing older archs to drag?

Cheers,

RG

In this controlled test, the three test builds are prebuilt by GitHub Actions from a dev pull into my fork. See the detailed step-by-step instructions in the top post.

At present we are examining the data to determine

  1. thread count comparison (we think there is a bug in the way the gui automatically determines optimum thread counts)
  2. microarchitecture performance comparison (we want to maximize performance and compatibility)

Thank you for testing.


AFAIK the patch adds a hard cap on voltage, which (in theory) will only cap max overclocking (or at least lower the maximum "stable" OC for a chip). I'm the furthest from an Intel fanboy you can get, especially as their own XTU utility (their in-house overclocking util) was at least partially to blame for causing irreparable damage to their own CPUs. But if you're not taking your processor out of OEM/base config, and aren't running XTU (or are running it specifically to nerf the "AI overclocking" (turbo max boost)), the processor should perform as expected... Whether "as expected" means "no meaningful difference in performance between the K and the standard i9-14900" is definitely a different conversation, but since they're basically the same price now, maybe it's not even that big of a deal :)

The K/KF chips are a higher-priority bin than the standard 14900, so in theory there is less likelihood of defect-related hot spots, underperforming cores, etc. But yeah, I don't recommend any of the i9-13xxx/14xxx processors to anyone who asks. Then again, I'm basically a caveman using a 28-thread processor from, idk, 2014 I think? It's an E5-2697 v3. I might upgrade to a v4 sometime, but this old $75 Xeon cuts through my 75 MB raw files like a knife through butter. Haven't had a reason to even look at upgrading. :)


Should have something for you early into next week.

Thanks for doing testing to this extent, it's lovely to see!

Best Wishes,

RG


Perhaps late, but I just ran this on mine out of curiosity. It's ignoring hyperthreading; is that intentional?
It's a dual-CPU Xeon 6136 with 128 GB RAM: 12 cores and 24 threads per CPU, i.e. 48 total threads with hyperthreading.

Generic

================================
Available threads = 24 / CPU = Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz /  2993 MHz / Target = Processor: generic x86
61747 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
39395 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
29101 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
28844 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16
================================

Skylake

================================
Available threads = 24 / CPU = Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz /  2993 MHz / Target = Processor: skylake-raptorlake
63257 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
40282 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
29809 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
29270 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16
================================
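One way to read scaling out of figures like these is to express each run as a speedup over the 2-thread run. The sketch below does that for the generic-build timings quoted above; `Get-Speedup` is a hypothetical helper name, not part of the test script.

```powershell
# Speedup of a run relative to a baseline run, rounded to two decimals.
# Get-Speedup is a hypothetical helper name used only for this sketch.
function Get-Speedup {
    param(
        [Parameter(Mandatory)] [double] $BaselineMs,
        [Parameter(Mandatory)] [double] $Ms
    )
    return [math]::Round($BaselineMs / $Ms, 2)
}

# Generic-build timings from the results above (total milliseconds).
$runs = @(
    [pscustomobject]@{ Threads = 2;  Ms = 61747 },
    [pscustomobject]@{ Threads = 4;  Ms = 39395 },
    [pscustomobject]@{ Threads = 8;  Ms = 29101 },
    [pscustomobject]@{ Threads = 16; Ms = 28844 }
)

$baseline = $runs[0].Ms
foreach ($run in $runs) {
    $speedup = Get-Speedup -BaselineMs $baseline -Ms $run.Ms
    "{0,2} threads: {1}x vs. 2 threads" -f $run.Threads, $speedup
}
```

On this data, going from 8 to 16 threads changes the total by under 1%, so most of the gain is already reached at 8 threads.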

Wow, such differences on the same hardware...
Here are my results:

generic:
energy balanced
================================
Available threads = 8  /  CPU = Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz  /  3601 MHz  /  Target = Processor: generic x86
107569 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
88648 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
51803 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
================================

energy ultimate mode
================================
Available threads = 8  /  CPU = Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz  /  3601 MHz  /  Target = Processor: generic x86
46480 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
32038 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
26115 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
================================

fastwin:
energy balanced
================================
Available threads = 8  /  CPU = Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz  /  3601 MHz  /  Target = Processor: skylake-raptorlake
46411 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
31918 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
26548 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
================================

energy ultimate mode
================================
Available threads = 8  /  CPU = Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz  /  3601 MHz  /  Target = Processor: skylake-raptorlake
45084 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
31793 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
26110 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
================================

midwin:
energy balanced
================================
Available threads = 8  /  CPU = Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz  /  3601 MHz  /  Target = Processor: sandybridge-ivybridge
113285 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
87769 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
50910 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
================================

energy ultimate mode
================================
Available threads = 8  /  CPU = Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz  /  3601 MHz  /  Target = Processor: sandybridge-ivybridge
45325 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
31906 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
26352 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
================================

if desired, i can rerun the tests on different (newer) hardware.


Hi @nosle, thank you for your tests. It's not intentional; the script just ignored the socket count. I just added code to multiply the threads per CPU by the number of sockets. See if that gives you the 48 thread count. The script should test 2, 4, 8, 16, and 32 threads on your device.

git clone https://github.com/Benitoite/raw-test .\raw-test; $processor = Get-ComputerInfo -Property CsProcessors; $sockets = (Get-CimInstance Win32_Processor).SocketDesignation.Count ; $num = ($processor.CsProcessors | findstr NumberOfLogicalProcessors).Split(' ')[2]; $num = [int]$num * $sockets ; $name = ($processor.CsProcessors | findstr Name).Split(':')[-1]; $mhz = ($processor.CsProcessors | findstr Max).Split(':')[-1]; $proc = (cat .\AboutThisBuild.txt | findstr Processor); echo "``````" "================================"; echo "Available threads = $num / CPU =$name / $mhz MHz / Target = $proc"; for ($threads = 2; $threads -le $num; $threads *= 2) { $env:OMP_NUM_THREADS=$threads; $t = 0; $n = 5; $x = 0; for ($i = 0; $i -lt $n; $i++) { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\typewriter.CR2 } | findstr Ticks).Split(': ')[-1] }; $t+=($x/$n); $x = 0; for ($i = 0; $i -lt $n; $i++) { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\naturalbridges.CR2 } | findstr Ticks).Split(': ')[-1] }; $t+=($x/$n); $x = 0; for ($i = 0; $i -lt $n; $i++) { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\beachcabin.ARW } | findstr Ticks).Split(': ')[-1] }; $t+=($x/$n); echo "$([math]::round([decimal]($t/10000),0)) total milliseconds elapsed (average of $n runs) using OMP_NUM_THREADS = $threads" }; echo "================================" "``````"
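For the thread count itself, a possible alternative is to sum the numeric `NumberOfLogicalProcessors` property over all `Win32_Processor` instances instead of parsing text output. This matters because a value parsed from text is a string, and in PowerShell multiplying a string by an integer repeats the string rather than doing arithmetic. The function name below is hypothetical, and the sketch assumes one `Win32_Processor` instance per populated socket.

```powershell
# Sum logical processors over every processor object passed in.
# Each object is expected to expose NumberOfLogicalProcessors, as
# Win32_Processor instances do. Get-TotalLogicalThreads is a hypothetical name.
function Get-TotalLogicalThreads {
    param([Parameter(Mandatory)] [object[]] $Processors)
    $measured = $Processors | Measure-Object -Property NumberOfLogicalProcessors -Sum
    return [int]$measured.Sum
}
```

On Windows this could be called as `$num = Get-TotalLogicalThreads -Processors @(Get-CimInstance Win32_Processor)`, which should give 48 on a dual-socket machine with 24 logical processors per CPU.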

What's wrong here?

git : The term "git" is not recognized as the name of a cmdlet, function, script file, or operable
program. Check the spelling of the name, or if a path was included, verify that the path is correct and
try again.
At line:1 char:1

Available threads = 12 / CPU = 13th Gen Intel(R) Core™ i7-1355U / 1700 MHz / Target = Processor: generic x86
6987517 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
3828642 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
3347085 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8


This test requires a working installation of the git command. Hopefully that was it.


Sorry, but git (from your link) was installed before the test. I tried again: git was removed and installed again, but I got the identical result. Do laptops behave differently?


They should work the same. Are you sure you are in PowerShell? Does the git command actually work in PowerShell there, for example:
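A minimal check along these lines, not necessarily the command originally posted, is to ask PowerShell whether it can resolve `git` at all and then print its version. The function name `Test-CommandAvailable` is hypothetical.

```powershell
# Returns $true when the named command can be resolved in this session.
# Test-CommandAvailable is a hypothetical helper name used only for this sketch.
function Test-CommandAvailable {
    param([Parameter(Mandatory)] [string] $Name)
    return [bool](Get-Command -Name $Name -ErrorAction SilentlyContinue)
}

if (Test-CommandAvailable -Name 'git') {
    git --version   # prints the installed git version
} else {
    'git was not found on PATH in this PowerShell session'
}
```

If the second branch runs even though git is installed, the install location is probably missing from PATH, and a new PowerShell window is needed after fixing it.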


OK, I found the error: the path was not set while installing git. Now the results look like this. Is that reliable?

Available threads = 12 / CPU = 13th Gen Intel(R) Core™ i7-1355U / 1700 MHz / Target = Processor: generic x86
809 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
404 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
380 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8

Available threads = 12 / CPU = 13th Gen Intel(R) Core™ i7-1355U / 1700 MHz / Target = Processor: sandybridge-ivybridge
914 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
404 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
390 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8

Available threads = 12 / CPU = 13th Gen Intel(R) Core™ i7-1355U / 1700 MHz / Target = Processor: skylake-raptorlake
909 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
402 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
375 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8

It looks like it's not processing anything. Possibly it's just raising an error when calling the -cli. Check that you 'cd' into the correct directory.

Check that the clone operation succeeded.
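Both checks can be done in one pass before running the benchmark. The sketch below lists whichever of the files used by the test one-liner are missing from the current directory; the function name is hypothetical, and the file list is taken from the script in this thread.

```powershell
# Emit the relative path of every required file that is missing under $Root.
# Get-MissingBenchmarkFile is a hypothetical helper name used only for this sketch.
function Get-MissingBenchmarkFile {
    param([string] $Root = '.')
    $required = @(
        'rawtherapee-cli.exe',
        'AboutThisBuild.txt',
        'raw-test/typewriter.CR2',
        'raw-test/naturalbridges.CR2',
        'raw-test/beachcabin.ARW'
    )
    foreach ($relative in $required) {
        if (-not (Test-Path -LiteralPath (Join-Path $Root $relative))) {
            $relative   # written to the output stream
        }
    }
}

$missing = @(Get-MissingBenchmarkFile -Root '.')
if ($missing.Count -eq 0) {
    'All benchmark files found.'
} else {
    "Missing: $($missing -join ', ')"
}
```

A missing `rawtherapee-cli.exe` points to being in the wrong directory, while missing `raw-test` entries point to a failed clone.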

Thanks a lot! Finally I got it: this subfolder was not created by the script on this machine. After creating it manually, everything worked fine. Here are the results:

Available threads = 12 / CPU = 13th Gen Intel(R) Core™ i7-1355U / 1700 MHz / Target = Processor: generic x86
32423 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
25558 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
21732 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8

Available threads = 12 / CPU = 13th Gen Intel(R) Core™ i7-1355U / 1700 MHz / Target = Processor: sandybridge-ivybridge
33390 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
25824 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
22426 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8

Available threads = 12 / CPU = 13th Gen Intel(R) Core™ i7-1355U / 1700 MHz / Target = Processor: skylake-raptorlake
30006 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
25740 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
23402 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8

Thanks for working that out; the results seem coherent. I'm not sure why the clone didn't create raw-test there by itself, unless there is some weird permissions thing going on.