RawTherapee Windows Performance Testing Needed (PowerShell)

ATTN: @chaimav @Silvio_Grosso @marter @Lawrence37 & all windows users

Performance testing needed

I have a PowerShell script that gathers a series of run timings (measured externally!) using the non-GUI command-line interface, rawtherapee-cli.exe.

What does it do?

It clones a repo of 3 small public-domain RAW files with their pp3s, processes each file 5 times, averages the processing time per file, totals the three averages, and reports that total for threads = 2, 4, 8, 16, etc., depending on available cores. It takes 10-15 minutes to complete all the test runs for each build. :coffee: :coffee: :coffee:
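The procedure described above can be sketched roughly as follows. This is a Python illustration of the arithmetic only, not the PowerShell script itself; the function names and the sample tick values are made up for the example.

```python
TICKS_PER_MS = 10_000  # one .NET tick is 100 ns, so 10,000 ticks per millisecond


def thread_counts(available):
    """Thread counts to test: 2, 4, 8, ... up to the available logical cores."""
    counts = []
    threads = 2
    while threads <= available:
        counts.append(threads)
        threads *= 2
    return counts


def total_ms(ticks_per_file):
    """Average each file's runs, sum the averages, and convert ticks to ms."""
    total_ticks = sum(sum(runs) / len(runs) for runs in ticks_per_file)
    return round(total_ticks / TICKS_PER_MS)


# Hypothetical tick counts for three files, five runs each.
sample = [
    [100_000_000] * 5,
    [150_000_000] * 5,
    [200_000_000] * 5,
]

print(thread_counts(20))  # [2, 4, 8, 16]
print(total_ms(sample))   # 45000
```

Note that a machine with 20 or 24 available threads is only tested up to 16, because the thread count doubles each round.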

What data are we looking for?

We are looking for the reports this script produces when run on the various builds under test, for example the generic vs. the skylake-raptorlake builds. Since we will all be testing the same pp3s on the same RAWs with the same set of builds, the results should be comparable.


Instructions:

  1. Install the latest git for windows if you don’t have it already: https://github.com/git-for-windows/git/releases/download/v2.45.2.windows.1/Git-2.45.2-64-bit.exe

  2. Download/unzip the 3 standalone builds to test:

Generic x86 (all 64-bit CPUs / Windows 7-8):

https://github.com/Benitoite/RawTherapee/releases/download/nightly-github-actions/RawTherapee_genericwin_win64_release.zip

SandyBridge-IvyBridge (circa 2011-2015 / Windows 8-10):

https://github.com/Benitoite/RawTherapee/releases/download/nightly-github-actions/RawTherapee_midwin_win64_release.zip

SkyLake-RaptorLake (circa 2015-2022 / Windows 10-11):

https://github.com/Benitoite/RawTherapee/releases/download/nightly-github-actions/RawTherapee_fastwin_win64_release.zip

  3. Run PowerShell and cd into the RawTherapee program directory you would like to test. Pro tip: type "cd " (cd, then the space bar), drag the folder onto the PowerShell window, then press Return.

  4. Run this one-liner (simply copy and paste it into PowerShell and press Return):

git clone https://github.com/Benitoite/raw-test .\raw-test; $processor = Get-ComputerInfo -Property CsProcessors; $sockets = (Get-CimInstance Win32_Processor).SocketDesignation.Count ; $num = ($processor.CsProcessors | findstr NumberOfLogicalProcessors).Split(' ')[2]; $num *= $sockets ; $name = ($processor.CsProcessors | findstr Name).Split(':')[-1]; $mhz = ($processor.CsProcessors | findstr Max).Split(':')[-1]; $proc = (cat .\AboutThisBuild.txt | findstr Processor); echo "``````" "================================"; echo "Available threads = $num / CPU =$name / $mhz MHz / Target = $proc"; for ($threads = 2; $threads -le $num; $threads *= 2) { $env:OMP_NUM_THREADS=$threads; $t = 0; $n = 5; $x = 0; for ($i = 0; $i -lt $n; $i++) { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\typewriter.CR2 } | findstr Ticks).Split(': ')[-1] }; $t+=($x/$n); $x = 0; for ($i = 0; $i -lt $n; $i++) { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\naturalbridges.CR2 } | findstr Ticks).Split(': ')[-1] }; $t+=($x/$n); $x = 0; for ($i = 0; $i -lt $n; $i++) { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\beachcabin.ARW } | findstr Ticks).Split(': ')[-1] }; $t+=($x/$n); echo "$([math]::round([decimal]($t/10000),0)) total milliseconds elapsed (average of $n runs) using OMP_NUM_THREADS = $threads" }; echo "================================" "``````"
  5. Repeat the previous two steps (cd into the build's folder, then run the one-liner) for the next build.

  6. Post the results of your tests in this gist (wintimer · GitHub) in the following format:


```
================================
Available threads = 8  /  CPU = Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz  /  3408 MHz  /  Target = Processor: skylake-raptorlake
62421 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
41778 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
37596 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
================================
```
```
================================
Available threads = 8  /  CPU = Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz  /  3408 MHz  /  Target = Processor: sandybridge-ivybridge
63748 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
42508 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
35636 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
================================
```
```
================================
Available threads = 8  /  CPU = Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz  /  3408 MHz  /  Target = Processor: generic x86
64380 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
43011 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
35738 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
================================
```



Pretty-printed version of the one-liner:

git clone https://github.com/Benitoite/raw-test .\raw-test
$processor = Get-ComputerInfo -Property CsProcessors
$num = ($processor.CsProcessors | findstr NumberOfLogicalProcessors).Split(' ')[2]
$sockets = (Get-CimInstance Win32_Processor).SocketDesignation.Count
$num *= $sockets
$name = ($processor.CsProcessors | findstr Name).Split(':')[-1]
$mhz = ($processor.CsProcessors | findstr Max).Split(':')[-1]
$proc = (cat .\AboutThisBuild.txt | findstr Processor)

echo "``````" "================================"
echo "Available threads = $num  /  CPU =$name  / $mhz MHz  /  Target = $proc"

for ($threads = 2; $threads -le $num; $threads *= 2)
  {
    $env:OMP_NUM_THREADS=$threads;
    $t = 0
    $n = 5

    $x = 0; for ($i = 0; $i -lt $n; $i++) 
      { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\typewriter.CR2 } | findstr Ticks).Split(': ')[-1] }
    $t+=($x/$n)

    $x = 0; for ($i = 0; $i -lt $n; $i++)
      { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\naturalbridges.CR2 } | findstr Ticks).Split(': ')[-1] }
    $t+=($x/$n)

    $x = 0; for ($i = 0; $i -lt $n; $i++)
      { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\beachcabin.ARW } | findstr Ticks).Split(': ')[-1] }
    $t+=($x/$n)

    echo "$([math]::round([decimal]($t/10000),0)) total milliseconds elapsed (average of $n runs) using OMP_NUM_THREADS = $threads"
  }

echo "================================" "``````"

Thank you for testing RawTherapee! Feel free to ask questions or discuss the test.

I will join this test, but I won't be back at my PC until next week.

Hello @HIRAM

Grep is not recognized as a command on my PC:

  • Windows 11 - home edition;
  • Acer predator laptop.

To get around this glitch I changed your script a bit (following a Stack Exchange suggestion).
In short, I replaced grep (Unix) with findstr (a native Windows command) :wink:

Here are my results with the 3 proposed builds:

1° (very slow to get its results…)
================================
Available threads = 20 / CPU = 12th Gen Intel(R) Core™ i7-12700H / 2700 MHz / Target = Processor: generic x86
345685261 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
234843645 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
208762271 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
242663195 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16
================================


================================
Available threads = 20 / CPU = 12th Gen Intel(R) Core™ i7-12700H / 2700 MHz / Target = Processor: sandybridge-ivybridge
4372808 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
4374136 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
3290539 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
4192327 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16
================================


================================
Available threads = 20 / CPU = 12th Gen Intel(R) Core™ i7-12700H / 2700 MHz / Target = Processor: skylake-raptorlake
5261077 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
4019001 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
3799304 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
4383807 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16
================================

Thanks. I was trying to figure out why it wouldn’t work. Here is the code with grep replaced by findstr

git clone https://github.com/Benitoite/raw-test .\raw-test; $processor = Get-ComputerInfo -Property CsProcessors; $num = ($processor.CsProcessors | findstr  NumberOfLogicalProcessors).Split(' ')[2]; $name = ($processor.CsProcessors | findstr  Name).Split(':')[-1]; $mhz = ($processor.CsProcessors | findstr  Max).Split(':')[-1]; $proc = (cat .\AboutThisBuild.txt | findstr  Processor); echo "================================"; echo "Available threads = $num  /  CPU =$name  / $mhz MHz  /  Target = $proc"; for ($threads = 2; $threads -le $num; $threads *= 2) { $env:OMP_NUM_THREADS=$threads; $t = 0; $n = 5; $x = 0; for ($i = 0; $i -lt $n; $i++) { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\typewriter.CR2 } | findstr  TotalMilliseconds).Split(' ')[2] }; $t+=($x/$n); $x = 0; for ($i = 0; $i -lt $n; $i++) { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\naturalbridges.CR2 } | findstr  TotalMilliseconds).Split(' ')[2] }; $t+=($x/$n); $x = 0; for ($i = 0; $i -lt $n; $i++) { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\beachcabin.ARW } | findstr  TotalMilliseconds).Split(' ')[2] }; $t+=($x/$n); echo "$([int]$t) total milliseconds elapsed (average of $n runs) using OMP_NUM_THREADS = $threads" }; echo "================================"

Available threads = 24 / CPU = 13th Gen Intel(R) Core™ i7-13700 / 2100 MHz / Target = Processor: generic x86
27082 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
18057 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
14663 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
14928 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16

Available threads = 24 / CPU = 13th Gen Intel(R) Core™ i7-13700 / 2100 MHz / Target = Processor: sandybridge-ivybridge
31369 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
18424 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
15234 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
15113 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16

Available threads = 24 / CPU = 13th Gen Intel(R) Core™ i7-13700 / 2100 MHz / Target = Processor: skylake-raptorlake
34491 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
27324 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
24788 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
24322 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16

It seems that the generic x86 is the quickest on my machine, with 8 and 16 threads coming in very close to each other for all versions.

P.S. When I tried pasting it, the text turned into a large, bold font, so I deleted the ‘==========================’ lines.

@Silvio_Grosso @chaimav
Thank you so much for testing; there are some interesting results so far. I’ve updated the one-liner command above. Thankfully, the missing grep was the only problem.

To post as code blocks you can have three backticks before and after each report like:

```
=======
etc..
=======
```

I found an error in my integer conversion of the floating-point TotalMilliseconds value. If you (@Silvio_Grosso) get a chance, you can rerun the updated script, and then we can tell what the correct timings are on your system. Sorry about that :slight_smile:

Are my results OK?

@chaimav Yes, your results seem to have converted normally, as did mine. Fingers crossed on that deal… :eyes: A lot of the time, the stats reported by CPUs do not get documented unless a developer is able to open a support ticket with Microsoft and they decide to handle it.

Hello @HIRAM

I added some tests on Github.
They are all Windows machines (Windows 10 and 11) with very different hardware.
All Intel CPUs except one (AMD).

EDIT.
It looks like the millisecond values are extremely fickle to calculate (in terms of the total number of digits).
Using the same method on different Windows machines, I got different numbers of digits (see GitHub…).


Thanks for running those, @Silvio_Grosso. There are still some strange numbers, so I am working backwards…
Can you post the output of this powershell command on the i7-12700H?
Measure-Command { echo hi }
That way we can see what data is being parsed.
This is what mine outputs:

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 10
Ticks             : 103011
TotalDays         : 1.19225694444444E-07
TotalHours        : 2.86141666666667E-06
TotalMinutes      : 0.000171685
TotalSeconds      : 0.0103011
TotalMilliseconds : 10.3011

Hello @HIRAM

Currently, I checked another Windows computer and I have timed it with my clock:

================
Available threads = 6 / CPU = Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz /  3000 MHz / Target = Processor: skylake-raptorlake
446743839 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
================

Here is its output with your command:

PS C:\Users> Measure-Command { echo hi }
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 5
Ticks : 52524
TotalDays : 6,07916666666667E-08
TotalHours : 1,459E-06
TotalMinutes : 8,754E-05
TotalSeconds : 0,0052524
TotalMilliseconds : 5,2524

By my clock, this run (the one reported as 446743839) took around 4 minutes and 24 seconds.

As soon as I can get my hands on the i7-12700H, I will repeat the test :slight_smile:

If you are interested I can add this command, on github, for the different Windows computers tested yesterday…

I see the problem: my script can’t handle a comma as the decimal separator (it gets treated as a thousands separator). I will switch it to use ticks instead.
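To illustrate the failure mode with the two Measure-Command outputs posted above: a parser that drops the comma turns 5,2524 ms into 52524, while the Ticks field is a plain integer and does not depend on the locale. A small Python sketch follows; the function names are illustrative, and it is an assumption that the PowerShell string-to-number conversion drops the comma in exactly this way.

```python
TICKS_PER_MS = 10_000  # one .NET tick is 100 ns


def parse_dropping_commas(text):
    """Mimic a parser that treats ',' as a thousands separator (assumed behavior)."""
    return float(text.replace(",", ""))


def ms_from_ticks(text):
    """Ticks are printed as a plain integer, so no separator can be misread."""
    return int(text) / TICKS_PER_MS


# Values taken from the two 'Measure-Command { echo hi }' outputs in this thread.
print(parse_dropping_commas("10.3011"))  # 10.3011  (dot-decimal locale: fine)
print(parse_dropping_commas("5,2524"))   # 52524.0  (comma-decimal locale: 10,000x too large)
print(ms_from_ticks("103011"))           # 10.3011
print(ms_from_ticks("52524"))            # 5.2524
```

With four digits after the comma the error factor is 10,000, which matches the reported totals being several orders of magnitude too large.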

@Silvio_Grosso Thanks for being willing to debug! If you can, rerun the updated script on those machines:

  • LENOVO - System Model: 11KC000SIX
  • HP Z4 G4 Workstation
  • ASUSTeK COMPUTER INC.
  • HP ProDesk 600 G4 SFF

git clone https://github.com/Benitoite/raw-test .\raw-test; $processor = Get-ComputerInfo -Property CsProcessors; $num = ($processor.CsProcessors | findstr NumberOfLogicalProcessors).Split(' ')[2]; $name = ($processor.CsProcessors | findstr Name).Split(':')[-1]; $mhz = ($processor.CsProcessors | findstr Max).Split(':')[-1]; $proc = (cat .\AboutThisBuild.txt | findstr Processor); echo "``````" "================================"; echo "Available threads = $num / CPU =$name / $mhz MHz / Target = $proc"; for ($threads = 2; $threads -le $num; $threads *= 2) { $env:OMP_NUM_THREADS=$threads; $t = 0; $n = 5; $x = 0; for ($i = 0; $i -lt $n; $i++) { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\typewriter.CR2 } | findstr Ticks).Split(': ')[-1] }; $t+=($x/$n); $x = 0; for ($i = 0; $i -lt $n; $i++) { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\naturalbridges.CR2 } | findstr Ticks).Split(': ')[-1] }; $t+=($x/$n); $x = 0; for ($i = 0; $i -lt $n; $i++) { $x+=(Measure-Command { .\rawtherapee-cli.exe -j -s -Y -c .\raw-test\beachcabin.ARW } | findstr Ticks).Split(': ')[-1] }; $t+=($x/$n); echo "$([math]::round([decimal]($t/10000),0)) total milliseconds elapsed (average of $n runs) using OMP_NUM_THREADS = $threads" }; echo "================================" "``````"

No need to rerun on the HP Z2 SFF G9 Workstation Desktop PC or the Micro-Star International Co., Ltd., as it seemed to have parsed milliseconds ok.

Hello @HIRAM


Sure. Thanks a lot. I will keep you posted as soon as I am done.

This time, with your last script, you nailed it :slight_smile:

E.g. with this PC which previously had its timing wrong:


System Information

  Operating System: Windows 10 Pro 64-bit (10.0, Build 18362) (18362.19h1_release.190318-1202)
  System Model: HP ProDesk 600 G4 SFF
  Processor: Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz (6 CPUs), ~3.0GHz
  Memory: 16384MB RAM

Display Devices

  Card name: Intel(R) UHD Graphics 630

RESULTS:

================================
Available threads = 6 / CPU = Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz /  3000 MHz / Target = Processor: generic x86
49716 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
35320 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
================================
================================
Available threads = 6 / CPU = Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz /  3000 MHz / Target = Processor: sandybridge-ivybridge
49459 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
34910 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
================================
================================
Available threads = 6 / CPU = Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz /  3000 MHz / Target = Processor: skylake-raptorlake
49750 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
34505 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
================================

Hello @HIRAM

Just added my tests on Github

As of today, on github, it looks like the fastest pc is the one by chaimav with 14663 total milliseconds with Processor: generic x86


Available threads = 24  /  CPU = 13th Gen Intel(R) Core(TM) i7-13700  /  2100 MHz  /  Target = Processor: generic x86
14663 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8

@Silvio_Grosso Thanks!

IMHO, this particular result means that, for the GUI, the optimum Performance > Threads setting should be the value closest to the number of wavelet levels being targeted.

Looking at the denoise and wavelet code, there are almost no instruction-set extensions used beyond baseline x86_64, so targeting the newer microarchitectures yields no more efficient code.

@chaimav et al., I would suggest setting Performance > Threads to whatever wavelet level you are using (the default is 7), then running RawTherapee in verbose mode (.\rawtherapee.exe -w, with Verbose=true in options.txt) and watching the console output for the readouts. Compare against threads set to your maximum available, to 0, etc.

PS. I’m also noticing similar diminishing returns on @Silvio_Grosso 's 20-thread Acer Predator:

================================
Available threads = 20 / CPU = 12th Gen Intel(R) Core(TM) i7-12700H /  2700 MHz / Target = Processor: generic x86
48426 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
28899 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
23573 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
25115 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16
================================
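The diminishing returns are easier to see as speedups relative to the 2-thread run. A quick Python sketch using the four timings from the report above (the helper name is just for illustration):

```python
def speedups(timings_ms):
    """Speedup of each run relative to the run with the fewest threads."""
    base = timings_ms[min(timings_ms)]
    return {threads: base / ms for threads, ms in sorted(timings_ms.items())}


# Total milliseconds per thread count, copied from the i7-12700H generic x86 report.
acer_generic = {2: 48426, 4: 28899, 8: 23573, 16: 25115}

for threads, factor in speedups(acer_generic).items():
    print(f"{threads:>2} threads: {factor:.2f}x")
```

This works out to roughly 1.7x at 4 threads and 2.1x at 8 threads, then drops back to about 1.9x at 16 threads.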

Should the two be linked, i.e. threads automatically set to wavelet level when automatic threads is used?

If there are data to support that, it might be worth a look.
