I have a PowerShell script that gathers a series of run timings (externally!) using the non-GUI command-line interface, `rawtherapee-cli.exe`.
What does it do?
It clones a repo of 3 small public-domain RAW files with their .pp3 profiles, processes each file 5 times, averages the processing times per file, totals those averages, and reports the timing for OMP_NUM_THREADS = 2, 4, 8, 16, etc., depending on the available cores. It takes 10-15 minutes to complete all the test runs for each build.
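The timing procedure described above can be sketched in Python as follows. This is not the actual PowerShell script: `thread_counts`, `time_command` and `benchmark` are hypothetical names, the thread counts are assumed to be powers of two from 2 up to the available hardware threads (which matches the reports posted in this thread), and the `rawtherapee-cli.exe` arguments are deliberately left as placeholders rather than guessed.

```python
import os
import statistics
import subprocess
import time
from typing import Callable, Dict, List, Sequence

# A timer takes (command, thread count) and returns elapsed milliseconds.
Timer = Callable[[Sequence[str], int], float]


def thread_counts(available: int) -> List[int]:
    """Powers of two from 2 up to the number of available hardware threads."""
    counts = []
    n = 2
    while n <= available:
        counts.append(n)
        n *= 2
    return counts


def time_command(cmd: Sequence[str], threads: int) -> float:
    """Run one external command with OMP_NUM_THREADS set; return wall time in ms."""
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return (time.perf_counter() - start) * 1000.0


def benchmark(jobs: Sequence[Sequence[str]], available: int,
              runs: int = 5, timer: Timer = time_command) -> Dict[int, int]:
    """For each thread count: average `runs` timings per job, then sum the averages."""
    results = {}
    for threads in thread_counts(available):
        total = 0.0
        for cmd in jobs:
            total += statistics.mean(timer(cmd, threads) for _ in range(runs))
        results[threads] = round(total)
    return results


# Hypothetical usage: one command per RAW file; "..." stands for the real arguments.
# jobs = [["rawtherapee-cli.exe", "..."] for _ in range(3)]
# print(benchmark(jobs, available=os.cpu_count() or 2))
```

Injecting `timer` keeps the averaging logic testable without launching RawTherapee; with 8 available threads it tests 2, 4 and 8, and with 6 it stops at 4, as in the reports below.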
What data are we looking for?
The reports produced by running this script on the various builds under test, for example the generic vs. skylake-raptorlake builds. Since we will all be testing the same pp3s on the same RAWs with the same set of builds, the results should be comparable across machines.
Open PowerShell and `cd` into the RawTherapee program directory you would like to test. Pro tip: type `cd ` (cd followed by a space), drag the folder onto the PowerShell window, then press Enter.
Run this one-liner (simply copy and paste it into PowerShell and press Enter):
Post the results of your tests in this gist (wintimer · GitHub) in the following format:
```
================================
Available threads = 8 / CPU = Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz / 3408 MHz / Target = Processor: skylake-raptorlake
62421 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
41778 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
37596 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
================================
```
```
================================
Available threads = 8 / CPU = Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz / 3408 MHz / Target = Processor: sandybridge-ivybridge
63748 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
42508 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
35636 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
================================
```
```
================================
Available threads = 8 / CPU = Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz / 3408 MHz / Target = Processor: generic x86
64380 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
43011 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
35738 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
================================
```
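Because every report uses the same line layout, posted results can be compared mechanically. A minimal Python sketch, assuming the exact wording shown in the reports above with one header or timing per line; `parse_reports` and `fastest` are hypothetical helpers, not part of the script.

```python
import re
from typing import Dict, List

HEADER = re.compile(
    r"Available threads = (\d+) / CPU = (.+?) / (\d+) MHz / Target = Processor: (.+)")
TIMING = re.compile(
    r"(\d+) total milliseconds elapsed \(average of \d+ runs\) "
    r"using OMP_NUM_THREADS = (\d+)")


def parse_reports(text: str) -> List[Dict]:
    """Turn pasted reports into dicts holding CPU, target build and {threads: ms}."""
    reports = []
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            reports.append({
                "threads": int(header.group(1)),
                "cpu": header.group(2),
                "mhz": int(header.group(3)),
                "target": header.group(4).strip(),
                "timings": {},
            })
            continue
        timing = TIMING.match(line)
        if timing and reports:
            # Attach the timing to the most recent header.
            reports[-1]["timings"][int(timing.group(2))] = int(timing.group(1))
    return reports


def fastest(report: Dict) -> tuple:
    """Return (threads, ms) for the quickest thread count in one report."""
    threads = min(report["timings"], key=report["timings"].get)
    return threads, report["timings"][threads]
```

For example, fed the i7-13700 generic x86 report posted later in the thread, `fastest` would return `(8, 14663)`. Reports pasted onto a single line are not handled by this line-based sketch.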
`grep` (a Unix command) is not recognized on my PC, a Windows 11 Home edition Acer Predator laptop.
To work around this, I changed your script slightly (following a Stack Exchange suggestion): in short, I replaced `grep` (Unix) with `findstr` (a native Windows command).
Here are my results with the 3 proposed builds:
1st build (very slow to get its results…):
================================
Available threads = 20 / CPU = 12th Gen Intel(R) Core™ i7-12700H / 2700 MHz / Target = Processor: generic x86
345685261 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
234843645 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
208762271 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
242663195 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16
================================
2nd build:
================================
Available threads = 20 / CPU = 12th Gen Intel(R) Core™ i7-12700H / 2700 MHz / Target = Processor: sandybridge-ivybridge
4372808 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
4374136 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
3290539 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
4192327 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16
================================
3rd build:
================================
Available threads = 20 / CPU = 12th Gen Intel(R) Core™ i7-12700H / 2700 MHz / Target = Processor: skylake-raptorlake
5261077 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
4019001 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
3799304 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
4383807 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16
================================
Available threads = 24 / CPU = 13th Gen Intel(R) Core™ i7-13700 / 2100 MHz / Target = Processor: generic x86
27082 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
18057 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
14663 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
14928 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16
Available threads = 24 / CPU = 13th Gen Intel(R) Core™ i7-13700 / 2100 MHz / Target = Processor: sandybridge-ivybridge
31369 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
18424 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
15234 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
15113 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16
Available threads = 24 / CPU = 13th Gen Intel(R) Core™ i7-13700 / 2100 MHz / Target = Processor: skylake-raptorlake
34491 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
27324 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
24788 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
24322 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16
@Silvio_Grosso @chaimav
Thank you so much for testing; some interesting results so far. I've updated the one-liner command above; thankfully, only the grep needed fixing.
To post as code blocks, put a line of three backticks before and after each report, like this:
I found an error in my integer conversion of the floating-point `TotalMilliseconds` value. @Silvio_Grosso, if you get a chance, please rerun the updated script so we can tell what the correct timings are on your system. Sorry about that!
@chaimav yes, your results seem to have converted normally, as did mine. Fingers crossed on that deal… a lot of the time, the stats reported by CPUs are not documented unless a developer opens a support ticket with Microsoft and they decide to handle it.
I added some tests on GitHub.
They are all Windows machines (Windows 10 and 11) with very different hardware.
All have Intel CPUs except one (AMD).
EDIT:
It looks like the millisecond values are extremely fickle to calculate (in terms of the total number of digits).
Using the same method on different Windows machines, I got different numbers of digits (see GitHub…).
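One plausible way a float-to-integer conversion can yield inflated values with varying digit counts (an assumption about the cause; the thread does not confirm it) is turning `TotalMilliseconds` into text under a locale that uses a comma as the decimal separator and then discarding the separator. The Python sketch below contrasts such a hypothetical naive conversion with one that rounds the number instead; both function names are illustrative only.

```python
def naive_ms(text: str) -> int:
    """Hypothetical buggy conversion: keep only the digits, silently dropping
    the decimal separator, so every fractional digit inflates the result."""
    return int("".join(ch for ch in text if ch.isdigit()))


def robust_ms(text: str) -> int:
    """Treat ',' or '.' as the decimal separator, then round to whole ms.
    Assumes the text contains no thousands separators."""
    return round(float(text.replace(",", ".")))


for sample in ["16142,0873", "16142,7", "16142.0873"]:
    print(sample, naive_ms(sample), robust_ms(sample))
```

The naive version turns `16142,0873` into 161420873 but `16142,7` into 161427, so the magnitude depends on how many fractional digits happen to be printed, while the rounding version gives 16142 or 16143 regardless of the separator.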
Thanks for running those, @Silvio_Grosso. There are still some strange numbers, so I'm working backwards…
Can you post the output of this PowerShell command on the i7-12700H? `Measure-Command { echo hi }`
That way we can see what data is being parsed.
This is what mine outputs:
No need to rerun on the HP Z2 SFF G9 Workstation Desktop PC or the Micro-Star International Co., Ltd. machine, as those seemed to have parsed milliseconds correctly.
Sure. Thanks a lot. I will keep you posted as soon as I am done.
This time, with your latest script, you nailed it!
For example, with this PC, which previously had its timings wrong:
System Information
Operating System: Windows 10 Pro 64-bit (10.0, Build 18362) (18362.19h1_release.190318-1202)
System Model: HP ProDesk 600 G4 SFF
Processor: Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz (6 CPUs), ~3.0GHz
Memory: 16384MB RAM
Display Devices
Card name: Intel(R) UHD Graphics 630
RESULTS:
================================
Available threads = 6 / CPU = Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz / 3000 MHz / Target = Processor: generic x86
49716 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
35320 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
================================
================================
Available threads = 6 / CPU = Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz / 3000 MHz / Target = Processor: sandybridge-ivybridge
49459 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
34910 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
================================
================================
Available threads = 6 / CPU = Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz / 3000 MHz / Target = Processor: skylake-raptorlake
49750 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
34505 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
================================
As of today, on GitHub, the fastest PC appears to be chaimav's, at 14663 total milliseconds with the generic x86 build:
Available threads = 24 / CPU = 13th Gen Intel(R) Core(TM) i7-13700 / 2100 MHz / Target = Processor: generic x86
14663 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
IMHO, this particular result suggests that, in the GUI, the optimum Performance > Threads setting should be closest to the target number of wavelet levels.
Looking at the denoise and wavelet code, there are almost no extensions used beyond x86_64, so the builds targeting higher microarchitectures yield no more efficient code there.
@chaimav et al., I would suggest setting Performance > Threads to whatever wavelet level you are using (the default is 7), then running RawTherapee in verbose mode (`.\rawtherapee.exe -w` and `Verbose=true` in options.txt) and watching the console output for readouts. Compare against Threads set to your maximum available, to 0, etc.
PS: I'm also seeing similar diminishing returns on @Silvio_Grosso's 20-thread Acer Predator:
================================
Available threads = 20 / CPU = 12th Gen Intel(R) Core(TM) i7-12700H / 2700 MHz / Target = Processor: generic x86
48426 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 2
28899 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 4
23573 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 8
25115 total milliseconds elapsed (average of 5 runs) using OMP_NUM_THREADS = 16
================================
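The diminishing returns can be quantified from the timings above. Since no single-thread run is reported, this Python sketch measures speedup relative to the 2-thread run and parallel efficiency as that speedup divided by the increase in thread count; `scaling` is a hypothetical helper, not part of the script.

```python
from typing import Dict, Tuple


def scaling(timings: Dict[int, int]) -> Dict[int, Tuple[float, float]]:
    """Map thread count -> (speedup vs. fewest threads, parallel efficiency)."""
    base_threads = min(timings)
    base_ms = timings[base_threads]
    table = {}
    for threads, ms in sorted(timings.items()):
        speedup = base_ms / ms
        # Ideal scaling would make speedup equal to threads / base_threads.
        efficiency = speedup / (threads / base_threads)
        table[threads] = (round(speedup, 2), round(efficiency, 2))
    return table


# Generic x86 build on the 20-thread i7-12700H, as reported above.
acer = {2: 48426, 4: 28899, 8: 23573, 16: 25115}
for threads, (speedup, eff) in scaling(acer).items():
    print(f"{threads:>2} threads: {speedup:.2f}x speedup, {eff:.0%} efficiency")
```

On these numbers, going from 8 to 16 threads actually lowers the speedup (about 2.05x down to about 1.93x relative to 2 threads), and efficiency falls to roughly a quarter of ideal at 16 threads.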