Hello, I have a server running four ten-core Xeons and am using a GUI to run RawTherapee through the command line via the batch image command seen here:
My desktop (3770K) is currently getting around 40 frames (images) per minute, whereas my Xeon-based server is getting 1 frame (image) per minute. I want to know whether there is any reason why it is going so slowly. I’d also like to know if there is any way to fix this issue.
Where is drive T:? Maybe the destination is the bottleneck.
I’m not sure you used -a correctly - according to RawPedia a file or folder should follow, according to rawtherapee-cli --help it should not - one of them needs to be fixed by us.
Using PNG will slow you down. Use uncompressed TIFF if you want speed.
rt is optimized for multiple cores, not for multiple cpus. It mostly uses dynamic scheduling which I expect to be the culprit for the slowness on your system.
Could you try the following commands before you start rt?
On Linux, or on Windows when running rt in an msys2 console (in a plain Windows command prompt, use set instead of export and drop the quotes):
export OMP_NUM_THREADS=10
export GOMP_CPU_AFFINITY='0-9'
That binds rt to the 10 cores of your first cpu.
If rt runs faster with these settings, we know that my assumption is true and we can think about how to allow static scheduling for systems with multiple cpus.
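To make the comparison concrete, here is a rough sketch for a bash-style shell (Linux, or the msys2 console). The helper names run_pinned and time_cmd are made up for this example, and the rt command line is a placeholder for your own batch command. Whether GOMP_CPU_AFFINITY is honoured depends on the OpenMP runtime the build uses, so treat the pinning as an assumption to check (for example by watching per-core load while it runs).

```shell
# Sketch only: run_pinned and time_cmd are hypothetical helper names, not part of rt.

run_pinned() {
    # $1 = OpenMP thread count, $2 = CPU list (e.g. 0-9), rest = command to run
    rp_threads=$1
    rp_cpus=$2
    shift 2
    # Variables given as a prefix apply to this one command only.
    OMP_NUM_THREADS=$rp_threads GOMP_CPU_AFFINITY=$rp_cpus "$@"
}

time_cmd() {
    # Print the wall-clock seconds the given command took (its output is discarded).
    tc_start=$(date +%s)
    "$@" >/dev/null 2>&1
    tc_end=$(date +%s)
    echo $(( tc_end - tc_start ))
}

# Example usage (replace the bracketed part with your real arguments):
#   time_cmd run_pinned 10 0-9 rawtherapee-cli [your usual arguments]
#   time_cmd rawtherapee-cli [your usual arguments]
```

If the pinned run is clearly faster on the same set of files, that would support the scheduling suspicion; if both are equally slow, the cause is more likely elsewhere.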
So those commands are run from a RawTherapee command window, and considering I use the -w flag (which forcibly keeps the window closed), I don’t see how exactly to implement this. Also, this prefix before my batch command does not work: “/start affinity 1”. Windows says it cannot find -w. Is that intended?
Considering you’re running on big iron, you should copy the files under test to a RAM disc, so we can rule out an I/O problem and come closer to what Ingo is suspecting and wants to get tested.
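A minimal sketch of that test for a bash-style shell, assuming Linux, where /dev/shm is normally a RAM-backed tmpfs. On Windows you would need a separate RAM disk tool and would use its drive letter instead. stage_files is a hypothetical helper name and the paths are placeholders.

```shell
# Sketch only: stage_files is a hypothetical helper.
# Assumes the source folder is flat (no subfolders) and the target is RAM-backed.

stage_files() {
    # $1 = folder holding the test files, $2 = RAM-backed target folder
    mkdir -p "$2/in" "$2/out" || return 1
    cp "$1"/* "$2/in/" || return 1
    # Report how many files were staged.
    ls "$2/in" | wc -l | tr -d ' '
}

# Example usage:
#   stage_files /path/to/test/raws /dev/shm/rt-test
# Then point the batch command's input at /dev/shm/rt-test/in
# and its output at /dev/shm/rt-test/out.
```

If the run is just as slow with both input and output in RAM, storage can be ruled out as the bottleneck.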
We need more input. You wrote that you are using 4 ten-core Xeons. Maybe there is only one Xeon model with ten cores, but we don’t have the time to search for this kind of information. You didn’t tell us which kind of raw files you processed in your tests. You didn’t tell us which processing you applied to the files.
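For the CPU details, something like the following would collect the basics in a bash-style shell. It is a sketch that assumes a /proc/cpuinfo in the usual Linux layout (msys2 and Cygwin also provide one, though the fields they expose may differ). cpu_summary is a hypothetical helper name.

```shell
# Sketch only: cpu_summary is a hypothetical helper that reads a cpuinfo-style file.

cpu_summary() {
    # $1 = cpuinfo-style file, defaults to /proc/cpuinfo
    ci_file=${1:-/proc/cpuinfo}
    # First "model name" line, with the label and leading blanks stripped.
    ci_model=$(grep '^model name' "$ci_file" | head -n 1 | cut -d: -f2- | sed 's/^ *//')
    # Distinct "physical id" values = number of CPU packages (sockets).
    ci_sockets=$(grep '^physical id' "$ci_file" | sort -u | wc -l | tr -d ' ')
    # One "processor" line per logical processor.
    ci_logical=$(grep -c '^processor' "$ci_file")
    echo "model: $ci_model"
    echo "sockets: $ci_sockets"
    echo "logical processors: $ci_logical"
}

# Example usage:
#   cpu_summary
```

Posting that output together with the raw file type and the processing profile used would cover most of what is being asked for here.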
If you need speed, you need to build RT yourself. The packaged build is restricted to SSE2 usage. It’s easy to build RT even on Windows, just look here
As I already wrote, RT isn’t optimized for multiple cpu systems. It’s optimized for single-cpu machines with multiple cores. I’m really interested in optimizing it for multiple cpu machines as well, but I don’t have such a machine myself. That’s a point where you could help with tests.
Also, I am more than happy to help optimize the software for multi-processor systems, whether that means testing different application versions or configurations.