I am curious about the number of cores and threads RT uses efficiently. I know that PS and LR use no more than 4 cores; using a processor with 6 or more cores actually slows both programs down when processing an image.
I believe it is properly multi-threaded, so all of them.
Most of the tools in RT make use of all available cores. Whether they do this efficiently also depends on the number of cores and the surrounding hardware. For example, my old 8-core FX8350 scales well when using the expensive CIECAM02 tool, but with simpler tools it's limited by memory bandwidth (the memory in my old 8-core is simply not fast enough to feed all 8 cores with data for the simpler tools).
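To illustrate the pattern being described, here is a toy sketch (my own illustration, not RawTherapee code): a tool loops over image rows and the loop is split across threads, like an OpenMP "parallel for". Note that Python's GIL means this snippet shows the structure rather than a real CPU speedup; RT does the same thing with OpenMP threads in C++.

```python
# Hypothetical sketch of splitting per-row work across threads.
import math
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 32  # made-up toy image size

def expensive_row(y):
    # Stand-in for a costly tool (think CIECAM02): lots of transcendental
    # math per pixel keeps each core busy. A trivial op like x + 1 would
    # mostly shuffle memory around and hit the bandwidth limit instead.
    return [math.log(1.0 + math.exp(math.sin(x * y * 0.001))) for x in range(WIDTH)]

def process_image(workers=4):
    # Each worker pulls rows from the shared iterator; results keep row order.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(expensive_row, range(HEIGHT)))
```

The expensive-math case scales because each byte fetched from memory is followed by a lot of computation; the simple-tool case moves the same bytes but does almost nothing with them, so all cores end up waiting on the same memory bus.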
There is also at least one tool which doesn't use all available cores (the non-HDR Tonemapping, which uses 8 cores for some processing steps but only one core for others).
The decoding (not to be confused with the demosaicing) of some raw formats (arw for newer Sony cameras, Fuji compressed, floating-point DNG from HdrMerge, and Phase One) also uses all cores, but is limited by I/O, meaning that if the file is already in the OS filesystem cache or on a really fast SSD, it will scale well.
The other decoders are mostly single-threaded (the Canon CR2 decoder uses 2 cores).
The most important demosaicers (amaze, igv, lmmse, vng4, rcd, dcb, xtrans, fast) also use all cores, while some older ones (eahd, …) still use only one core.
In the File Browser we tried to use no more than one core per thumbnail. That means if you open a folder with only one image, only one core is used to process the thumbnail, but when opening a folder with 1000 images, all cores will be used to process the thumbnails.
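That scheduling idea can be sketched like this (a hedged illustration; `render_thumb` and `render_folder` are invented names, not RawTherapee functions): parallelise across thumbnails, one task per image, instead of using several cores inside one thumbnail.

```python
# Toy sketch: one worker per thumbnail, capped at the core count.
import os
from concurrent.futures import ThreadPoolExecutor

def render_thumb(path):
    # Placeholder for the real single-threaded thumbnail processing.
    return f"thumb:{path}"

def render_folder(paths):
    # 1 image -> 1 worker busy; 1000 images -> as many workers as cores.
    workers = max(1, min(len(paths), os.cpu_count() or 1))
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(render_thumb, paths))
```

Keeping each thumbnail on one core avoids paying per-image synchronization overhead a thousand times over; the parallelism comes from the number of images instead.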
My experience with ImageMagick on an 8-core laptop is that multithreading carries a CPU-load overhead: increasing the number of threads also increases the total CPU load, so doubling the threads won't halve the elapsed time. And there is a number of threads beyond which adding more actually increases the elapsed time.
Depending on a large number of factors, this limit occurs at about 3-5 threads. For maximum overall throughput, it may be better to limit each job to 1 or 2 threads and run a few independent jobs at the same time.
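To make that concrete, here is a deliberately crude cost model (my own toy numbers, nothing measured from ImageMagick): the work divides perfectly across threads, but each extra thread adds a fixed coordination cost.

```python
# Toy model: elapsed(t) = work / t + overhead * (t - 1).
# The numbers are invented; only the shape of the curve matters.
def elapsed(threads, work=100.0, overhead=10.0):
    return work / threads + overhead * (threads - 1)

best = min(range(1, 13), key=elapsed)
# elapsed(1) = 100.0, elapsed(3) ~ 53.3, elapsed(8) = 82.5:
# beyond the sweet spot, extra threads cost more time than they save.
```

With these made-up constants the minimum lands at 3 threads, in the same 3-5 range described above; a smaller overhead pushes the sweet spot higher, a larger one pushes it toward single-threaded.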
But always watch out for memory usage. A few 35-megapixel images soon swamp my 12 GB of memory. When that happens, IM falls back to disk, which totally kills performance.
ImageMagick isn’t RawTherapee, of course.
In my experience, algorithms which use very expensive calculations (exp, log, sin, cos, atan2) scale very well, though not perfectly, with the number of cores (threads), provided they are coded correctly (avoiding cache conflicts, for example) and the problem is large enough.
Throwing 24 cores/threads at a small problem (like processing a single thumbnail) will most likely slow it down because of synchronization overhead.
I have only experienced this when the relation between the size of the problem and the number of threads is out of balance. For large problems (e.g. processing a 35-MPixel file) I ran into limits caused by memory bandwidth, but no slowdowns from adding threads (in fact I did experience slowdowns, but they always showed me that I had written bad code).
Edit: I agree that sometimes the processing can slow down if you use too many threads. But coders can take care of that as well, as for example done here.
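One generic way coders handle this (a sketch of the general pattern, not the specific fix linked above): each worker reduces its own chunk into a private partial result, and the partials are combined once at the end, instead of all threads contending on one shared accumulator (which in C/C++ also causes cache-line ping-pong, the "cache conflicts" mentioned earlier).

```python
# Per-worker partial reduction, combined once at the end.
import math
from concurrent.futures import ThreadPoolExecutor

N = 10_000  # made-up problem size

def partial_sum(chunk):
    lo, hi = chunk
    # Private accumulator per worker; expensive math per element.
    return sum(math.atan2(math.sin(i * 0.001), 1.0 + math.cos(i * 0.001))
               for i in range(lo, hi))

def parallel_sum(workers=4):
    # Few large chunks, so synchronization happens only at the final combine.
    step = N // workers
    chunks = [(i * step, N if i == workers - 1 else (i + 1) * step)
              for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return sum(ex.map(partial_sum, chunks))
```

Chunking also addresses the small-problem case above: if the problem is too small to fill even a few chunks, it is cheaper to run it on one thread than to pay the combine/synchronization cost.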