Processing speed, clockspeed or cores

Thijs_Leegwater · July 21, 2017, 2:52pm

Hi,

We need to process lots of raw files in RawTherapee, from different high-res camera models (50 MP and up to 100 MP).
Now we’re looking at which CPUs to buy, but the following question arises:

Would we benefit more from higher clock speeds, or from more CPU cores at lower speeds?

I hope someone here can give me some insight on this.

Thanks in advance!

patdavid · July 21, 2017, 3:26pm

I’m not sure, but I think @heckflosse can likely lend a hand with what to consider here.

heckflosse · July 21, 2017, 3:41pm

That depends on the cameras you use and also on the tools you use in RawTherapee.

Decoding most raw files does not make full use of all cores, and decoding times differ a lot between formats. For example, Canon CR2 takes longer to decode than Nikon NEF. To get short decoding times in RT, you need a higher clock speed.

After the decoding pass, most of the processing pipeline makes full use of all cores. One exception is the Tonemapping tool, which does not make full use of all cores.
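To make the trade-off concrete, here is a rough Amdahl-style sketch. It assumes the decode step is purely single-threaded, the rest of the pipeline scales perfectly across cores, and both scale linearly with clock speed. None of that is exactly true in practice, and `job_time` and all the timings are made up for illustration, not measured in RT.

```python
def job_time(decode_s, pipeline_s, cores, clock_ratio=1.0):
    """Idealized wall time for one file, in seconds.

    decode_s    -- decode time on one core at the reference clock (hypothetical)
    pipeline_s  -- pipeline time on ONE core at the reference clock (hypothetical)
    cores       -- cores available to the parallel part
    clock_ratio -- clock speed relative to the reference (1.0 = same clock)
    """
    if cores < 1 or clock_ratio <= 0:
        raise ValueError("cores must be >= 1 and clock_ratio must be > 0")
    serial = decode_s / clock_ratio                  # decode: one core only
    parallel = pipeline_s / (cores * clock_ratio)    # assumed perfect scaling
    return serial + parallel


# Compare a 4-core CPU at the reference clock with an 8-core CPU at 80% clock.
for decode_s, pipeline_s in [(2.0, 8.0), (4.0, 4.0)]:
    few_fast = job_time(decode_s, pipeline_s, cores=4, clock_ratio=1.0)
    many_slow = job_time(decode_s, pipeline_s, cores=8, clock_ratio=0.8)
    print(f"decode={decode_s}s pipeline={pipeline_s}s: "
          f"4 fast cores {few_fast:.3f}s, 8 slower cores {many_slow:.3f}s")
```

With these made-up numbers the 8-core option comes out ahead when the pipeline dominates and behind when decoding dominates, which is why the answer depends on the cameras and tools used.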

I can give you more details if you can tell us which tools of RT you use.

rechmbrs · July 21, 2017, 5:31pm

What about an I/O bottleneck with these large files? Is there anything special that should be taken into account? An SSD for sure? Especially when running on a laptop.

RONC

CarVac · July 21, 2017, 5:49pm

It’s not especially problematic: hard drives can push 200 MB/s, and a raw file might be only 80 MB or so.
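As a back-of-the-envelope check of those figures, a tiny sketch. It assumes purely sequential reads at the quoted rate and ignores seeks, caching and concurrent access; `read_time_s` is just an illustrative helper.

```python
def read_time_s(file_mb, drive_mb_per_s):
    """Seconds to read one file at a sustained sequential rate (idealized)."""
    if drive_mb_per_s <= 0:
        raise ValueError("drive_mb_per_s must be positive")
    return file_mb / drive_mb_per_s


# 80 MB raw file on a drive sustaining 200 MB/s
print(read_time_s(80, 200))  # 0.4
```

So under these assumptions the read itself is well under a second per file, small next to the processing time.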

HIRAM · July 21, 2017, 5:58pm

If I could build another system right now I’d go for a speed-vs-cores compromise and load up on RAM.

rechmbrs · July 21, 2017, 6:03pm

My Pixel Shift files from the Pentax K1 can surpass 220 MB. I think memory is for sure a bottleneck with heavy CPU use and large files.
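For a feel of the memory side, a rough sketch of the size of one full-resolution working buffer. It assumes 3 channels stored as 32-bit floats per pixel; that layout is an assumption for illustration, and a real pipeline holds several such buffers plus temporaries at once.

```python
def buffer_bytes(megapixels, channels=3, bytes_per_sample=4):
    """Bytes for one full-size image buffer (assumed float32 RGB layout)."""
    return int(megapixels * 1_000_000) * channels * bytes_per_sample


for mp in (50, 100):
    gb = buffer_bytes(mp) / 1e9
    print(f"{mp} MP -> {gb:.1f} GB per full-size buffer")
# 50 MP -> 0.6 GB per full-size buffer
# 100 MP -> 1.2 GB per full-size buffer
```

A few of those per image, times however many images are processed at the same time, adds up quickly.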

RONC

RawConvert · July 21, 2017, 7:51pm

I have an i7 6700K running Ubuntu and RT 5.1. If you want to upload a large raw file, I’d be happy to process it and say how long it took and how much memory it used. This is provided it doesn’t get complicated! I’m no Linux expert, so I’d just use the built-in system monitor for the memory usage, for example. And you could say what you want done or provide a PP3.

afre · July 21, 2017, 7:55pm

As a casual observer, my impression is that multi-core processing needs more software optimization. Out of the box, higher clock speeds would be noticeably faster in many cases. However, modern CPUs are very complex. E.g., clock speeds and individual cores can be throttled depending on the situation and never reach the marketed upper limits and efficiencies. That is why there are benchmarks. Even then, those are idealized tests.

Personally, I would reserve some of the budget for more RAM than I think I would need. I often find that RAM is the limiting factor on my (low-end) system.

heckflosse · July 21, 2017, 9:01pm

Without knowing the workflow it’s impossible to give good advice. We are talking about a really large number of raw files, I guess.

Thijs_Leegwater · July 24, 2017, 7:05am

Thanks so far everyone,

Ingo is right about the number of files: right now we process about 25,000 files per day (14 hrs) on 30x E3-1275 v5.
We use Xeons because of the ECC support, which is a must for future processors.
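For scale, the arithmetic behind those numbers can be sketched as follows. It assumes the load is spread evenly over the 30 machines and that each machine handles one file at a time, which may not match the real setup; `per_machine_stats` is just an illustrative helper.

```python
def per_machine_stats(files_per_day, hours_per_day, machines):
    """Return (files per machine per day, files per machine per hour,
    seconds available per file), assuming an even split across machines."""
    files_per_machine = files_per_day / machines
    files_per_hour = files_per_machine / hours_per_day
    seconds_per_file = hours_per_day * 3600 / files_per_machine
    return files_per_machine, files_per_hour, seconds_per_file


per_day, per_hour, sec_per_file = per_machine_stats(25_000, 14, 30)
print(f"{per_day:.0f} files/machine/day, "
      f"{per_hour:.1f} files/machine/hour, "
      f"{sec_per_file:.1f} s per file")
```

Under those assumptions each machine has roughly a minute per file, so shaving even a few seconds off decode or demosaic matters at this volume.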

The PP3 workflow is something like this:

Demosaic
(For some cameras: Flat / Dark field corrections, Lens corrections)
Exposure comp.
Color management (input profile)

Cameras used: PhaseOne/Leaf/Canon/Nikon

afre · July 24, 2017, 9:39pm

Lol, I was unaware of the scale. It is like baking a batch of cupcakes for your guests vs baking, packaging and distributing them to 1000s of big box grocery stores.

paperdigits · July 24, 2017, 10:05pm

Yeah, wow. That scale. We’d love to see some benchmarks!

ggbutcher · July 27, 2017, 3:19am

This thread prompted me to do a notional analysis I’ve pondered for a while. I took the same image and ran the same sharpen operation on three computers, across the available range of cores. rawproc measures the duration of its image operations and will log the occurrences, and it also has configuration parameters to set the number of cores used, by operation. So, for each of an AMD Phenom II/4core/3.2GHz, Intel i5/2core/2.6GHz, and Intel Atom/4core/1.6GHz, I stepped through the number of cores used from 1 to the number available, doing 4-6 sharpens at each core count.

If there were no overhead, one would expect the speedup to progress linearly as the job is divided over an increasing number of cores. But there is overhead, in the form of cache traversal, thread setup, and probably other things, and I observed the marginal speedup decrease, from ~45% going from 1->2 cores to ~12% going from 3->4 cores. The 2-core i5 was interesting in that it has hyperthreading, with 4 threads, and a small speedup was still observed on it past two cores, likely due to the hyperthreading hardware support.
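For anyone wanting to run the same kind of comparison, here is a minimal sketch of how a marginal speedup could be computed from per-core-count timings. It assumes "marginal speedup" means the relative reduction in run time when one more core is added, which is one possible reading. The timings below are placeholders chosen only to show the shape of the calculation; they are not the measured data.

```python
def marginal_speedups(times_by_cores):
    """Map (n_prev, n_cur) -> fractional time reduction between core counts.

    times_by_cores: dict of {core_count: seconds}, e.g. averaged over runs.
    """
    counts = sorted(times_by_cores)
    result = {}
    for prev, cur in zip(counts, counts[1:]):
        t_prev, t_cur = times_by_cores[prev], times_by_cores[cur]
        result[(prev, cur)] = (t_prev - t_cur) / t_prev
    return result


# Placeholder timings in seconds (hypothetical, NOT measured values).
timings = {1: 10.0, 2: 5.5, 3: 4.0, 4: 3.5}
for (prev, cur), gain in marginal_speedups(timings).items():
    print(f"{prev}->{cur} cores: {gain:.1%} less time")
```

Feeding in real averaged timings per core count would show where the curve flattens out for a given operation and machine.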

Comparing the 1-core times for each of the machines, it is clear that performance scales at least linearly with processor speed.

I’m going to build a Ryzen machine sometime soon, with one of the 8-core chips, and I’ll be interested to see if there’s a number of cores past which there’s no marginal benefit. So, for discussion, I’m going to throw out the half-assed assertion that getting the fastest processor is important, but there’s probably a number of cores past which it becomes not so beneficial to pursue.

There are other considerations I didn’t factor in, including cache specifics, operating system scheduling, and image size. If there’s interest, I can post the data, but probably not until the weekend.