Processing speed, clockspeed or cores

Hi,

We need to process lots of raw files in RawTherapee, from different high-res camera models. (50 MP and up to 100 MP)
Now we’re looking at which CPUs to buy, but the following question arises:

Would we benefit more from higher clock speeds, or from more CPU cores at lower speeds?

I hope someone here can give me some insight on this.

Thanks in advance!

I’m not sure, but I think @heckflosse can likely lend a hand with what to consider here.

That depends on the cameras you use and also on the tools you use in RawTherapee.

The decoding of most raw files does not make full use of all cores. The decoding times also differ a lot. For example, Canon CR2 needs more time than Nikon NEF. To get short decoding times in RT, you need a higher clock speed.

After the decoding pass, most of the processing pipeline makes full use of all cores. One exception is the Tonemapping tool, which does not make full use of all cores.
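To make the trade-off concrete, here is a minimal sketch of that reasoning: a serial decode step followed by a pipeline that is assumed to split perfectly across cores, with everything assumed to scale inversely with clock speed. The function name and all the numbers are made up for illustration; they are not RawTherapee measurements.

```python
def estimated_seconds(decode_s, pipeline_s, cores, clock_ratio):
    """Rough per-file time for a serial decode plus a parallel pipeline.

    decode_s, pipeline_s: single-core times at a reference clock (hypothetical).
    cores: number of cores the pipeline is assumed to use perfectly.
    clock_ratio: candidate clock divided by the reference clock.
    """
    if cores < 1 or clock_ratio <= 0:
        raise ValueError("cores must be >= 1 and clock_ratio must be > 0")
    return (decode_s + pipeline_s / cores) / clock_ratio


# Hypothetical workloads: (decode seconds, pipeline seconds) on one reference core.
workloads = {"pipeline-heavy": (2.0, 8.0), "decode-heavy": (6.0, 4.0)}

for name, (decode_s, pipeline_s) in workloads.items():
    few_fast = estimated_seconds(decode_s, pipeline_s, cores=4, clock_ratio=1.0)
    many_slow = estimated_seconds(decode_s, pipeline_s, cores=8, clock_ratio=0.75)
    print(f"{name}: 4 fast cores {few_fast:.2f} s, 8 slower cores {many_slow:.2f} s")
```

Under these assumptions, the more of the per-file time that sits in the decode step, the more the higher clock wins. Real scaling will be worse than this model because of threading overhead.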

I can give you more details if you can tell us which tools of RT you use.

What about an I/O bottleneck with these large files? Should anything special be taken into account? An SSD for sure? Especially when running on a laptop.

RONC

It’s not especially problematic. Hard drives can push 200 MB/s, and a raw file might be only as much as 80 MB or so.
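As a quick back-of-the-envelope check of those figures (a sketch only, assuming purely sequential reads at the quoted rate and ignoring seeks, caching and concurrent jobs; `read_seconds` is just an illustrative helper):

```python
def read_seconds(file_mb, throughput_mb_s):
    """Time to read one file at a sustained sequential throughput."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    return file_mb / throughput_mb_s


# Figures quoted above: an 80 MB raw file on a drive sustaining 200 MB/s.
print(f"{read_seconds(80, 200):.1f} s per file")
```

So the read itself is well under a second per file in this idealized case. Several jobs reading from the same drive at once would change the picture.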

If I could build another system right now I’d go for a speed-vs-cores compromise and load up on RAM.

My Pixel Shift files from the Pentax K1 can surpass 220 MB. I think memory is for sure a bottleneck with heavy CPU use and large files.

RONC

I have an i7 6700K running Ubuntu and RT 5.1. If you want to upload a large raw file, I’d be happy to process it and say how long it took and how much memory it used. This is provided it doesn’t get complicated! I’m no Linux expert; I’d just use the built-in system monitor for the memory usage, for example. And you could say what you want doing or provide a PP3.

As a casual observer, my impression is that multi-core processing needs more software optimization. Out of the box, higher clock speeds would be noticeably faster in many cases. However, modern CPUs are very complex. E.g., clock speeds and individual cores can be throttled depending on the situation and never reach the marketed upper limits and efficiencies. That is why there are benchmarks. Even then, those are idealized tests.

Personally, I would reserve some of the budget for more RAM than I think I would need. I often find that RAM is the limiting factor on my (low-end) system.

Without knowing the workflow it’s impossible to give good advice. We’re talking about really large numbers of raw files, I guess :wink:

Thanks so far everyone,

Ingo is right about the number of files: right now we process about 25,000 files per day (14 hrs) on 30x E3-1275 v5. :wink:
We use Xeons because of the ECC support, which is also a must for any future processors.
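For a sense of scale, those figures work out roughly as follows. This is only arithmetic on the numbers quoted above, assuming the load is spread evenly over the 30 machines for the full 14 hours; the variable names are mine.

```python
# Figures quoted in the post above.
files_per_day = 25_000
hours_per_day = 14
machines = 30

# Assumption: even load across machines, no idle time.
files_per_hour_total = files_per_day / hours_per_day
files_per_hour_per_machine = files_per_hour_total / machines
seconds_per_file_per_machine = hours_per_day * 3600 * machines / files_per_day

print(f"{files_per_hour_total:.0f} files/hour in total")
print(f"{files_per_hour_per_machine:.1f} files/hour per machine")
print(f"{seconds_per_file_per_machine:.1f} s of machine time per file")
```

That is about one file per minute per machine, so even a few seconds saved per file adds up quickly at this volume.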

The PP3 workflow is something like this:

  • Demosaic
  • (For some cameras: Flat / Dark field corrections, Lens corrections)
  • Exposure comp.
  • Color management (input profile)

Cameras used: PhaseOne/Leaf/Canon/Nikon


Lol, I was unaware of the scale. It is like baking a batch of cupcakes for your guests vs baking, packaging and distributing them to 1000s of big box grocery stores.

Yeah, wow. That scale. We’d love to see some benchmarks!


This thread prompted me to do a notional analysis I’ve pondered for a while. I took the same image and ran the same sharpen operation on three computers, across the available range of cores. rawproc measures the duration of its image operations and logs them, and it also has configuration parameters to set the number of cores used, per operation. So, for each of an AMD Phenom II/4core/3.2GHz, Intel i5/2core/2.6GHz, and Intel Atom/4core/1.6GHz, I stepped through the number of cores used from 1 to the number available, doing 4-6 sharpens at each core count.

If there were no overhead, one would expect the speedup to progress linearly as the job is divided over an increasing number of cores. But there is overhead, in the form of cache traversal, thread setup, and probably other things, and I observed the marginal speedup decrease, from ~45% going from 1->2 cores to ~12% going from 3->4 cores. The 2-core i5 was interesting in that it has hyperthreading, with 4 threads, and a small speedup was still observed on it after two cores, likely due to the hyperthreading hardware support.
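For anyone wanting to repeat this kind of comparison, here is one way the marginal figures could be computed from a list of timings. It is a sketch under assumptions: it defines the marginal speedup as the fractional reduction in time when adding one more core (the post does not say exactly how its percentages were derived), and the timings are invented to resemble the trend described, not the measured data.

```python
def marginal_speedups(times_by_cores):
    """Fractional time saved by each added core.

    times_by_cores[i] is the duration measured with i + 1 cores.
    Returns one value per step (1->2, 2->3, ...).
    """
    return [
        (previous - current) / previous
        for previous, current in zip(times_by_cores, times_by_cores[1:])
    ]


# Hypothetical sharpen timings in seconds for 1, 2, 3 and 4 cores.
timings = [10.0, 5.5, 4.4, 3.872]

for step, gain in enumerate(marginal_speedups(timings), start=1):
    print(f"{step}->{step + 1} cores: {gain:.0%} less time")
```

With perfect scaling the steps would be 50%, 33% and 25%, so comparing measured values against those gives a rough idea of how much overhead each extra core brings.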

Comparing the 1-core times for each of the machines, it is clear that processing speed scales at least linearly with clock speed.

I’m going to build a Ryzen machine sometime soon, with one of the 8-core chips, and I’ll be interested to see if there’s a number of cores past which there’s no marginal benefit. So, for discussion, I’m going to throw out the half-assed assertion that getting the fastest processor is important, but there’s probably a number of cores past which it becomes not so beneficial to pursue.

There are other considerations I didn’t factor in, including cache specifics, operating system scheduling, and image size. If there’s interest, I can post the data, but probably not until the weekend.