darktable standardized performance measuring?

I see lots of topics about darktable (or Rawtherapee) performance (or lack thereof).

Now I'm wondering how good my hardware is: is there any standardized way to measure performance? IMHO it would be great to have one set of RAW files, perform some specified tasks/routines, and measure the time.

It would give a clearer idea of what hardware or OS works best, and help avoid spending lots of money on hardware that has little effect on RAW processing.

Darktable is part of Phoronix's testing suite: Darktable Benchmark - OpenBenchmarking.org

There’s lots to say about the topic, too much for one post. What I will say is that the performance you need is also dependent upon the image sizes you will be mangling.

In code, most image processing is done in two nested loops: one iterating over the rows, and one nested within it that iterates through the pixels in each row. To make this go faster, the programmer can use tools such as OpenMP to divide the rows among the available cores. This is a trivial thing to do; if you've already written the two loops, parallelizing them is as simple as putting a statement called a pragma on the line above the outer loop, and the compiler will do all the work to parallelize the code. In fact, the compiler is smart enough to generate code that will automatically use all of the cores available on the computer. Cool beans!!!

But the speedup isn’t a straight multiplication, because for an image in a given location in memory, some cores are closer to it than others. One core can usually address the memory directly, but the others have to “navigate” varying levels of memory cache. This overhead takes away some of the advantage of dividing the work among the cores.

On slower machines, this overhead can start to negate the advantage of more cores for smaller images, because the divided work is small relative to the overhead. This is just a notion, but I think that beyond four cores, for images of 24MP and smaller, the extra speedup can diminish to the point where it's not noticeable on slower machines.

So, the bottom line to all this is that the first priority in selecting hardware is to get a relatively fast processor. Right after that would come the number of cores, along with the caching structure, but I probably wouldn't worry about having lots of cores if the images are relatively small.

Now, all of the above depends on whether the programmer actually put the pragma in all the places in the code that could benefit from being parallelized. It used to be hard to program multithreading, but OpenMP makes it trivial. Using the GPU to do processing is still hard to do, so you’ll find such use implemented sporadically in programs, sometimes saved for the most tedious operations. What this means is that you need to understand your specific software’s use of the hardware; if you use mine, you’d be silly to buy a GPU, because I don’t do that sort of programming… :smiley:

If you read some of the hardware reviews, you'll find they typically run benchmark software that includes some sort of image processing, like the HandBrake multithreaded video transcoder. This is probably as close to a benchmark relevant to our purposes as you'll find.

It’s a complicated topic; hope this helps…


The benchmarks are sort of abstract numbers; I am more interested in what they mean practically. That said, dt for me isn't lightning fast, but not really slow either. I guess most processes take less than 1 second (except opening dt or opening a 25 MB RAW).

dt doesn't support Fuji GFX 100 files yet, but I wonder how those 200 MB files would run?

View Fujifilm GFX 100 sample gallery (non-final firmware) from DPReview.

I just managed to open these files in RawTherapee and can give you some numbers.
I measured AMaZE demosaic and raw CA correction (dt has the same algorithms) on my AMD FX-8350:

Full size raw CA correction: 0.6 seconds
Full size AMaZE demosaic: 1.25 seconds

DSCF0040.RAF does not open for me. Tried dt 2.6.2 and rt 5.6 and rt 5.6-68 (dev version).

It seems you already have the Fuji codecs on your PC, don't you? Or how did you manage to open it?

Windows10 no codec installed
RawTherapee_5.6_WinVista_64 doesn’t open
RawTherapee_dev_5.6-67-g6486c491f_WinVista_64_190606 opens ok. No DCP available.



I have been using darktable for performance measuring for a long time. The idea is to use a standard raw file and xmp file, run the darktable-cli command line version, and take the "pixel pipeline processing" time out of the debug output.

I got the idea for my own scripts from here:

I am using the same raw+xmp as mentioned in that article. Can be found here:


This is the basic script I am using:

rm -f test*.jpg
for d in $(seq 1 3); do
	echo -n "run $d: "
	darktable-cli bench.SRW test-$d.jpg --core --configdir /tmp --disable-opencl -d perf -d opencl | grep "processing took"
done

It needs to be executed in the directory where the raw+xmp files are located.

Output looks similar to this:

4# ./bench-script-sequence-3.sh
run 1: 15,532968 [dev_process_export] pixel pipeline processing took 14,968 secs (116,138 CPU)
run 2: 15,293704 [dev_process_export] pixel pipeline processing took 14,976 secs (116,147 CPU)
run 3: 15,299993 [dev_process_export] pixel pipeline processing took 14,982 secs (116,102 CPU)

I use the command line option --disable-opencl to measure just the CPU speed. With opencl enabled (just remove the option from the command line) the speed doubles in my case: i7-7700k with Nvidia GTX 1050 Ti.

I found that the CPU benchmark correlates very well with the clock speed of my RAM: 2400 → 3200 MHz.

I modified the above script so I get four sets of runs:

  1. as above
  2. with opencl
  3. and 4. as above, but with my own config dir, i.e. I omitted this part:
--configdir /tmp

As a consequence I get:

(darktable-cli:22551): Gtk-WARNING **: 06:04:21.587: gtk_disable_setlocale() must be called before gtk_init()

Indeed I can use a diff tool to investigate where it may come from, but the differences are quite numerous. Does anybody have a good hint what I should look into?
