darktable performance w.r.t GPU generations & models

Hello,

I’ve been wondering how much GPU generations & specs impact darktable’s performance.

I have a powerful enough desktop Nvidia GTX 1070 that I do not intend to replace soon, so my question is not a request for buying guidance.
What I’d like to understand, even from a theoretical PoV (e.g. given how recent dt modules have been coded), is:

  • how much each GPU generation improves dt’s performance, e.g. what improvements to expect between the 1070, 2070 and 3070, or the 1060/2060/3060. And the same for mobile discrete GPUs.
  • how the CUDA core count impacts dt’s performance
  • how much the graphics card’s VRAM impacts dt’s performance.

Puget’s numerous benchmark runs provide a nice insight into performance for lightroom/photoshop/premiere/davinci_resolve with a lot of different setups … but we have no such thing for dt :wink:
Maybe people close to the OpenCL parts of darktable could give some insight, even a generic one.

It would help to understand at what point a mobile GPU yields the same level of performance as a (prior) desktop GPU, by looking at gpu_generation/cores_count/VRAM… (yeah, I’m thinking about a far future when I swap my desktop for a laptop)

Thanks to anyone who could provide an informed insight! :slight_smile:
GLLM

Use the forum search utility for

opencl clocking

and quite a few clockings/comparisons will appear.

This might give you some darktable benchmarks.

https://math.dartmouth.edu/~sarunas/darktable_bench.html

The current master (part of the 4.0 release) has significant improvements in the use of OpenCL and processing (tiles, etc).

If you want to test, you can start darktable with -d perf and -d memory and process the same image on your desktop and on your laptop. The modules you use do matter, mainly during export.
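
To compare two machines, the timing lines from a -d perf run can be summed per module. Below is a minimal Python sketch of that idea. The line shape in the comment is an assumption about what the log looks like and may differ between darktable versions, so adjust the regex to whatever your build actually prints. `summarize_perf_log` and the sample text are made up for illustration.

```python
import re
from collections import defaultdict

# ASSUMED line shape (check your own log, it may differ by version):
# [dev_pixelpipe] took 0.012 secs (0.010 CPU) processed `exposure' on GPU, blended on GPU [export]
LINE_RE = re.compile(
    r"took\s+(?P<secs>\d+\.\d+)\s+secs.*?"
    r"processed\s+`(?P<module>[^']+)'\s+on\s+(?P<device>\w+)"
)

def summarize_perf_log(text):
    """Sum wall-clock seconds per (module, device) over all matching lines."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            key = (match.group("module"), match.group("device"))
            totals[key] += float(match.group("secs"))
    return dict(totals)

# Hypothetical sample, not real measurements.
sample = """\
[dev_pixelpipe] took 0.012 secs (0.010 CPU) processed `exposure' on GPU, blended on GPU [export]
[dev_pixelpipe] took 1.250 secs (1.200 CPU) processed `diffuse or sharpen' on GPU, blended on GPU [export]
[dev_pixelpipe] took 0.018 secs (0.015 CPU) processed `exposure' on GPU, blended on GPU [export]
[dev_pixelpipe] took 0.300 secs (2.100 CPU) processed `exposure' on CPU, blended on CPU [export]
some unrelated log line
"""

if __name__ == "__main__":
    totals = summarize_perf_log(sample)
    # Print the slowest modules first.
    for (module, device), secs in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{module:20s} {device:4s} {secs:.3f} s")
```

Saving the log of the same export on both machines and running it through something like this makes it easy to see which modules dominate on each.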

darktable uses OpenCL to do its mathematical calculations. It needs enough memory to hold the raw file, the results, and any intermediate iterations. Raws with fewer megapixels need less memory than large ones. Beyond some point, extra memory goes unused. The real benefit comes from cards with more compute units (CU), or even multiple GPUs. Going from a 10 series to a 30 series card brings a lot more CUs.
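
As a rough back-of-the-envelope for the memory point, here is a small Python sketch. It assumes each pixel is stored as 4 channels of 32-bit float, and that a module needs at least an input and an output buffer at the same time. The real count of simultaneous buffers depends on the module, so `buffers=2` is only an assumed lower bound, and both function names are made up for this example.

```python
def buffer_mib(megapixels, channels=4, bytes_per_channel=4):
    """Size of one full-image buffer in MiB (assumes 4 x float32 per pixel)."""
    return megapixels * 1_000_000 * channels * bytes_per_channel / 2**20

def min_working_set_mib(megapixels, buffers=2):
    """Memory for `buffers` full-size buffers held at once (assumed count)."""
    return buffers * buffer_mib(megapixels)

if __name__ == "__main__":
    for mp in (12, 24, 45, 60):
        print(f"{mp:3d} MP: one buffer {buffer_mib(mp):7.1f} MiB, "
              f"input + output {min_working_set_mib(mp):7.1f} MiB")
```

Once the card can hold the buffers a module needs for the whole image, more memory on top of that does not speed anything up, which matches the "after some point" remark above.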

Oh, I searched for “benchmarks” and “GPU”, but didn’t try “clocking”. Will do, thanks.

Edit: I just did: most posts are 2 years old :-/

Thanks @g-man
I do not have a laptop; I’m just giving it some wild & distant thought, that’s all.
I know about the OpenCL improvements planned for 4.0; that’s part of the reason why I ask.

Thanks for the hint about the CU core count :slight_smile:

  1. Nowadays the OpenCL drivers for on-CPU graphics work much better than some years ago, so they are probably worth using for performance. If you want such a device (let’s say an Intel one), note that a lot of RAM will help because it is shared, so go for at least 16GB. Just a reminder: developing and exporting images puts a lot of stress on your system for minutes at a time, so look for good thermal management. (Desktops might get noisy; notebooks often do too, but then throttle down pretty soon.)
  2. If you are looking for a dedicated graphics card, yes, clock rate and number of processing cores do matter. But the card’s bus speed matters even more (so compare 128 vs 192 or 256-bit bus width and the type of graphics RAM).
  3. If you do a lot of exports, use modern modules like diffuse&sharpen, and develop images with more than 30 Mpixels, you will want a large graphics memory, like 8GB. (Technically this is because of tiling and large overlaps.)
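
To put numbers on point 2: peak memory bandwidth is the bus width in bytes times the per-pin data rate. A minimal Python sketch, where the 14 Gbit/s data rate is just an example value and not a claim about any particular card:

```python
def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits: memory bus width (e.g. 128, 192, 256)
    data_rate_gbps_per_pin: effective data rate per pin in Gbit/s,
        which depends on the graphics RAM type and clock
    """
    return bus_width_bits / 8 * data_rate_gbps_per_pin

if __name__ == "__main__":
    rate = 14  # Gbit/s per pin, example value only
    for bus in (128, 192, 256):
        print(f"{bus}-bit bus @ {rate} Gbit/s: "
              f"{memory_bandwidth_gb_s(bus, rate):.0f} GB/s")
```

With the same RAM type, a 256-bit card can move twice as much data per second as a 128-bit one, which is what matters for the RAM-heavy modules.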

Does the bus speed matter because of all the buffer copies that darktable currently does, like Aurélien talked about here? Spending 3/4 of the time moving data back and forth does seem like a lot.

Is there room for that to be optimized, or is it sort of baked into the processing workflow due to the pipeline scheme?

It’s only partly because of buffer copies … some modules are very “ram-intensive”. And it’s about buffer copies because of tiling …
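
To illustrate why tiling with large overlaps costs extra copies, here is a rough Python sketch. It is not darktable’s actual tiling code, just a simplified model with square tiles where every tile is padded by `overlap` pixels on each side, and it treats edge tiles as full-size, so the ratio is an upper bound. `tiling_cost` and the numbers are made up for illustration.

```python
import math

def tiling_cost(width, height, tile, overlap):
    """Return (tile_count, processed_pixels / image_pixels) for square tiles."""
    step = tile - 2 * overlap  # fresh pixels each tile contributes per axis
    if step <= 0:
        raise ValueError("overlap too large for this tile size")
    tiles = math.ceil(width / step) * math.ceil(height / step)
    # Every tile is copied to the card and processed at full tile size.
    return tiles, tiles * tile * tile / (width * height)

if __name__ == "__main__":
    # Hypothetical 48 Mpixel image with 2048 px tiles.
    for overlap in (16, 64, 256, 512):
        count, ratio = tiling_cost(8000, 6000, 2048, overlap)
        print(f"overlap {overlap:3d} px: {count:3d} tiles, "
              f"{ratio:.2f}x the image's pixels moved and processed")
```

The larger the overlap a module needs, the more tiles there are and the more pixels get transferred and computed more than once. More graphics memory allows bigger tiles (or no tiling at all), which shrinks that overhead.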

I see that you have been doing a lot of work on that particular area of DT… thanks for all your hard work. From what you say, any improvements in tiling should be nicely reflected in DT performance.

Exactly! :slight_smile:

Thanks @hannoschwalm for your reply.
Will check bus bandwidth & cores count.