Having read the manual pages on OpenCL performance optimisation via configuration parameters, I’m left wondering what hardware factors can affect darktable performance – with the emphasis on image editing rather than exporting. This topic came to mind the other day while helping my great-grandson configure a top-end gaming machine which he plans to build.
Aside from the question of cooling, the emphasis was on the GPU and its capability with ray tracing and DLSS – which are, I believe, quite irrelevant to darktable. I also convinced myself – rightly or wrongly – that most of the other functionalities that make for a good gaming GPU, which collectively make up its 3D performance, are also of little relevance to darktable.
Over the last 20 years the gains in GPU 3D performance (as measured, for example, by a GPU’s 3D PassMark score) have been enormous – by a factor of about 25,000, for instance, from that wonderful Rage 128 Pro of 1999 (remember that?) to this year’s NVIDIA RTX 3080. But over the same period the G2D PassMark score has risen far less: by a factor of about 7 from the Rage card to the 3080 card. Or, much more tellingly, by only about 50% from the GeForce GT 745 of 2013 to the RTX 3080 of 2021.
This leads me to conclude that I don’t need to be too concerned about how recent my graphics card is in terms of its effect on image processing. Am I right in my assumption? Is there a generally agreed minimum level of card one should use? How much does the GDDR memory on the card affect darktable performance?
Memory is crucial – if there isn’t sufficient memory, there’s additional overhead for tiling. With current megapixel counts the images need a lot of memory (you can see a significant difference working on a full-HD screen vs a 4K screen – and that’s just the reduced data needed for display).
The increase in GPU processing power has been enormous; the increase in GPU memory has not.
I assume you mean memory on the graphics card? How much is the minimum? Is more always better? And how does the amount of system RAM affect tiling performance? Is it all done on the graphics card?
Not so fast there, mister. All of the modern OpenCL image processing stages (e.g. Local Laplacian Pyramids, bilateral filter, blur, sharpen, scaling, even matrix colorspace transforms) actually run on those same general GPU cores.
While you might not care about vertex processing throughput for dt, you very much care about pixel processing throughput.
You need enough main memory to hold your image and associated data. That might also imply that having more GPU memory than you have main memory might not be too useful (if dt is the only program using the GPU memory!).
Tiling is done on the main CPU (from main memory, where the full image is stored). It is used to “cut the image in pieces” that the graphics card can handle. The bottleneck there is the transfer from and to main memory, needed to swap tiles (once in each direction per tile).
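To see why those transfers hurt, here is a toy model of the tiling idea – this is my own illustrative sketch, not darktable’s actual implementation, and the buffer sizes and overlap value are assumptions chosen for the example:

```python
import math

# Toy model (NOT darktable's real code): if the full image buffer does not
# fit on the card, it is cut into overlapping tiles, and every tile must be
# copied over the bus to the GPU and back again.
def tiling_cost(width, height, bytes_per_pixel=16,
                gpu_budget=2 * 1024**3, overlap=64):
    """Return (number_of_tiles, bytes transferred one way to the GPU)."""
    image_bytes = width * height * bytes_per_pixel
    if image_bytes <= gpu_budget:
        return 1, image_bytes              # whole image fits: one transfer
    # Choose a square-ish tile whose buffer fits the GPU budget:
    tile_side = int(math.sqrt(gpu_budget / bytes_per_pixel))
    step = tile_side - 2 * overlap         # tiles overlap to avoid seams
    nx = math.ceil(width / step)
    ny = math.ceil(height / step)
    # Every tile, overlap included, crosses the bus:
    transferred = nx * ny * tile_side * tile_side * bytes_per_pixel
    return nx * ny, transferred

# A ~45 MP image at 4 float32 channels (16 bytes/pixel):
# with a 2 GiB buffer budget it fits in one piece, but with a 256 MiB
# budget it needs 6 overlapping tiles, so well over twice the image's
# size gets moved across the bus.
print(tiling_cost(8192, 5464, gpu_budget=2 * 1024**3))
print(tiling_cost(8192, 5464, gpu_budget=256 * 1024**2))
```

The overlap means tiled processing always transfers (and computes) more data than the image actually contains, which is why enough GPU memory to avoid tiling matters more than raw core count for this workload.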
Of course, you could profile your system to see how much your graphics card speeds up the processing. Have a look at the startup options for dt.
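For the profiling itself, darktable ships debug switches for exactly this; started from a terminal, something like the following prints per-module timings and what OpenCL is doing (flag names per the darktable manual; your launcher path may differ):

```
# verify your OpenCL setup is usable at all
darktable-cltest

# run darktable with performance and OpenCL debug output
darktable -d perf -d opencl
```

Comparing the `-d perf` timings with OpenCL enabled and disabled (there is a toggle in preferences) shows how much your particular card actually buys you.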
But in the end, what is “fast enough” depends also on what you do with your images. Someone who’s mostly using simple processing and relatively small exports (like me) won’t see much effect from a top GPU (I see hardly any delays in processing as it is, and I don’t mind a bit of waiting on export). If you are into complex processing and need the math-intensive modules, or use high-resolution screens, you’ll profit more from a high-end graphics card…
Does this mean that the processing of ‘guided laplacians’ in the highlight reconstruction module is performed on the GPU? Entirely, or partly?
Would choosing a higher-end NVIDIA GPU (sorry, I know even less about AMD GPUs) be an effective response to the very long processing times – when using a high number of iterations – that Aurelien warns about in his recent YouTube video on highlight reconstruction?
I can confirm that diffuse or sharpen is much faster on my NVidia 1060 than on my Ryzen 5600. However, for some images, I still had to increase opencl_mandatory_timeout. By default, after 2 seconds of processing on the GPU, darktable would think there was an OpenCL problem, abort the GPU computation, and restart from scratch on the (much slower) CPU.
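For anyone wanting to change it: the setting lives in the `darktablerc` configuration file, edited while darktable is closed (the path below is the usual Linux location; the value 400 is just an example, not a recommendation):

```
# ~/.config/darktable/darktablerc
opencl_mandatory_timeout=400
```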
Thanks for such an immediate response. Default on my V 4.0.1 appears to be 200 on my Linux install, but 400 on the Windows install. Timeout occurs on the Linux install, but, thanks to your advice, I can fix it now…