How to use both CPU and GPU for darktable output?

So you are saying that a 4060 gets you a lot of bang for the buck. Agreed. :sunglasses:

Also, your numbers match my own calculations, with a little variance induced by the combination of modules used in darktable. The "GPU benchmark × 2.5 equals CPU benchmark" rule should work as a solid estimate (i.e., the GPU path running roughly 2.5× faster).

Previously I had measured greater differences; I think you'll find them on the forum. Currently my darktable build is optimised for my CPU; if you use pre-built packages, yours won't be. The GPU (OpenCL) code is always compiled for the specific card.

I've posted about this in the past, but I'll repeat it here.

What matters most is the VRAM. A card with 2 GB of VRAM gives darktable an effective ~1.6 GB (the OS and driver use some of it). Ask dt to process a 34 Mpix image and it cannot load it all into memory at once. This forces dt to split the image into tiles, process each tile in sequence, and then merge the tiles. The GPU itself might be fast enough, but the limited memory forces it to do extra work, which costs time. A more modern GPU will have more parallel processing units, but if it is starved for memory, it is not going to save that much time.
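To make that concrete, here is a back-of-the-envelope sketch (my own simplification, not darktable's exact memory accounting): darktable processes pixels as 4-channel 32-bit floats, and I am assuming a demanding module keeps about four full-size buffers on the card at once and that roughly 80% of the VRAM is actually usable.

```python
# Rough estimate of GPU buffer sizes and when tiling kicks in.
# Assumptions (for illustration, not darktable's real accounting):
#  - pixels are 4-channel 32-bit floats (16 bytes each),
#  - a demanding module holds ~4 full-size buffers at once,
#  - only ~80% of the card's VRAM is usable by darktable.

BYTES_PER_PIXEL = 4 * 4  # RGBA x float32

def buffer_mb(megapixels: float) -> float:
    """Size of one full-image buffer in MB."""
    return megapixels * 1e6 * BYTES_PER_PIXEL / 2**20

def needs_tiling(megapixels: float, vram_gb: float,
                 buffers: int = 4, usable: float = 0.8) -> bool:
    """Rough check: does the module's working set fit in usable VRAM?"""
    return buffers * buffer_mb(megapixels) > vram_gb * 1024 * usable

print(f"one 34 Mpix buffer: {buffer_mb(34):.0f} MB")
for vram in (2, 4, 6, 8):
    print(f"34 Mpix on {vram} GB card -> tiling: {needs_tiling(34, vram)}")
# one buffer is ~519 MB, so a four-buffer working set (~2 GB)
# overflows a 2 GB card's ~1.6 GB of usable memory and forces
# tiling, while 4 GB and up fits under these assumptions.
```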

For 35 Mpix images I would suggest at least 4 GB of VRAM, with 6 GB being ideal.

There are debug options within darktable that will help you understand what is happening: start it with -d perf and -d tiling.


This exacerbates the phenomenon I mentioned earlier: you pay a significant penalty for shuffling data from CPU to GPU and back.

Some of that penalty is just the fundamental PCIe bandwidth bottleneck, but in addition to that, there's a fixed per-transfer latency (to set up the transfer and execute it) that will be especially noticeable in this scenario.
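As a hedged illustration of that fixed cost (the constants below are assumed ballpark figures for a PCIe 3.0 x16 link, not measurements from darktable):

```python
# Toy model of host<->GPU transfer cost: a fixed setup latency per
# transfer plus payload size over effective bus bandwidth. Both
# constants are assumed ballpark figures for PCIe 3.0 x16.

SETUP_LATENCY_S = 10e-6   # assumed fixed cost per transfer
BANDWIDTH_BPS = 12e9      # assumed effective bandwidth in bytes/s

def transfer_time_ms(total_bytes: float, n_transfers: int) -> float:
    """Time to move total_bytes split across n_transfers chunks."""
    return (n_transfers * SETUP_LATENCY_S
            + total_bytes / BANDWIDTH_BPS) * 1000

image_bytes = 34e6 * 16  # 34 Mpix at 4 x float32 per pixel

for n in (1, 64, 1024):
    print(f"{n:>4} transfer(s): {transfer_time_ms(image_bytes, n):.1f} ms")
# The bandwidth term is the same in every case; splitting the work
# only adds setup latency. The real multiplier is that with tiling
# each tiled module uploads *and* downloads every tile, so those
# round trips repeat through the whole pipeline.
```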

Also, it's desirable to have the entire pipeline stay on the GPU without downloading back to the CPU, and that also can't happen if the GPU doesn't have enough VRAM.

Unfortunately, for this particular application, the GT 1030 is an especially bad choice. It has many of the negatives of an integrated GPU (low number-crunching performance) while also having the biggest drawback of a discrete GPU (PCIe transfer penalties). A GTX 1060 or better has enough capability to compensate for that drawback.


… following the discussion here with interest, I have a question:
How much GPU VRAM would be required or recommended for editing/exporting 60 Mpix RAW images? How does one calculate that correctly?

I'm not sure about the requirements for exporting, but for editing remember that darktable doesn't send the whole image through the pipeline in the darkroom. At most you're processing the number of pixels in your display, so for me the worst case is that the darkroom pipeline runs on a 1440p (~3.7 Mpix) image.

It's export that uses all the VRAM, and personally I'm not bothered about export speed, since it will always take much longer to do the edits.
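To put rough numbers on the 60 Mpix question (same hedged assumptions as the earlier sketch: 4 × float32 pixels, about four full-size buffers per demanding module, ~80% of VRAM usable):

```python
# Rough comparison of darkroom vs export buffer footprints for a
# 60 Mpix file. The four-buffer working set and the 80% usable-VRAM
# figure are assumptions for illustration, not darktable internals.

BYTES_PER_PIXEL = 16   # 4 channels x float32
BUFFERS = 4            # assumed per-module working set

def working_set_gb(megapixels: float) -> float:
    return megapixels * 1e6 * BYTES_PER_PIXEL * BUFFERS / 2**30

darkroom_mpix = 2560 * 1440 / 1e6  # pipeline runs at display size
export_mpix = 60.0                 # full image only at export time

print(f"darkroom: {working_set_gb(darkroom_mpix):.2f} GB")
print(f"export:   {working_set_gb(export_mpix):.2f} GB")
# darkroom: ~0.22 GB -- fits comfortably on almost any card
# export:   ~3.58 GB -- under these assumptions a 6-8 GB card
# avoids tiling, since only part of the VRAM is usable in practice.
```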


Thanks @elstoc, that is something I discovered too.
And yes, I can really feel the speedup in the GUI even with the low-end GPU in my test setup.

I also have (potential) use cases where the edits are fixed settings (timelapse) and all the power would be required for the export. I am looking at a multi-GPU setup for those. The comments in this topic have been super helpful for getting a grip on what is possible and what is required to make those use cases shine. It is still a long way off and highly dependent on customer financing, so nothing I really worry about right now.

In that regard, I am highly interested in the question by @Roland_Rainer on how to calculate/predict the required VRAM.

Technically the required VRAM is zero, since dt can always fall back to the CPU path, just more slowly.

tl;dr: 4 GB of VRAM or more; above 8 GB is not needed.

As for recommendations, the biggest bang for the buck is to limit tiling, with zero tiling being ideal. I have an Nvidia RTX 3060 with 12 GB of VRAM and get zero tiling even with some very large images.

A couple of months ago someone asked a similar question, and I tested it on my system by limiting the memory. It was a large Fuji image with two diffuse or sharpen (D&S) modules (my typical edit). I think it was in a Matrix conversation, so I don't have the results saved. At 6 GB there was no tiling; at 4 GB there was some tiling, but the difference in export time was negligible. At 2 GB you start to notice the impact.

You can find an RTX 3050 8 GB for around USD 200 and an RTX 3060 12 GB for around USD 250. Both would be great for dt and future-proof in terms of driver support.
