Which benchmarks provide an estimate to help me compare and decide which GPU to buy for image processing?

Some background first

I have spent some time learning darktable on Windows 10, and having become proficient at using it to process the raw images from my camera, the next frontier I’d like to tackle is an upgrade of my GPU. At this time I’m using the integrated GPU of a mobile Intel CPU. A good example of my current setup is a laptop with an Intel i5-4200u CPU, which has Intel HD Graphics 4400 as its integrated GPU.

I wish to purchase a dedicated GPU, installed via a PCIe slot, for another Intel-based workstation, which uses a Xeon processor.

I am not a gamer and have zero interest in purchasing this GPU to play games; the purpose is solely to accelerate graphics editing software such as darktable or Capture One.


Which publicly available benchmarks, and/or benchmark websites, will give me an indication so I can compare the relative estimated benefit/acceleration of various GPUs for my primary interest, image editing, in an editor such as darktable, Capture One, Adobe Lightroom, etc.?

I ask this question because the few benchmarks I have looked at seem to refer to the performance of GPUs in accelerating games, which gives me no clue how much benefit these GPUs would add to image processing/editing.


Phoronix does an awesome job of benchmarking things, and darktable is part of their benchmarking suite, albeit they test on Linux. Check it out: Graphics Cards Linux Reviews & Articles - Phoronix

Thanks for sharing, I will check that out. In addition, @OK1, if you have some candidate cards you could share them, and if anyone here has the same card, you could maybe get some numbers from them running the darktable pipeline and compare them to what you get. It’s only a rough estimate, but it might help you decide between one card and another. From a recent thread, it looks like optimizing OpenCL settings might really help too; @kofa can share, but I think he discovered a couple of tweaks that greatly enhanced his results.

https://math.dartmouth.edu/~sarunas/darktable_bench.html

This is older, but then you may only be able to find or use older cards anyway.

Highly appreciated.

I will check this out. Albeit the benchmarks were obtained on Linux, I am confident they will still be relevant to Windows, because what I need most is the relative performance of one GPU against another, and in theory I should expect a similar relative performance of these GPUs against each other on Windows.
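The "relative performance" idea can be made concrete: take published per-card times for the same darktable export and normalize them against a baseline. The numbers below are purely hypothetical, just to illustrate the calculation.

```python
# Hypothetical export times (seconds) for the same image, as one might
# read off a published darktable benchmark page. The absolute values are
# made up for illustration only.
bench_secs = {
    "CPU only": 120.0,
    "Quadro P600": 45.0,
    "GTX 1060 6GB": 12.0,
}

baseline = bench_secs["CPU only"]

# Speedup relative to the CPU-only baseline. The ratio between cards,
# rather than the absolute time, is what should carry over between
# Linux and Windows.
speedup = {card: baseline / secs for card, secs in bench_secs.items()}
```

The point is that even if absolute times differ between operating systems, the ratios between cards should stay roughly comparable.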

Thanks.

Based on my further research via the benchmarks, and any other input, my goal is to find a GPU with:

  1. A good balance of performance and power consumption (max no more than about 70 watts), preferably one which draws all its power from the PCIe slot, so I do not need to change my power supply.

  2. Cost to acquire: a used card at no more than £100 (~$130).

  3. Obviously PCIe, and ideally occupying only one slot’s worth of space in the computer case.

  4. GDDR5 RAM rather than GDDR3, between 2 GB and 8 GB.

  5. Actively cooled, with fans.

  6. Good enough to accelerate image processing; it does not need to excel at anything more demanding, such as games.

With no real knowledge of how these cards perform relative to one another for image processing, the GPUs I have considered, with my limited knowledge, include:

All from Nvidia’s professional line: the K2000, P600, and T600.

I’d also be happy to consider anything from Nvidia’s enthusiast line of GPUs (e.g. GT, GTX, RTX) which fits my aforementioned criteria/wish list.

I have not done any research into AMD GPUs, but I’m open-minded if I can find enough justification, or advantage, for choosing a GPU from AMD.

Broadly, I’m seeking a sweet spot: enough for occasional image processing, without spending too much money, and definitely not going overboard on features and capabilities I do not need. My only GPU need at this time is image processing in darktable, and in future maybe a bit of video editing. I definitely will NOT be doing any 3D, CAD, games, AI, or crypto mining!

Drivers might actually be better on Windows in some cases. Even recently, Windows 11 is for the first time faster than most Linux distros on the new 12th-gen chips; apparently some CPU thread/core scheduling tweaking is still needed on Linux, and I’m sure Linux will catch up and pass Windows again. But good driver support can go a long way toward a good experience with your card, whatever the hardware spec.

It would be helpful to know the exact CPU model, because some of your requirements (low power, low price, slim but with fans) point towards a not very powerful GPU, and maybe what you’d gain from it is not worth the expense. On the other hand, a low-power CPU (the i5-4200u in your laptop, for example) would benefit from basically any current GPU.


Hi,

I’m running an AMD Ryzen 5 5600X CPU and an NVidia 1060 6 GB card. The GPU’s power limiter is set to 60 W (down from the maximum of 120 W); this does not seem to impact darktable export performance. For the CPU, I use the ondemand governor. I started my machine almost 2 hours ago but was away from it; CPU frequency stats indicate 3.70 GHz: 0.97%, 2.80 GHz: 1.87%, 2.20 GHz: 97.16%, so the governor is working (no surprise there).

What I found was that building darktable with the Release profile instead of RelWithDebInfo gave a huge performance boost for diffuse or sharpen when the GPU was not used; the GPU was still a lot faster, though, and even with 6 GB of GPU memory, setting a higher OpenCL memory headroom (in my case, 800 MB) was needed. See Pro Contrast Moose Peterson - #25 by kofa for details, but the summary is:

RelWithDebInfo, CPU path:

[dev_pixelpipe] took 106.707 secs (1127.128 CPU) processed `diffuse or sharpen' on CPU, blended on CPU [export]
[dev_pixelpipe] took 46.582 secs (446.771 CPU) processed `diffuse or sharpen 1' on CPU, blended on CPU [export]
[dev_pixelpipe] took 137.392 secs (1337.772 CPU) processed `diffuse or sharpen 2' on CPU, blended on CPU [export]

Release, CPU path:

[dev_pixelpipe] took 44.513 secs (495.844 CPU) processed `diffuse or sharpen' on CPU, blended on CPU [export]
[dev_pixelpipe] took 16.002 secs (180.493 CPU) processed `diffuse or sharpen 1' on CPU, blended on CPU [export]
[dev_pixelpipe] took 48.474 secs (551.746 CPU) processed `diffuse or sharpen 2' on CPU, blended on CPU [export]

OpenCL:

[dev_pixelpipe] took 23.684 secs (23.518 CPU) processed `diffuse or sharpen' on GPU with tiling, blended on CPU [export]
[dev_pixelpipe] took 7.059 secs (7.005 CPU) processed `diffuse or sharpen 1' on GPU with tiling, blended on CPU [export]
[dev_pixelpipe] took 18.559 secs (17.528 CPU) processed `diffuse or sharpen 2' on GPU with tiling, blended on CPU [export]
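Timings like the ones above can be pulled out of a `-d perf` log with a few lines of script, which makes it easy to compare two builds or two cards module by module. This is just a sketch against the log format shown in this thread; the function name is mine.

```python
import re

# Match darktable perf lines of the form:
#   [dev_pixelpipe] took 23.684 secs (23.518 CPU) processed `diffuse or sharpen' on GPU ...
PERF_RE = re.compile(r"took ([0-9.]+) secs .*? processed `([^']+)'")

def module_times(log: str) -> dict[str, float]:
    """Map module name -> wall-clock seconds from [dev_pixelpipe] lines."""
    return {m.group(2): float(m.group(1)) for m in PERF_RE.finditer(log)}

sample = """\
[dev_pixelpipe] took 23.684 secs (23.518 CPU) processed `diffuse or sharpen' on GPU with tiling, blended on CPU [export]
[dev_pixelpipe] took 7.059 secs (7.005 CPU) processed `diffuse or sharpen 1' on GPU with tiling, blended on CPU [export]
[dev_pixelpipe] took 18.559 secs (17.528 CPU) processed `diffuse or sharpen 2' on GPU with tiling, blended on CPU [export]
"""
times = module_times(sample)
total = sum(times.values())  # total seconds across the matched modules
```

Running the same script over the CPU-path and OpenCL logs gives per-module speedups directly.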

Great summary. I have been waiting a year or more to upgrade, hoping for some sanity in GPU pricing. My card is currently super weak, an old 2 GB GTX card, but this has me thinking there may be some tweaking I could do to improve things a little. How do you go about figuring out what to set that headroom parameter to? Trial and error, or is there some number tied to RAM or GPU VRAM?

If you see “couldn’t enqueue kernel -4” errors in the logs and darktable falls back to using the CPU, then the headroom must be increased. Otherwise there is no reason to change it.


The CPU of the desktop whose GPU I wish to upgrade or replace is a Xeon E3-1245v3, a Haswell. At this time the discrete GPU in this computer is an NVIDIA K600.

What @paolod said. Plus, there’s nvidia-smi:

root@eagle:~# nvidia-smi
Thu Nov 25 22:07:39 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.00    Driver Version: 470.82.00    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:09:00.0  On |                  N/A |
|  0%   37C    P8     7W / 120W |    413MiB /  6077MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1353      G   /usr/lib/xorg/Xorg                176MiB |
|    0   N/A  N/A      2456      G   /usr/bin/kwin_x11                  16MiB |
|    0   N/A  N/A      2521      G   /usr/bin/plasmashell               28MiB |
|    0   N/A  N/A      2653      G   telegram-desktop                    1MiB |
|    0   N/A  N/A      2862      G   ...AAAAAAAAA= --shared-files       20MiB |
|    0   N/A  N/A      9543      G   /usr/bin/krunner                    6MiB |
|    0   N/A  N/A     22164      G   /usr/lib/firefox/firefox          156MiB |
+-----------------------------------------------------------------------------+

To query only memory usage:

root@eagle:~# nvidia-smi -q -d MEMORY

==============NVSMI LOG==============

Timestamp                                 : Thu Nov 25 22:08:36 2021
Driver Version                            : 470.82.00
CUDA Version                              : 11.4

Attached GPUs                             : 1
GPU 00000000:09:00.0
    FB Memory Usage
        Total                             : 6077 MiB
        Used                              : 452 MiB
        Free                              : 5625 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 7 MiB
        Free                              : 249 MiB

Here is the output of the same commands while darktable is processing a high-resolution image with diffuse or sharpen, which needs so much memory that the photo has to be processed in pieces (using tiling):

root@eagle:~# nvidia-smi
Thu Nov 25 22:10:40 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.00    Driver Version: 470.82.00    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:09:00.0  On |                  N/A |
|  0%   52C    P2    86W / 120W |   5762MiB /  6077MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1353      G   /usr/lib/xorg/Xorg                176MiB |
|    0   N/A  N/A      2456      G   /usr/bin/kwin_x11                  16MiB |
|    0   N/A  N/A      2521      G   /usr/bin/plasmashell               27MiB |
|    0   N/A  N/A      2653      G   telegram-desktop                    1MiB |
|    0   N/A  N/A      2862      G   ...AAAAAAAAA= --shared-files       18MiB |
|    0   N/A  N/A      9543      G   /usr/bin/krunner                    8MiB |
|    0   N/A  N/A     22164      G   /usr/lib/firefox/firefox          136MiB |
|    0   N/A  N/A     24973      C   ...able-master/bin/darktable     5367MiB |
+-----------------------------------------------------------------------------+

root@eagle:~# nvidia-smi -q -d MEMORY

==============NVSMI LOG==============

Timestamp                                 : Thu Nov 25 22:10:35 2021
Driver Version                            : 470.82.00
CUDA Version                              : 11.4

Attached GPUs                             : 1
GPU 00000000:09:00.0
    FB Memory Usage
        Total                             : 6077 MiB
        Used                              : 5762 MiB
        Free                              : 315 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 7 MiB
        Free                              : 249 MiB

The aim is to set the parameter so that you make use of almost all the memory on the card. Too high, and you don’t make full use of the RAM; too low, and you get the dreaded couldn’t enqueue kernel -4 error when darktable tries to allocate more than there is available. Here, I could lower the headroom a bit (I ‘waste’ about 300 MB of the 6 GB), but, depending on the number of applications I have running, the baseline will be different. If you decide to shut down everything else, and only have darktable running when processing images, you need not be that conservative, I think (losing 300 MB of your 2 GB might be a noticeable difference).
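The trade-off above is just a couple of lines of arithmetic. The variable names below are mine; the figures come from the nvidia-smi output earlier in the thread and the 800 MB headroom mentioned before.

```python
# Rough headroom sizing, using the numbers from the nvidia-smi output above.
total_mib = 6077     # total VRAM reported by nvidia-smi
baseline_mib = 452   # VRAM used by Xorg, the browser, etc. while idle
headroom_mib = 800   # darktable's OpenCL memory headroom setting

# darktable tries to leave this much VRAM untouched. If other applications
# grow past the headroom, expect the `couldn't enqueue kernel -4' error.
usable_mib = total_mib - headroom_mib      # what darktable may allocate
margin_mib = headroom_mib - baseline_mib   # slack left for other apps to grow
```

On a 2 GB card the same arithmetic shows why every wasted 100 MB matters far more than on a 6 GB one.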

Also important to know (if you’re running an old machine with little RAM; until this spring I had only 4 GB): you may get a could not create context for device 0: -6 error if you do not have enough system memory free. That happened on my old machine; closing a few apps and re-launching darktable got OpenCL working again.

See OpenCL error codes (1.x and 2.x) - StreamHPC


@priort if you have a motherboard with built-in graphics and you don’t need the GPU acceleration for anything else, you might have the option of running your monitor off your built-in graphics and using your GPU only for OpenCL/darktable. That’s how I have used my 2 GB GTX 770 for the past couple of years, and I could technically set the headroom to zero (though I still kept it at 100 just in case).
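For reference, with the display on the integrated GPU, which OpenCL device handles which pipeline can also be pinned via the opencl_device_priority key in darktablerc. The value below is the documented default (four /-separated lists, for the darkroom image, preview, export, and thumbnail pipelines); treat it only as a starting point and check the darktable manual for the exact syntax on your version:

```ini
# darktablerc (default value, per the darktable manual)
opencl_device_priority=*/!0,*/*/*
```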

I’ve now decided that the GPU market isn’t going to get better any time soon (if it’s not the bitcoin miners, it’s the chip shortages) and forked out for a second-hand 6 GB GTX 1060, and I’m very happy with it (though the price was the same as new prices were a year ago) – diffuse or sharpen is almost interactive.

This set me thinking, and I appreciate your thoughts. It’s so easy to get locked into one mindset, such as “I need better performance from darktable = GPU upgrade”.

Recap of my computer specs.

All my computers run Windows 10.

Up until about the last fortnight, most of my recent darktable edits were done on a laptop with 8 GB DDR3 RAM, an Intel i5-4200u CPU, a 5400 rpm spinning hard drive, and a 1366 x 768 TN-panel display. This has an AMD Radeon mobile HD 8850 discrete GPU.

I had thought about upgrading the GPU in this other desktop, which has 24 GB DDR3 RAM, a Xeon E3-1245v3 CPU, a bunch of spinning drives mostly at 7200 rpm (no SSD), and a TN-panel monitor; it already has an NVIDIA K600 discrete GPU.

I thought through your comments again, and was reminded that so many components contribute to the overall performance of darktable; GPU performance, which I had been fixated upon, is just one of them.

I just happen to have retired the aforementioned laptop a few days ago (pretty much, after 8 years of almost daily use) and moved on to a laptop with 16 GB DDR4 RAM and an Intel i5-7300u. This new laptop has an M.2 SATA SSD but no discrete GPU, and its display is a 1920 x 1080 IPS panel. I had not thought much about using darktable on this laptop, because of my dissatisfaction with image editing performance on the older one.

Because of your comments, I just installed darktable on the “new” laptop, to see whether the more recent CPU/chipset, faster SSD storage, and extra RAM would have any impact.

Long story short: while I have not done any stopwatch-measured benchmarks, the “new” laptop is performing much better than my older computers, even though it has no discrete GPU, and I am, for now, perfectly happy with its speed. Regrettably, or maybe fortunately, I will have to abandon this search for an upgrade to the discrete GPU in my desktop, because I no longer need one; the current laptop is apparently good enough for my edits. Things like the preview are pretty quick, and I rarely have to wait for anything.

For darktable, could a disk storage speed upgrade be just as significant as, or even more significant than, a GPU upgrade, and maybe far less expensive?

My immediate thought is that some of this increased performance comes from the enhanced I/O of an SSD instead of a spinning magnetic disk, especially as darktable also uses disk caching to store intermediate results at various points in its pipeline. So, while upgrading or adding a GPU could be one approach to improving darktable performance, maybe an SSD upgrade is what is actually needed instead.

I wish I had the time and gear to compare the performance enhancement from a GPU upgrade with that from a storage speed upgrade.

Especially since GPUs are really expensive at this time: for apps that work like darktable does, using disk caches for intermediate processing in the pipeline, a better GPU will always augment performance, but unless one does the comparison, it may turn out that an inexpensive upgrade from slower storage to something faster has enough of an impact at a much lower incremental cost. Moreover, at this time, really fast SSDs such as NVMe PCIe drives are not as expensive as higher-end GPUs, and unlike high-end GPUs, which need an ample power supply, SSDs do not require a power supply upgrade.

I did not expect such an improvement, which I mainly attribute to the replacement of a spinning disk with an SSD.

Thanks again for making me think thoroughly about this. The performance enhancement I needed has been achieved, with no discrete GPU involved in this excellent result. A bit of a shock for me! I am really surprised, and pleasantly so.

ADDENDUM: editing improvements using an IPS display instead of a TN panel

If I may add: the new laptop has a much better display, with better brightness, deeper blacks, and especially better contrast and resolution, and I find that I no longer need to push my edits in darktable. It appears that the extreme edits I was making on my older computers, with their inferior TN-panel displays, caused me to over-edit. With the new computer, a few minor tweaks and I’m satisfied with the image. In particular, the old low-contrast displays had forced me to work harder than necessary to pull out contrast, to compensate for their poor display quality.

It’s a huge further lesson learned: the impact that a good display can have on one’s workflow and results.

And to think that, whether on a laptop or a desktop, a decent, bright enough, and reasonably color-accurate display is at this time a relatively affordable item compared to many years ago. GPUs are going up in price, while what we probably all need the most, a good quality screen, which also leads to fewer and better edits, costs a lot less than one of these currently overpriced GPUs.

This secondary benefit of more modern tech was one I had not anticipated. We do not know what we do not know, and cannot experience what we have not experienced, until we do. The impact of a better display on photo editing was quite a surprise to me; I did not expect this much of an improvement to the workflow.

Export Performance

Edit - I also tested the speed of exports, and for a 7th-gen Intel mobile laptop, I am really happy with the results.

I had not thought of that. I do have some integrated graphics, though I’m not sure what it is. I have 32 GB of RAM, so not too bad, but my CPU is an old AMD FX-8350. Not bad in its day as bang for the buck, and it actually runs things not too badly. But I might benefit from trying out your setup.


SSDs make a big difference…

The GPU is by far the most important component for darktable, though. I have a fairly old PC and was considering upgrading my motherboard/CPU, but since I bought my new GPU I no longer feel it’s necessary.

I did not expect such an improvement, which I mainly attribute to the replacement of a spinning disk with an SSD.

Ok that’s a pretty important one too. SSDs are massively faster.

Some numbers from a P620 and an RTX 2070:

Nvidia P620
151,788866 [dev_pixelpipe] took 0,197 secs (0,192 CPU) processed `color calibration' on GPU, blended on GPU [export]
151,810051 [dev_pixelpipe] took 0,021 secs (0,016 CPU) processed `color calibration 1' on GPU, blended on GPU [export]
image colorspace transform RGB->Lab took 0,015 secs (0,007 GPU) [colorchecker]
151,895724 [dev_pixelpipe] took 0,086 secs (0,069 CPU) processed `color look up table' on GPU, blended on GPU [export]
image colorspace transform Lab->RGB took 0,015 secs (0,011 GPU) [colorbalancergb]
151,956271 [dev_pixelpipe] took 0,061 secs (0,040 CPU) processed `color balance rgb' on GPU, blended on GPU [export]
152,081962 [dev_pixelpipe] took 0,126 secs (0,110 CPU) processed `color balance rgb 1' on GPU, blended on GPU [export]
152,668654 [dev_pixelpipe] took 0,587 secs (0,576 CPU) processed `filmic rgb' on GPU with tiling, blended on CPU [export]
152,998778 [dev_pixelpipe] took 0,330 secs (0,497 CPU) processed `rgb curve' on GPU, blended on GPU [export]

153,986291 [dev_process_export] pixel pipeline processing took 14,107 secs (143,729 CPU)

Nvidia RTX 2070
56,219330 [dev_pixelpipe] took 0,031 secs (0,016 CPU) processed `color calibration' on GPU, blended on GPU [export]
56,250335 [dev_pixelpipe] took 0,031 secs (0,016 CPU) processed `color calibration 1' on GPU, blended on GPU [export]
56,318877 [dev_pixelpipe] took 0,069 secs (0,047 CPU) processed `color look up table' on GPU, blended on GPU [export]
56,386857 [dev_pixelpipe] took 0,068 secs (0,031 CPU) processed `color balance rgb' on GPU, blended on GPU [export]
56,484557 [dev_pixelpipe] took 0,098 secs (0,062 CPU) processed `color balance rgb 1' on GPU, blended on GPU [export]
56,523330 [dev_pixelpipe] took 0,039 secs (0,016 CPU) processed `filmic rgb' on GPU, blended on GPU [export]
56,688836 [dev_pixelpipe] took 0,165 secs (0,219 CPU) processed `rgb curve' on GPU, blended on GPU [export]

57,139208 [dev_process_export] pixel pipeline processing took 3,636 secs (2,906 CPU)
