OpenCL, multiple GPUs, memory and tuning

Thank you guys for all your input.
Based on this input I have deduced the following about tuning darktable (some of it is specific to my PC):

  1. Make sure that as much as possible is processed on the NVIDIA GPU. That means turning on OpenCL and preventing the Intel GPU from being used.
  2. There are enough resources to edit images like the demo image effectively (no tiling). The speed of the GPU is the limiting factor.
  3. Exporting is very hard on resources and runs for a very long time if processing on the NVIDIA GPU is not possible (due to lack of memory). Even tiling slows down the export a lot.
  4. The NVIDIA card is seldom used in everyday situations (only when processing video, Photoshop, Norton security scanning, darktable and maybe a few other special cases).

So:
The default OpenCL setting is not ideal in this situation, since it explicitly prevents the NVIDIA GPU from processing the preview.
There are a number of ways to keep the Intel GPU from being used. It can be removed from the system or deactivated in the BIOS, with possible consequences for battery life and other side effects.
Use of the Intel GPU can also be prevented with several different settings in darktable. I'm not sure whether any one method is clearly superior to the others. It seems that darktable, left to itself, selects the NVIDIA card if allowed.

Since the NVIDIA GPU is only used in special situations, it seems safe to try to use the maximum amount of memory. I tried the following settings: OpenCL performance = "memory size" in the presets, and forced headroom = 1 in the darktablerc.

This caused the export time to drop from 79 to 59 sec., with no tiling except for the "diffuse" module. Processing "diffuse" dropped from 64 to 45 sec.
Exporting really appreciates plenty of GPU memory.

It just seems odd that your export takes so long. Before replacing it with the RTX 2060 6GB, my old GTX 1050 2GB card was able to export a 16MB NEF to a 16-bit TIFF in just under 5 sec.

I wonder if there is something else on your system that's bottlenecking darktable's performance.

Begging for trouble :slight_smile:

Any ideas on what and how to investigate?

We are going in circles. That's why I suggested using the "very fast GPU" scheduler :slight_smile:

You will certainly run into problems doing so. The CL code will abort due to stressing memory too much, falling back to the CPU.

Don't know how to help any further…

Is the OS set for power saving? There are ways to ask for high performance… not sure how much it would help, but you can specify that in the power and GPU/display settings to be sure it is being used… a user of ON1 Photo RAW also pointed out recently that this boosted performance dramatically…

It will drain your battery again, but there is no free lunch when you want to go faster…

I posted the screenshots above… select DT and set it to run at high performance…

The power scheme could be edited as well… I think there is one you can enable called Ultimate Performance… this should let things run as fast as possible but will, as mentioned, cut into battery life…
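If the Ultimate Performance plan isn't visible on your machine, it can usually be unlocked from an elevated command prompt; this is the standard Microsoft GUID for that scheme, as far as I know:

powercfg -duplicatescheme e9a42b02-d5df-448d-aa00-03f14749eb61

After that it should show up alongside the other plans in the power options…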

I think there are also settings in the NVIDIA control panel… something called Battery Boost slows the card, and I think there is a setting to prefer high performance as well… not sure how these interact with the OS settings…

@obe, isn't this the latest driver for your card? The link came up when I put your card in the search criteria.

@obe
You could also try:

opencl_scheduling_profile=default
opencl_device_priority=+0/+0/+0/+0/+0

Use the ID of the card you want in place of 0. This will also direct all processing to the specified card. The 'priority' setting is only used if the scheduling profile is set to default.
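If you are not sure which ID your card has, the darktable-cltest binary that ships with darktable prints the detected OpenCL devices along with their numbers (the exact output format varies between versions, so I won't quote it here):

darktable-cltest

The device number it reports for the NVIDIA card is what goes into the priority string.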

If your GPU is processing a module for a long time, and darktable has another pipeline to run, it will first wait (if the priority says using the GPU is mandatory), and then give up and fall back to the CPU anyway. For me, this usually meant lost time, because had darktable waited longer, the GPU would still have finished the task faster than the CPU fallback did. To extend the timeout, you can set a large value here:

opencl_mandatory_timeout=20000

Details here: How cheap a GPU will still speed up darktable? - #62 by kofa.

If in doubt, just follow the guidance from Hanno. He knows what he's talking about. For NVidia, in general: no tuning, and don't try to use unlimited memory.


Thank you for your response.
I think you are right, but I downloaded the newest driver a week ago. At that time the newest version was 546.17. That didn't change performance, so I doubt that upgrading to version 546.29 will change much…


I have disabled the onboard GPU in the BIOS. I was wondering if there would be any benefit, with regard to darktable only, in using this setting in Win 10 > Settings > Graphics settings:

[screenshot: the hardware acceleration toggle in the Windows graphics settings]

Start darktable with -d perf and export an image with your typical processing. Record the export time. Turn that setting off, restart, and repeat. Compare the export times for the same image.
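On Windows that would look something like this (path assumed to be the default install location; adjust as needed, and -d opencl can be added for OpenCL-specific messages):

"C:\Program Files\darktable\bin\darktable.exe" -d perf -d opencl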

@g-man, I just ran the test as you described. The results are within 0.001 sec across the board, whether hardware acceleration is turned on or off. 50% of the results were identical. I can see no reason to turn this feature on.


According to the darktable 4.6 user manual, "very fast GPU is the preferred setting for systems with a GPU that strongly outperforms the CPU".

How does one go about determining this? For example, my system uses an i7-8700 and an RTX 2060.

Same steps with -d perf. Turn off OpenCL, change some settings in modules to force some reprocessing, and then export. Set the profile to default and do the same; set it to very fast GPU and do the same.

Then go to the log and compare the [preview], [full] and [export] timings.
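On Windows the -d output ends up in a log file rather than the console; assuming the usual location under %LOCALAPPDATA% (check your install if the path differs), something like this pulls out the export timings:

findstr /C:"[export]" "%LOCALAPPDATA%\darktable\darktable-log.txt"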

If you browse here and select DT, you can assign high performance to it… It may not have any impact, but if the OS does throttle things down, especially on laptops, this prevents that and gives you the maximum performance possible, provided everything in the NVIDIA config and your software is set optimally… At least this is how I understand it to work… Combined with the power profile set to Ultimate, this ensures the OS doesn't put the brakes on… If that never actually happens with DT, then I guess it won't make much difference… but setting it to the max means you know that it doesn't…

That card should be fast enough that it will always be the better option, so using the "very fast GPU" setting to make it the preferred choice in all cases should be the most performant…

I have a 3060 Ti… I did a lot of optimization runs, but that was about a year ago and the code might have changed a bit… One thing I did try was the micro_nap setting… by default it is 250; I tried it at 0. It didn't introduce any crashes or funny business in my case, and I got the biggest bump of all the OpenCL settings from making that change… You could try that for a boost. It's easy to reverse if you suspect any negative effects…

@priort The only opencl entries I see are:

opencl=TRUE
opencl_checksum=3975488120
opencl_device_priority=*/!0,*/*/*/!0,* (I'm not sure if this is the right setting. I never changed it)
opencl_library=
opencl_mandatory_timeout=400
opencl_scheduling_profile=very fast GPU
opencl_tune_headroom=FALSE

No, there are options you can set in your darktablerc file:

darktable 4.6 user manual - memory & performance tuning

For example, this is mine…

cldevice_v5_nvidiacudanvidiageforcertx3060ti=0 0 0 16 16 1024 1 0 0.000 0.000 0.250
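For what it's worth, my reading of the 4.6 manual is that the fields in that string are, in order: avoid_atomics, micro_nap, pinned_memory, roundup width, roundup height, event handles, async pixelpipe, disable device, benchmark, advantage and unified fraction — so the second number is the micro_nap value mentioned above. Treat this as a sketch and check the manual for your version before editing:

cldevice_v5_nvidiacudanvidiageforcertx3060ti=0 250 0 16 16 1024 1 0 0.000 0.000 0.250
cldevice_v5_nvidiacudanvidiageforcertx3060ti=0 0 0 16 16 1024 1 0 0.000 0.000 0.250

The first line would be the default micro_nap of 250; the second is what I'm running, with micro_nap set to 0.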

@obe and as some background info on why setting memory to unlimited or maximum with no headroom is "begging for trouble":

  1. The docs are not correct at the moment; I think we will have it all correct in about a week or so.
  2. Let's assume you have a 4 GB card and let darktable use all of its memory. That will work in most cases when working in the darkroom, due to the low memory requirements there. Everyone is happy and thinks "wow, cool, got it". Now export that image at high quality. The memory requirements "explode", because we render at full resolution, not downscaled as in the darkroom. Now your device has to handle that data. If the requirements are too high it has to tile, sizing the tiles by the amount of CL memory reserved for darktable; either way, in your case it uses all of the card's memory. But your OS or Firefox are using graphics memory too. Bang: darktable allocates graphics memory it won't get, and the code won't work. So it has to abort the OpenCL code and fall back to the CPU code. Here A) the aborting takes time and B) you use the slow CPU! (Some rough numbers follow below.)
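To put rough numbers on "explode", assuming darktable's usual 4-channel 32-bit float pixel format (16 bytes per pixel):

darkroom preview at ~2 MP:  2,000,000 px x 16 B  ≈  32 MB per buffer
export of a 24 MP raw:     24,000,000 px x 16 B  ≈ 384 MB per buffer

A module needs at least an input and an output buffer plus its own temporaries, so a single full-resolution module can easily claim a gigabyte or more on its own — and on a 4 GB card shared with the OS and a browser, that is exactly where the allocation fails.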

So the lesson would be: never try to use more memory than is safe. Some years ago we were safe with a safety margin of 400MB; this is not true any more on Windows or Linux, as both the OS and applications now use more graphics memory.


That param has sections for each of darktable's pipelines, separated by /. The pipelines, in order, are:

  • image (the central view of the darkroom),
  • preview (I assume on the lighttable, and maybe also the navigation view in the darkroom?),
  • export,
  • thumbnail (on the lighttable, and also on the filmstrip),
  • preview2 (a second preview that can be shown on a 2nd display, for multi-monitor setups).

!0 means 'any device but the one with ID = 0'.
* means 'any device at all'.
+0 would mean 'mandatorily on a GPU'; then you list the device IDs, in the example case only ID = 0. If all listed devices are busy processing something else, darktable waits (blocks) until one becomes free, or until the opencl_mandatory_timeout elapses, then falls back to the CPU. The timeout is measured in units of 5 ms (don't ask me why), so 200 would mean 1 second.
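Putting that together, a made-up example for a machine where the NVIDIA card is device 0: forcing everything onto it except thumbnails could look like this (illustrative only):

opencl_device_priority=+0/+0/+0/*/+0

Here image, preview, export and preview2 must run on device 0 (falling back to the CPU only after the timeout), while thumbnails may run anywhere. On the timeout arithmetic: the default of 400 shown earlier works out to 400 × 5 ms = 2 s, while the suggested 20000 means 100 s.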