OpenCL multiple GPUs, memory and tuning

Begging for trouble :slight_smile:

Any ideas on what and how to investigate?

We are going around in circles. That’s why I suggested using the “very fast GPU” scheduler :slight_smile:

You will certainly run into problems doing so: the OpenCL code will abort due to memory pressure and fall back to the CPU.

Don’t know how to help any further …

Is the OS set for power saving? There are ways to ask for high performance. I’m not sure how much it would help, but you can specify that in the power settings and the GPU/display settings to be sure it is being used. A user of ON1 Photo RAW software recently pointed out that this boosted performance dramatically…

It will drain your battery faster, but there is no free lunch when you want to go faster…

I posted the screenshots above… select DT and set it to run at high performance…

The power scheme could be edited as well… I think there is one you can enable called Ultimate Performance… this should let things run as fast as possible but, as mentioned, will cut into battery life…

I think there are also settings in the NVIDIA Control Panel… something called Battery Boost slows the card, and I think there is a setting to prefer maximum performance as well… not sure how these interact with the OS settings…

@obe, isn’t this the latest driver for your card? The link came up when I put your card in the search criteria

@obe
You could also try:

opencl_scheduling_profile=default
opencl_device_priority=+0/+0/+0/+0/+0

Use the ID of the card to use in place of 0. This will also direct all processing to the specified card. The ‘priority’ setting is only used if the scheduling profile is set to default.
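For example, if your discrete card shows up as device 1 (the ID here is just an assumption; darktable-cltest lists the actual device IDs on your system), the same idea would look like:

opencl_scheduling_profile=default
opencl_device_priority=+1/+1/+1/+1/+1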

If your GPU is processing a module for a long time, and darktable has another pipeline to run, it will first wait (if the priority says using the GPU is mandatory), and then give up and fall back to the CPU anyway. For me, this usually meant lost time, because if darktable waited longer, the GPU could still have finished the task faster. To extend the timeout, you can set a large value here:

opencl_mandatory_timeout=20000

Details here: How cheap a GPU will still speed up darktable? - #62 by kofa.

If in doubt, just follow the guidance from Hanno. He knows what he’s talking about. For NVidia, in general: no tuning; and don’t try to use unlimited memory.


Thank you for your response.
I think you are right, but I downloaded the newest driver one week ago. At that time the newest version was 546.17, and it didn’t change performance. So I doubt that upgrading to version 546.29 will change much…


I have disabled the onboard GPU in the BIOS. I was wondering if there would be any benefit, with regard to darktable only, in using this setting under Win 10 > Settings > Graphics settings?


Start darktable with -d perf and export an image with your typical processing. Record the export time. Turn that setting off, restart, and repeat. Compare the export times for the same image.

@g-man, I just ran the test as you described. The results are within 0.001 seconds across the board, whether hardware acceleration is turned on or off; 50% of the results were identical. I can see no reason to turn this feature on.


According to the darktable 4.6 user manual, “very fast GPU is the preferred setting for systems with a GPU that strongly outperforms the CPU”.

How does one go about determining this? For example my system uses the i7-8700 and RTX 2060.

Same steps with -d perf. Turn off OpenCL, change some settings in modules to force reprocessing, and then export. Set the profile to default and do the same; set it to very fast and do the same.

Then go to the log and compare the [preview], [full] and [export] times.
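As a sketch of how one might pull those timings out of the log afterwards (the exact log line shape below is an assumption about what -d perf prints; adjust the regex to match your own output):

```python
import re

# Assumed -d perf log line shape, e.g.:
#   "[dev_process_export] pixel pipeline processing took 2.081 secs (10.658 CPU)"
# The pipeline names and wording are assumptions; check your own log.
LINE_RE = re.compile(r"\[(?P<pipe>dev_process_\w+)\].*took (?P<secs>[\d.]+) secs")

def pipeline_times(log_text: str) -> dict:
    """Return the last reported wall-clock time per pipeline."""
    times = {}
    for m in LINE_RE.finditer(log_text):
        times[m.group("pipe")] = float(m.group("secs"))
    return times

sample = """\
[dev_process_preview] pixel pipeline processing took 0.312 secs (1.021 CPU)
[dev_process_image] pixel pipeline processing took 0.845 secs (3.410 CPU)
[dev_process_export] pixel pipeline processing took 2.081 secs (10.658 CPU)
"""
print(pipeline_times(sample))
```

Run it once per configuration (OpenCL off, default, very fast) and compare the dictionaries side by side.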

If you browse here and select darktable, you can assign high performance to it… It may not have any impact, but if the OS does throttle things down (especially on laptops), this prevents that and gives the maximum performance possible, assuming all other settings in the NVIDIA config and in your software are optimal… At least this is how I understand it to work… Combined with the power profile set to Ultimate, this ensures the OS doesn’t put the brakes on… If that never actually happens with darktable, I guess it won’t make much difference… but setting it to the max means you know it doesn’t…

That card should be fast enough that it will always be the better option, so using the “very fast GPU” setting to make it the preferred choice in all cases should be the most performant…

I have a 3060 Ti… I did a lot of optimization runs, but that was about a year ago and the code might have changed a bit… One thing I tried was the micronap setting… by default it is 250; I tried it at 0. It didn’t introduce any crashes or funny business in my case, and I got the biggest bump of all the OpenCL settings from that change… You could try that for a boost. It’s easy to reverse if you suspect any negative effects…

@priort The only opencl entries I see are

opencl=TRUE
opencl_checksum=3975488120
opencl_device_priority=/!0,///!0,* (I’m not sure if this is the right setting. I never changed it)
opencl_library=
opencl_mandatory_timeout=400
opencl_scheduling_profile=very fast GPU
opencl_tune_headroom=FALSE

No, there are options you can set in your darktablerc file:

darktable 4.6 user manual - memory & performance tuning

For example this is mine…

cldevice_v5_nvidiacudanvidiageforcertx3060ti=0 0 0 16 16 1024 1 0 0.000 0.000 0.250
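For reference, my reading of the fields in that per-device line, based on the tuning options listed in the manual (the field order is an assumption on my part, so verify it against the docs for your darktable version):

cldevice_v5_nvidiacudanvidiageforcertx3060ti=0 0 0 16 16 1024 1 0 0.000 0.000 0.250
(avoid atomics | micro nap | pinned memory | roundup wh | roundup ht | event handles | async | disabled | three float tuning values)

With micro nap as the second field, the 0 above would reflect the change from the default 250.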

@obe, and as some background info on why setting memory to unlimited or maximum with no headroom is “begging for trouble”:

  1. The docs are not correct at the moment; I think we will have it all corrected in about a week or so.
  2. Let’s assume you have a 4 GB card and let darktable use all of its memory. That will work in most cases in the darkroom, due to the low memory requirements there. Everyone is happy and thinks “wow, cool, got it”. Now export that image at high quality. The memory requirements “explode”, as we render at full resolution and not downscaled as in the darkroom. Now your device has to handle that data. As the requirements are high, it has to tile, and it sizes the tiles using the amount of OpenCL memory reserved for darktable; in your case that is all of it. But your OS or Firefox are using graphics memory too. Bang: darktable tries to allocate graphics memory it won’t get, and the code fails. So it has to abort the OpenCL code and fall back to the CPU code. Here (a) the aborting takes time and (b) you end up on the slow CPU!
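To put rough numbers on that “explosion” (the image dimensions and single-buffer view here are illustrative assumptions, not darktable’s exact internals):

```python
# Rough, illustrative arithmetic: one RGBA float32 buffer for a 24 MP image
# versus a downscaled darkroom-sized buffer. Dimensions are assumptions.
def buffer_bytes(width: int, height: int, channels: int = 4, bytes_per_channel: int = 4) -> int:
    return width * height * channels * bytes_per_channel

full = buffer_bytes(6000, 4000)     # full-resolution export buffer
preview = buffer_bytes(1920, 1280)  # downscaled darkroom-sized buffer

# A pipeline holds several such buffers at once, so multiply accordingly.
print(f"export buffer:  {full / 2**20:.0f} MiB")
print(f"preview buffer: {preview / 2**20:.0f} MiB")
```

Even a single full-resolution buffer is roughly ten times the darkroom-sized one, and a pipeline needs several of them, which is why a 4 GB card that is fine in the darkroom can still run out of memory on export.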

So the lesson would be: never try to use more memory than is safe. Some years ago a safety margin of 400 MB was enough; this is no longer true on Windows or Linux, as both the OS and other applications use more graphics memory.


That param has sections for each of darktable’s pipelines, separated by /. The pipelines, in order, are:

  • image (the central view of the darkroom),
  • preview (I assume on the lighttable, and maybe also the navigation view in the darkroom?)
  • export
  • thumbnail (on lighttable, and also on the filmstrip)
  • preview2 (a second preview can be shown on a 2nd display, for multi-monitor setups)

!0 means ‘any device but the one with ID = 0’
* means ‘any device at all’
+0 would mean ‘mandatorily on a GPU’; then you list the device IDs, in the example case only ID = 0. If all listed devices are busy processing something else, wait (block) until one becomes free, or until the opencl_mandatory_timeout elapses, then fall back to CPU. The timeout is measured in units of 5 ms (don’t ask me why), so 200 would mean 1 second.
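A quick sanity check on those units (plain arithmetic, nothing darktable-specific):

```python
# opencl_mandatory_timeout is counted in 5 ms ticks.
TICK_MS = 5

def timeout_seconds(ticks: int) -> float:
    return ticks * TICK_MS / 1000

print(timeout_seconds(400))    # the default value shown above -> 2.0 s
print(timeout_seconds(20000))  # the large value suggested above -> 100.0 s
```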

While having all these tweakable parameters is great, I think they are not something we should mess with. The default settings are a balance between being safe and optimal for most systems. The mandatory timeout is the only one you should increase, if you use a lot of iterations. When we only had the GL HLR, it was beneficial to let it go beyond 2 seconds (mainly during export).


I guess we all want as much performance as possible when using darktable. The subject is complex, and there seem to be differences of opinion as to the optimal settings. I think one should know exactly what they are doing when attempting to tune a graphics card for darktable. I, for one, do not.

I upgraded my graphics card from the GTX 1050 2 GB to the RTX 2060 6 GB when I noticed tiling and slow performance in general. The change was huge, and I now find working in darktable to be an enjoyable experience.

I noted the darktablerc entries above in yesterday’s post. If there is a specific setting that should be changed, then I would do so, but only upon the recommendation of someone in the know like @kofa, @priort or @hannoschwalm. Otherwise, I think it’s best that I leave the settings as they are rather than mess things up.