Thank you guys for all your input.
Based on this input I have deduced the following about tuning darktable (some of it is specific to my PC):
Make sure that as much as possible is processed on the NVIDIA GPU. That means turn on OpenCL and prevent the Intel GPU from being used
There are enough resources to edit images like the demo image effectively (no tiling). The speed of the GPU is the limiting factor.
Exporting is very hard on resources and runs for a very long time if processing on the NVIDIA GPU is not possible (due to lack of memory). Even tiling slows down the export a lot.
The NVIDIA card is seldom used in everyday situations (only when processing video, Photoshop, Norton security scans, darktable and maybe a few other special cases).
So:
The default OpenCL setting is not ideal in this situation since it explicitly prevents the NVIDIA GPU from processing the preview.
There are a number of ways to prevent the Intel GPU from being used. The Intel GPU can be removed from the system or deactivated, with possible consequences for battery life and other problems.
Use of the Intel GPU can also be prevented with several different settings in darktable. I'm not sure whether any one method is clearly superior to others. It seems that darktable all by itself selects the NVIDIA card if allowed.
Since the NVIDIA GPU is only used in special situations, it seems safe to try to use max. memory. I tried the following settings: OpenCL performance = "memory size" in the presets and forced headroom = 1 in the darktablerc.
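In darktablerc terms that corresponds roughly to the following lines (a sketch only; the exact key names vary a bit between darktable versions, and in recent releases the forced-headroom value sits inside the per-device cldevice_* entry for the NVIDIA card):

opencl=TRUE
opencl_scheduling_profile=very fast GPU

plus the "memory size" tuning preference and forced headroom = 1 in that per-device entry.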
This caused the export time to drop from 79 to 59 sec, with no tiling except for the "diffuse" module. Processing "diffuse" dropped from 64 to 45 sec.
Exporting really benefits from plenty of GPU memory.
It just seems odd that your export takes so long. Before replacing it with the RTX 2060 6GB, my old GTX 1050 2GB card was able to export a 16MB NEF to 16-bit TIFF in just under 5 sec.
I wonder if there is something else on your system that's bottlenecking performance in darktable.
Is the OS set for power saving? There are ways to ask for high performance… not sure how much it would help, but you can specify that in the power and GPU/display settings to be sure it is being used… a user of ON1 Photo RAW recently pointed out that this boosted performance dramatically…
It will drain your battery again, but there is no free lunch when you want to go faster…
I posted the screenshots above… select DT and set it to run at high performance…
The power scheme could be edited as well… I think there is one you can enable called Ultimate Performance… this should let things run as fast as possible but will, as mentioned, cut battery life…
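If you want to try that, the hidden Ultimate Performance plan can be unlocked with a stock Windows command from an elevated prompt (nothing darktable-specific; it just makes the plan selectable under the normal power options, GUID as I remember it):

powercfg -duplicatescheme e9a42b02-d5df-448d-aa00-03f14749eb61

After that it shows up next to Balanced/High performance in the power plan list…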
I think there are also settings in the NVIDIA control panel… something called Battery Boost slows the card, and I think it has a setting to prefer high performance as well… not sure how these interact with the OS settings…
Use the ID of the card you want in place of 0. This will also direct all processing to the specified card. The "priority" setting is only used if the scheduling profile is set to default.
If your GPU is processing a module for a long time, and darktable has another pipeline to run, it will first wait (if the priority says using the GPU is mandatory), and then give up and fall back to the CPU anyway. For me, this usually meant lost time, because if darktable waited longer, the GPU could still have finished the task faster. To extend the timeout, you can set a large value here:
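In darktablerc that is the opencl_mandatory_timeout key; for example (2000 is just an illustrative value, not a recommendation; each unit is 5 ms, so this would mean waiting up to 10 seconds before the CPU fallback):

opencl_mandatory_timeout=2000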
If in doubt, just follow the guidance from Hanno. He knows what he's talking about. For NVidia, in general: no tuning; and don't try to use unlimited memory.
Thank you for your response.
I think you are right, but I downloaded the newest driver one week ago. At that time the newest version was 546.17. That didn't change performance, so I doubt that upgrading to version 546.29 will change much…
I have disabled the onboard GPU in the BIOS. I was wondering if there would be any benefit, with regard to darktable only, in using this setting in Win 10 > Settings > Graphics settings?
Start darktable with -d perf and export an image with your typical processing. Record the time to export. Turn that off, restart and repeat. Compare the export time for the same image.
@g-man, I just ran the test as you described. The results are within 0.001 sec across the board whether hardware acceleration is turned on or off; 50% of the results were identical. I can see no reason to turn this feature on.
Same steps with -d perf. Turn off OpenCL, change some settings in modules to force some reprocessing and then export. Set the scheduling profile to default, do the same, then set it to very fast GPU and do the same.
Then go to the log and compare the [preview], [full] and [export] timings.
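A minimal sketch of that comparison on Windows, assuming the default install path (on Windows the -d output goes to darktable's log file rather than the console, if I remember correctly):

"C:\Program Files\darktable\bin\darktable.exe" -d perf

Then, after each run (OpenCL off, profile set to default, profile set to very fast GPU), search the log for the [preview], [full] and [export] lines and compare the totals.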
If you browse here and select DT, you can assign high performance to it… It may not have any impact, but if the OS does throttle things down (especially on laptops) this would prevent that and give the maximum performance possible, provided all the other settings in the NVIDIA config and in your software are optimal… At least this is how I understand it to work… This, combined with the power profile set to Ultimate, will ensure the OS doesn't put the brakes on… if that never actually happens with DT then I guess it won't make much difference… but setting it to the max means you know that it doesn't…
That card should be fast enough that it will always be the better option, so using the very fast GPU setting to make it the preferred choice in all cases should be the most performant…
I have a 3060 Ti… I did a lot of optimization runs, but that was about a year ago and the code might have changed a bit… One thing I did try was the micro-nap setting… by default it is 250; I tried it at 0. It didn't introduce any crashes or funny business in my case, and I got the biggest bump of all the OpenCL settings from making that change… You could try that for a boost. It's easy to reverse if you suspect any negative effects…
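For anyone wanting to try the same micro-nap experiment: in older releases this was a standalone darktablerc key (shown below); in current releases the value has moved into the per-device cldevice_* entry, so check which form your version uses before editing. The old-style key looked like:

opencl_micro_nap=0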
opencl=TRUE
opencl_checksum=3975488120
opencl_device_priority=*/!0,*/*/*/!0,* (I'm not sure if this is the right setting. I never changed it)
opencl_library=
opencl_mandatory_timeout=400
opencl_scheduling_profile=very fast GPU
opencl_tune_headroom=FALSE
@obe, as some background info on why setting memory to unlimited or maximum with no headroom is "begging for trouble":
The docs are not correct atm; I think we will have it all correct in about a week or so.
Let's assume you have a 4GB card and let darktable use all of its memory. That will work in most cases when working in the darkroom, due to the low memory requirements. Everyone is happy and thinks "wow, cool, got it". Now export that image with high quality. The memory requirements "explode", as we render at full resolution and not downscaled as in the darkroom. Now your device has to handle that data. Because the requirements are high it has to tile, sizing the tiles by the amount of CL memory reserved for darktable, or no tiling is required; but in your case it certainly uses all of it. Your OS or Firefox are using graphics memory too. Bang: darktable tries to allocate graphics memory it won't get and the code won't work. So it has to abort the OpenCL code and fall back to the CPU code. Here (a) the aborting takes time and (b) you are now using the slow CPU!
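To put rough numbers on that "explode" (my own back-of-the-envelope, not from the docs): darktable works on 4-channel 32-bit float data, so a single full-resolution buffer for a 24MP image is already about 6000 × 4000 pixels × 16 bytes ≈ 384MB, and a module needs at least an input and an output buffer plus scratch space at the same time, while the darkroom preview is computed on a heavily downscaled image that needs only a small fraction of that.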
So the lesson would be: never try to use more memory than is safe. Some years ago a safety margin of 400MB was enough; that is not true any more on Windows or Linux, as both the OS and applications use more graphics memory.
That param has sections for each of darktable's pipelines, separated by /. The pipelines, in order, are:
image (the central view of the darkroom),
preview (I assume on the lighttable, and maybe also the navigation view in the darkroom?)
export
thumbnail (on lighttable, and also on the filmstrip)
preview2 (a second preview can be shown on a 2nd display, for multi-monitor setups)
!0 means "any device but the one with ID = 0", * means "any device at all", and +0 would mean "mandatorily on a GPU"; then you list the device IDs, in the example case only ID = 0. If all listed devices are busy processing something else, darktable waits (blocks) until one becomes free, or until the opencl_mandatory_timeout elapses, then falls back to the CPU. The timeout is measured in units of 5 ms (don't ask me why), so 200 would mean 1 second.
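So, as a concrete sketch (assuming the NVIDIA card really is device ID 0 on your machine, which the -d opencl startup output should confirm), forcing every pipeline onto that card would look like:

opencl_device_priority=+0/+0/+0/+0/+0

i.e. "mandatorily device 0" for image, preview, export, thumbnail and preview2, with the CPU fallback only kicking in once opencl_mandatory_timeout runs out.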