For testing purposes, and knowing that diffuse or sharpen is computationally heavy, I set some rather extreme values to make sure processing would take a while. Hardware: NVIDIA GTX 1060 + Ryzen 5 5600X.
I wanted to force darktable to use the GPU whenever possible. Lines from the log at start-up (some lines omitted for brevity):
0.058074 [opencl_init] opencl_scheduling_profile: 'default'
0.058084 [opencl_init] opencl_device_priority: '+0/+0/+0/+0/+0'
0.086857 [opencl_init] found 1 device
[opencl_init] device 0: NVIDIA GeForce GTX 1060 6GB
... lots of omitted lines...
0.193971 [opencl_priorities] these are your device priorities:
0.193972 [opencl_priorities] image preview export thumbs preview2
0.193975 [opencl_priorities] 0 0 0 0 0
0.193977 [opencl_priorities] show if opencl use is mandatory for a given pixelpipe:
0.193978 [opencl_priorities] image preview export thumbs preview2
0.193980 [opencl_priorities] 1 1 1 1 1
0.193986 [opencl_synchronization_timeout] synchronization timeout set to 200
OpenCL config from darktablerc:
opencl=TRUE
opencl_async_pixelpipe=true
opencl_avoid_atomics=false
opencl_checksum=3316763402
opencl_device_priority=+0/+0/+0/+0/+0
opencl_disable_drivers_blacklist=false
opencl_library=
opencl_mandatory_timeout=200
opencl_memory_headroom=800
opencl_memory_requirement=768
opencl_micro_nap=0
opencl_number_event_handles=1000
opencl_scheduling_profile=default
opencl_size_roundup=16
opencl_synch_cache=active module
opencl_use_cpu_devices=false
opencl_use_pinned_memory=false
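To double-check how the priority string above should be read, here is a small decoding sketch. It is a hypothetical helper, not darktable code. It assumes five "/"-separated fields in the same order darktable prints in the start-up log (image, preview, export, thumbs, preview2), each holding a comma-separated list of numeric device ids, with a leading "+" marking GPU use as mandatory for that pipe. Wildcards and exclusions are deliberately not handled.

```python
# Hypothetical helper (not part of darktable): decode opencl_device_priority.
# Assumption: field order matches the start-up log header
# "image preview export thumbs preview2"; a leading "+" = mandatory.
# Only plain numeric device ids are supported in this sketch.

PIPES = ("image", "preview", "export", "thumbs", "preview2")


def parse_rc(text: str) -> dict[str, str]:
    """Parse darktablerc-style key=value lines; lines without '=' are skipped."""
    conf = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def decode_priority(value: str) -> dict[str, tuple[list[int], bool]]:
    """Map each pipe name to (allowed device ids, mandatory flag)."""
    fields = value.split("/")
    if len(fields) != len(PIPES):
        raise ValueError(f"expected {len(PIPES)} fields, got {len(fields)}")
    result = {}
    for pipe, field in zip(PIPES, fields):
        mandatory = field.startswith("+")
        ids = [int(tok) for tok in field.lstrip("+").split(",") if tok]
        result[pipe] = (ids, mandatory)
    return result


rc = parse_rc("opencl=TRUE\nopencl_device_priority=+0/+0/+0/+0/+0")
prio = decode_priority(rc["opencl_device_priority"])
print(prio["image"])  # ([0], True)
```

Read this way, "+0/+0/+0/+0/+0" means every pipe, including the full "image" pipe, may only use device 0 and must not fall back to the CPU, which matches the "1 1 1 1 1" mandatory row in the start-up log.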
Despite these settings, which make GPU processing mandatory for every pipe, the full pipeline runs on the CPU (device -1); the GPU is only used for the preview:
248.904512 [dev] took 0.000 secs (0.000 CPU) to load the image.
248.926537 [pixelpipe_process] [preview] using device 0
249.961895 [pixelpipe_process] [full] using device -1
250.138825 [histogram] took 0.003 secs (0.047 CPU) scope draw
257.417738 [dev_pixelpipe] took 8.491 secs (61.711 CPU) processed `diffuse or sharpen' on GPU, blended on GPU [preview]
257.419696 [dev_pixelpipe] took 0.002 secs (0.002 CPU) processed `color balance rgb' on GPU, blended on GPU [preview]
257.489168 [dev_pixelpipe] took 0.069 secs (0.587 CPU) processed `filmic rgb' on GPU, blended on GPU [preview]
image colorspace transform RGB-->Lab took 0.002 secs (0.027 CPU) [colorout ]
257.523707 [dev_pixelpipe] took 0.035 secs (0.364 CPU) processed `output color profile' on CPU, blended on CPU [preview]
257.526662 [dev_pixelpipe] took 0.003 secs (0.026 CPU) processed `display encoding' on CPU, blended on CPU [preview]
image colorspace transform RGB-->RGB took 0.050 secs (0.542 lcms2) [final histogram]
257.592694 [histogram] took 0.066 secs (0.669 CPU) final rgb parade
257.592825 [opencl_profiling] profiling device 0 ('NVIDIA GeForce GTX 1060 6GB'):
257.592831 [opencl_profiling] spent 0.0027 seconds in [Write Image (from host to device)]
257.592836 [opencl_profiling] spent 3.4426 seconds in diffuse_blur_bspline
257.592839 [opencl_profiling] spent 4.9918 seconds in diffuse_pde
257.592842 [opencl_profiling] spent 0.0006 seconds in blendop_mask_rgb_jzczhz
257.592846 [opencl_profiling] spent 0.0003 seconds in [Copy Image (on device)]
257.592848 [opencl_profiling] spent 0.0004 seconds in blendop_rgb_jzczhz
257.592850 [opencl_profiling] spent 0.0092 seconds in [Read Image (from device to host)]
257.592852 [opencl_profiling] spent 0.0004 seconds in colorbalancergb
257.592854 [opencl_profiling] spent 0.0002 seconds in filmic_mask_clipped_pixels
257.592856 [opencl_profiling] spent 0.0003 seconds in filmic_inpaint_noise
257.592858 [opencl_profiling] spent 0.0009 seconds in init_reconstruct
257.592860 [opencl_profiling] spent 0.0114 seconds in blur_2D_Bspline_horizontal
257.592861 [opencl_profiling] spent 0.0165 seconds in blur_2D_Bspline_vertical
257.592863 [opencl_profiling] spent 0.0112 seconds in wavelets_detail_level
257.592865 [opencl_profiling] spent 0.0146 seconds in wavelets_reconstruct
257.592867 [opencl_profiling] spent 0.0006 seconds in compute_ratios
257.592869 [opencl_profiling] spent 0.0008 seconds in restore_ratios
257.592871 [opencl_profiling] spent 0.0003 seconds in filmicrgb_chroma
257.592873 [opencl_profiling] spent 8.5049 seconds totally in command queue (with 0 events missing)
257.593761 [dev_process_preview] pixel pipeline processing took 8.689 secs (63.394 CPU)
257.613740 [histogram] took 0.003 secs (0.043 CPU) scope draw
280.329631 [dev_pixelpipe] took 30.368 secs (311.416 CPU) processed `diffuse or sharpen' on CPU, blended on CPU [full]
280.376793 [dev_pixelpipe] took 0.047 secs (0.530 CPU) processed `color balance rgb' on CPU, blended on CPU [full]
280.555335 [dev_pixelpipe] took 0.179 secs (1.640 CPU) processed `filmic rgb' on CPU, blended on CPU [full]
image colorspace transform RGB-->Lab took 0.001 secs (0.009 CPU) [colorout ]
280.580039 [dev_pixelpipe] took 0.025 secs (0.202 CPU) processed `output color profile' on CPU, blended on CPU [full]
280.582331 [dev_pixelpipe] took 0.002 secs (0.017 CPU) processed `display encoding' on CPU, blended on CPU [full]
280.582731 [dev_process_image] pixel pipeline processing took 31.633 secs (314.555 CPU)
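To make the device split easier to see in long `-d opencl -d perf` style logs, a small log summarizer like the one below could help. It is a sketch that only relies on the two line formats visible in the log above ("[pixelpipe_process] [pipe] using device N" and "[dev_pixelpipe] took ... processed `module' on GPU|CPU ... [pipe]"). Treating device -1 as the CPU is an inference from this log, where the full pipe reports device -1 and all its modules say "on CPU".

```python
import re
from collections import defaultdict

# e.g. "[pixelpipe_process] [full] using device -1"
DEVICE_RE = re.compile(r"\[pixelpipe_process\] \[(\w+)\] using device (-?\d+)")
# e.g. "took 0.069 secs (0.587 CPU) processed `filmic rgb' on GPU, blended on GPU [preview]"
MODULE_RE = re.compile(
    r"took ([\d.]+) secs .*processed `([^']+)' on (GPU|CPU), .*\[(\w+)\]"
)


def summarize(log: str) -> tuple[dict[str, int], dict[str, list[tuple[str, str, float]]]]:
    """Return (device per pipe, [(module, GPU/CPU, seconds), ...] per pipe)."""
    devices: dict[str, int] = {}
    modules: dict[str, list[tuple[str, str, float]]] = defaultdict(list)
    for line in log.splitlines():
        m = DEVICE_RE.search(line)
        if m:
            devices[m.group(1)] = int(m.group(2))
            continue
        m = MODULE_RE.search(line)
        if m:
            secs, module, unit, pipe = m.groups()
            modules[pipe].append((module, unit, float(secs)))
    return devices, dict(modules)


sample = """\
248.926537 [pixelpipe_process] [preview] using device 0
249.961895 [pixelpipe_process] [full] using device -1
257.417738 [dev_pixelpipe] took 8.491 secs (61.711 CPU) processed `diffuse or sharpen' on GPU, blended on GPU [preview]
280.329631 [dev_pixelpipe] took 30.368 secs (311.416 CPU) processed `diffuse or sharpen' on CPU, blended on CPU [full]
"""
devices, modules = summarize(sample)
print(devices)  # {'preview': 0, 'full': -1}
```

Run over the complete log above, this should report the preview pipe on device 0 and the full pipe on device -1, with every full-pipe module processed on the CPU.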
Do I misunderstand the configuration?
The documentation (darktable 3.8 user manual, "multiple devices" section) says:
If a pixelpipe process is about to be started and all GPUs in the corresponding group are busy, darktable automatically processes the image on the CPU by default. You can enforce GPU processing by prefixing the list of allowed GPUs with a plus sign "+". In this case darktable will not use the CPU but rather suspend processing until the next permitted OpenCL device is available.
Therefore, I’d expect all processing to be done on the GPU.
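The rule quoted from the manual can be modeled as a tiny decision function. This is only my reading of the manual's wording, not darktable's actual scheduler; the function name `pick_device` is hypothetical, and returning -1 for the CPU simply mirrors the "using device -1" line in the log.

```python
from typing import Optional


def pick_device(allowed: list[int], busy: set[int], mandatory: bool) -> Optional[int]:
    """Model of the documented rule (hypothetical, not darktable code).

    Returns the first free allowed GPU id, -1 for CPU fallback, or None,
    meaning "suspend and wait for a permitted OpenCL device".
    """
    for dev in allowed:
        if dev not in busy:
            return dev
    # All allowed GPUs are busy: a "+" (mandatory) list waits instead of using the CPU.
    return None if mandatory else -1


print(pick_device([0], busy={0}, mandatory=True))   # None
print(pick_device([0], busy={0}, mandatory=False))  # -1
```

Under this model, with "+0" for the full pipe, the outcome is either device 0 or waiting; -1 should never be returned.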
darktable 3.9.0+69~gb72c7d7ba, Release build, Linux.