darktable 4 + Windows + OpenCL

My darktable instance (4.0.1, latest from git) recognizes my GPU but doesn’t seem to use it. The GPU was used earlier; I don’t recall which 3.9 build was the last one where it was used, but the 3.9 build from late May was visibly slower. I was on holiday for all of June and didn’t touch darktable during that time. When I started processing the photos from the trip, I noticed the lag and began investigating. My first step was to update to the latest version and run dt from the command line with debugging options.

Output from -d opencl is:

[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '0,*/!0,*/0,*/0,*'
[opencl_init] opencl_mandatory_timeout: 2000
[opencl_init] opencl_synch_cache: false
[opencl_init] opencl library 'OpenCL.dll' found on your system and loaded
[opencl_init] found 2 platforms
[opencl_init] found 2 devices

[dt_opencl_device_init]
   DEVICE:                   0: 'NVIDIA T500'
   CANONICAL NAME:           nvidiat500
   PLATFORM NAME & VENDOR:   NVIDIA CUDA, NVIDIA Corporation
   DRIVER VERSION:           516.59
   DEVICE VERSION:           OpenCL 3.0 CUDA, SM_20 SUPPORT
   DEVICE_TYPE:              GPU
   GLOBAL MEM SIZE:          4096 MB
   MAX MEM ALLOC:            1024 MB
   MAX IMAGE SIZE:           32768 x 32768
   MAX WORK GROUP SIZE:      1024
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 64 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   YES
   MEMORY TUNING:            YES
   FORCED HEADROOM:          400
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH:            16
   ROUNDUP HEIGHT:           16
   CHECK EVENT HANDLES:      128
   PERFORMANCE:              0.359245 (CPU 0.108690)
   DEFAULT DEVICE:           NO
   KERNEL DIRECTORY:         C:\Program Files\darktable-dev\share\darktable\kernels
   CL COMPILER OPTION:       -cl-fast-relaxed-math
   KERNEL LOADING TIME:       0.0317 sec

[dt_opencl_device_init]
   DEVICE:                   1: 'Intel(R) Iris(R) Xe Graphics'
   PLATFORM NAME & VENDOR:   Intel(R) OpenCL HD Graphics, Intel(R) Corporation
   DRIVER VERSION:           30.0.101.1404
   DEVICE VERSION:           OpenCL 3.0 NEO 
   DEVICE_TYPE:              GPU
   GLOBAL MEM SIZE:          12988 MB
   MAX MEM ALLOC:            4096 MB
   MAX IMAGE SIZE:           16384 x 16384
   MAX WORK GROUP SIZE:      256
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 256 256 256 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   YES
   MEMORY TUNING:            YES
   FORCED HEADROOM:          400
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH:            16
   ROUNDUP HEIGHT:           16
   CHECK EVENT HANDLES:      128
   PERFORMANCE:              0.000000 (CPU 0.108690)
   DEFAULT DEVICE:           NO
   *** marked as disabled ***
[opencl_init] OpenCL successfully initialized.
[opencl_init] here are the internal numbers and names of OpenCL devices available to darktable:
[opencl_init]		0	'NVIDIA T500'
[opencl_init] FINALLY: opencl is AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is ON.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	-1	0	0	-1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	0	0	0	0
[opencl_synchronization_timeout] synchronization timeout set to 200
 [opencl_summary_statistics] device 'NVIDIA T500' (0): NOT utilized

Two things to mention right away:

  1. It would be interesting to know why your Intel device is “marked as disabled”. Could you post the part that was “deleted as not relevant”?
  2. Did you develop an image (open it in darkroom) or do an export before quitting dt? Otherwise nothing would have been utilized :)

  1. I edited my first post and added the deleted part.
  2. Sorry, I just opened and closed darktable with nothing in the darkroom, and got confused by the text “NOT utilized”.
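
Since the “NOT utilized” wording caused the confusion here, a small script can classify the shutdown summary line. This is only a sketch: the regular expressions are modelled on the two summary lines quoted in this thread, the function name `summarize` is made up for illustration, and the “idle” reading of “NOT utilized” follows the explanation given in the reply above rather than any official documentation.

```python
import re

# Pattern modelled on the summary lines quoted in this thread (an assumption,
# other darktable versions may word the line differently).
SUMMARY_RE = re.compile(
    r"\[opencl_summary_statistics\] device '(?P<name>[^']+)' \((?P<id>\d+)\): (?P<rest>.*)"
)
EVENTS_RE = re.compile(
    r"(\d+) out of (\d+) events were successful and (\d+) events lost"
)


def summarize(line):
    """Return (device_name, status) for a summary line, or None for any other line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    name, rest = m.group("name"), m.group("rest")
    if rest.startswith("NOT utilized"):
        # Per the reply above: nothing was developed or exported in this session.
        return name, "idle: no work was sent to this device"
    ev = EVENTS_RE.search(rest)
    if ev:
        ok, total, lost = map(int, ev.groups())
        return name, f"used: {ok}/{total} events ok, {lost} lost"
    return name, "unrecognized summary format"


if __name__ == "__main__":
    print(summarize(" [opencl_summary_statistics] device 'NVIDIA T500' (0): NOT utilized"))
```

Feeding it the last line of a session in which an image was actually edited should give a “used” status instead of “idle”.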

When I open a folder and edit one image in darkroom, I get the following (after the sync timeout line):

66.644060 [pixelpipe_process] [thumbnail] using device 0
66.644119 [dt_opencl_check_device_available] use 3695MB (tunemem=ON, pinning=ON) on device `NVIDIA T500' id=0
66.653860 [pixelpipe_process] [thumbnail] using device -1
66.844597 [pixelpipe_process] [thumbnail] using device -1
67.473839 [pixelpipe_process] [thumbnail] using device 0
67.520676 [pixelpipe_process] [thumbnail] using device -1
67.719701 [pixelpipe_process] [thumbnail] using device 0
68.238006 [pixelpipe_process] [thumbnail] using device 0
68.470187 [pixelpipe_process] [thumbnail] using device 0
68.923579 [pixelpipe_process] [thumbnail] using device 0
68.974820 [pixelpipe_process] [thumbnail] using device -1
69.235065 [pixelpipe_process] [thumbnail] using device 0
69.940753 [pixelpipe_process] [thumbnail] using device 0
70.080035 [pixelpipe_process] [thumbnail] using device -1
70.913201 [pixelpipe_process] [thumbnail] using device 0
91.912677 [pixelpipe_process] [thumbnail] using device 0
94.723075 [pixelpipe_process] [full] using device 0
95.828365 [pixelpipe_process] [preview] using device -1
98.954519 [pixelpipe_process] [preview] using device -1
99.160203 [pixelpipe_process] [full] using device 0
101.960828 [pixelpipe_process] [preview/fast] using device -1
102.101874 [pixelpipe_process] [full] using device 0
103.627233 103.627247 [pixelpipe_process] [preview/fast] using device -1
[pixelpipe_process] [full] using device 0
105.569885 [pixelpipe_process] [preview/fast] using device -1
105.711955 [pixelpipe_process] [full] using device 0
107.938394 [pixelpipe_process] [preview] using device -1
108.263068 [pixelpipe_process] [full] using device 0
108.544335 [pixelpipe_process] [preview] using device -1
117.882804 [pixelpipe_process] [full] using device 0
118.173600 [pixelpipe_process] [preview] using device -1
124.470295 [pixelpipe_process] [thumbnail] using device 0
 [opencl_summary_statistics] device 'NVIDIA T500' (0): 2461 out of 2461 events were successful and 0 events lost. max event=489

I think my memory of the performance is just playing tricks on me, and the GPU is being used as set in my preferences.
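
To check this without reading every line, the `[pixelpipe_process]` entries can be tallied per pipe type. The following is a rough sketch, not a darktable tool: the regular expression is modelled on the log lines in this post, `tally_devices` is a hypothetical helper name, and the sample is a short excerpt of the log above. I am assuming that device `-1` means the pipe ran without an OpenCL device (on the CPU).

```python
import re
from collections import Counter, defaultdict

# Pattern modelled on the "-d opencl" lines quoted in this post (an assumption).
PIPE_RE = re.compile(r"\[pixelpipe_process\] \[([^\]]+)\] using device (-?\d+)")


def tally_devices(log_text):
    """Count, per pipe type, how often each device id was reported."""
    counts = defaultdict(Counter)
    # findall scans the whole text, so entries interleaved by threads still match.
    for pipe, dev in PIPE_RE.findall(log_text):
        counts[pipe][int(dev)] += 1
    return counts


# Short excerpt from the log above.
sample = """\
94.723075 [pixelpipe_process] [full] using device 0
95.828365 [pixelpipe_process] [preview] using device -1
98.954519 [pixelpipe_process] [preview] using device -1
99.160203 [pixelpipe_process] [full] using device 0
101.960828 [pixelpipe_process] [preview/fast] using device -1
124.470295 [pixelpipe_process] [thumbnail] using device 0
"""

if __name__ == "__main__":
    for pipe, devs in sorted(tally_devices(sample).items()):
        print(pipe, dict(devs))
```

In the full log above, every `[full]` run reports device 0 and every `[preview]` run reports device -1, which matches the `image` and `preview` columns of the priority table (0 and -1).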