OK, so this new build, +2363, is noticeably slower than +1435: around 1.6 sec for a pixelpipe run with a certain image and style applied, whereas the older build takes around 0.5 sec on the same image/xmp.
However, if I replace the darktablerc file with the backup from when I was running +1435, the current version is speedy again…
Could it be anything to do with this? It doesn’t show with the old darktablerc file in use.
One more point - I didn’t use last week’s build much as it had the WB bug, which was an issue for me, but I did notice the same slowness as with the current one. Not sure about previous weeks as I’d been sitting on +1435 for a while.
Sorry about the multiple posts - hope this might help.
darktablerc.good.txt (52.5 KB)
darktablerc.slower.txt (52.5 KB)
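In case anyone wants to compare the two attached files without reading through 52 KB by eye, here is a rough sketch of how it could be done. It assumes darktablerc is plain `key=value` text, one setting per line; the function names `parse_rc` and `diff_rc` are just made up for this example and are not part of darktable.

```python
def parse_rc(text):
    """Parse key=value lines into a dict (assumed darktablerc layout)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and anything that is not a key=value pair.
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key] = value
    return settings


def diff_rc(old, new):
    """Return {key: (old_value, new_value)} for every setting that differs.

    A value of None means the key is missing from that file.
    """
    changed = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changed[key] = (old.get(key), new.get(key))
    return changed
```

To use it, read each file into a string (for example with `pathlib.Path(...).read_text()`), pass both through `parse_rc`, and print the items of `diff_rc(good, slower)`. Only the settings that changed between the two configs are listed, which should be a much shorter list than the full file.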
Edit: This is the output of -d opencl when running the older, speedy darktablerc file:
========================================
version: darktable 4.3.0+2363~gfe574c4996
start: 2023:05:22 19:31:09
0.6899 [dt_get_sysresource_level] switched to 2 as `large'
0.6900 total mem: 16340MB
0.6900 mipmap cache: 2042MB
0.6900 available mem: 11170MB
0.6900 singlebuff: 255MB
0.6901 OpenCL tune mem: OFF
0.6901 OpenCL pinned: OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'very fast GPU'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 200
[opencl_init] opencl library 'OpenCL.dll' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 1 device
[dt_opencl_device_init]
DEVICE: 0: 'NVIDIA GeForce GTX 1650'
PLATFORM NAME & VENDOR: NVIDIA CUDA, NVIDIA Corporation
CANONICAL NAME: nvidiacudanvidiageforcegtx1650
DRIVER VERSION: 527.56
DEVICE VERSION: OpenCL 3.0 CUDA, SM_20 SUPPORT
DEVICE_TYPE: GPU
GLOBAL MEM SIZE: 4096 MB
MAX MEM ALLOC: 1024 MB
MAX IMAGE SIZE: 32768 x 32768
MAX WORK GROUP SIZE: 1024
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 64 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
MEMORY TUNING: NO
FORCED HEADROOM: 400
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH: 16
ROUNDUP HEIGHT: 16
CHECK EVENT HANDLES: 128
PERFORMANCE: 10.503
TILING ADVANTAGE: 0.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: C:\Program Files\darktable4.3+2363\share\darktable\kernels
KERNEL DIRECTORY: C:\Users\User\AppData\Local\Microsoft\Windows\INetCache\darktable\cached_v1_kernels_for_NVIDIACUDANVIDIAGeForceGTX1650_52756
CL COMPILER OPTION: -cl-fast-relaxed-math
KERNEL LOADING TIME: 0.0594 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init] 0 'NVIDIA CUDA NVIDIA GeForce GTX 1650'
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 1 1 1 1 1
[opencl_synchronization_timeout] synchronization timeout set to 0
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 1 1 1 1 1
[opencl_synchronization_timeout] synchronization timeout set to 0
6.3278 [dt_opencl_check_tuning] use 3248MB (tunemem=OFF, pinning=OFF) on device `NVIDIA CUDA NVIDIA GeForce GTX 1650' id=0
[opencl_summary_statistics] device 'NVIDIA CUDA NVIDIA GeForce GTX 1650' (0): 792 out of 792 events were successful and 0 events lost. max event=172
end: 2023:05:22 19:31:09
========================================
And the same output when running the newer, slower darktablerc file:
version: darktable 4.3.0+2363~gfe574c4996
start: 2023:05:22 19:33:35
0.7852 [dt_get_sysresource_level] switched to 2 as `large'
0.7852 total mem: 16340MB
0.7853 mipmap cache: 2042MB
0.7853 available mem: 11170MB
0.7853 singlebuff: 255MB
0.7853 OpenCL tune mem: OFF
0.7853 OpenCL pinned: OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 200
[opencl_init] opencl library 'OpenCL.dll' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 1 device
[dt_opencl_device_init]
DEVICE: 0: 'NVIDIA GeForce GTX 1650'
PLATFORM NAME & VENDOR: NVIDIA CUDA, NVIDIA Corporation
CANONICAL NAME: nvidiacudanvidiageforcegtx1650
DRIVER VERSION: 527.56
DEVICE VERSION: OpenCL 3.0 CUDA, SM_20 SUPPORT
DEVICE_TYPE: GPU
GLOBAL MEM SIZE: 4096 MB
MAX MEM ALLOC: 1024 MB
MAX IMAGE SIZE: 32768 x 32768
MAX WORK GROUP SIZE: 1024
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 64 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
MEMORY TUNING: NO
FORCED HEADROOM: 400
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH: 16
ROUNDUP HEIGHT: 16
CHECK EVENT HANDLES: 128
PERFORMANCE: 1.368
TILING ADVANTAGE: 0.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: C:\Program Files\darktable4.3+2363\share\darktable\kernels
KERNEL DIRECTORY: C:\Users\User\AppData\Local\Microsoft\Windows\INetCache\darktable\cached_v1_kernels_for_NVIDIACUDANVIDIAGeForceGTX1650_52756
CL COMPILER OPTION: -cl-fast-relaxed-math
KERNEL LOADING TIME: 0.0486 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init] 0 'NVIDIA CUDA NVIDIA GeForce GTX 1650'
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 -1 0 0 -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 -1 0 0 -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 200
6.3762 [dt_opencl_check_tuning] use 3248MB (tunemem=OFF, pinning=OFF) on device `NVIDIA CUDA NVIDIA GeForce GTX 1650' id=0
[opencl_summary_statistics] device 'NVIDIA CUDA NVIDIA GeForce GTX 1650' (0): 471 out of 471 events were successful and 0 events lost. max event=172
end: 2023:05:22 19:33:35
========================================
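For what it's worth, two dumps like the ones above can also be compared mechanically. This is only a sketch: it assumes the leading numbers such as `0.6899` are elapsed-time stamps that can be dropped, and it skips the `start:`, `end:` and `KERNEL LOADING TIME:` lines because those differ on every run anyway. `normalise` and `log_differences` are names invented for this example.

```python
import difflib
import re

# Leading elapsed-time stamps such as "0.6899 " vary per run (assumed format).
_TIMESTAMP = re.compile(r"^\s*\d+\.\d+\s+")
# Lines that differ on every run and carry no configuration information.
_SKIP_PREFIXES = ("start:", "end:", "KERNEL LOADING TIME:")


def normalise(log_text):
    """Strip timestamps, separators and per-run noise from a log dump."""
    lines = []
    for raw in log_text.splitlines():
        line = _TIMESTAMP.sub("", raw.strip())
        if not line or line.startswith("=") or line.startswith(_SKIP_PREFIXES):
            continue
        lines.append(line)
    return lines


def log_differences(fast_log, slow_log):
    """Return (fast_lines, slow_lines) pairs for each block that differs."""
    fast, slow = normalise(fast_log), normalise(slow_log)
    matcher = difflib.SequenceMatcher(a=fast, b=slow, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((fast[i1:i2], slow[j1:j2]))
    return changes
```

Fed the two logs as strings, this leaves only the blocks where the runs really differ, so identical device information drops out and the handful of changed settings and numbers are easy to spot.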
EDIT again>
Well, I seem to have had the problem staring me in the face. With the old darktablerc the scheduling profile was this:
[opencl_init] opencl_scheduling_profile: 'very fast GPU'
but for some reason in the newer one it had become this:
[opencl_init] opencl_scheduling_profile: 'default'
After changing it back to 'very fast GPU' I’m back up to speed… I think. Too tired to be 100% sure I’m not missing something else, but it looks about right. Sorry for the noise - I can delete the posts if that would be preferred!