darktable windows insider program 5/21

Here’s the link to the latest weekly build of darktable 4.3: darktable-4.3.0+2363~gfe574c4996-win64.exe (Google Drive). The list of latest changes is at Comparing 0b302ad76f…fe574c4996 · darktable-org/darktable · GitHub.

This is the first build after feature freeze, so please check it out for any bugs so that we can identify and squash them prior to release.

This is also the first build on my new build system. I’ve switched to Win 10 as the build platform from Win 7, so please check for any issues.


Pleased to see the WB module bug is fixed.
However, I’m getting a noticeable drop in processing speed relative to 4.3 +1435. With the same image, using -d perf, the pipeline takes about 0.5 sec on the older build and about 1.5 sec on the new one.

Here’s the log, for two sessions, one with each build. The last pixelpipe run on each session should be the same…
darktable-log.txt (87.9 KB)
I need to confirm, but it seemed that after using the new build, the old build was also slow until I removed the darktablerc file and replaced it with a backup.

Well, I don’t know what that was all about… :man_shrugging:

I just tried again, after dinner, and both old and new are running fine…

EDIT… no, that was after I’d replaced the darktablerc file with the old one to test +1435.
After replacing it with the post-2363 darktablerc file +2363 was slow again!

I’m just about to try with the kernels removed…

OK, so running this new build, +2363, it is noticeably slower than +1435, at around 1.6 sec for a pixelpipe run with a certain image and style applied, whereas the older build is around 0.5 sec on the same image/xmp.

However, if I replace the darktablerc file with the one from my backup when I was running +1435, the current version is speedy…

Could it be anything to do with this? It doesn’t show with the old darktablerc file in use.

One more point - I didn’t use last week’s build much as it had the WB bug which was an issue for me, but I did notice the slowness, as with the current one. Not sure about previous weeks as I’d been sitting on +1435 for a while.

Sorry about the multiple posts - hope this might help.
darktablerc.good.txt (52.5 KB)
darktablerc.slower.txt (52.5 KB)
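
For anyone wanting to compare two darktablerc files like the ones attached, here is a minimal sketch that lists the settings whose values differ. It assumes darktablerc stores one `key=value` pair per line; the function names `parse_rc` and `diff_rc` are hypothetical, and the two sample strings are illustrative excerpts, not the contents of the attached files.

```python
def parse_rc(text):
    """Parse key=value lines into a dict; lines without '=' are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        settings[key] = value
    return settings


def diff_rc(good_text, slow_text):
    """Return {key: (good_value, slow_value)} for keys whose values differ."""
    good, slow = parse_rc(good_text), parse_rc(slow_text)
    changed = {}
    for key in sorted(set(good) | set(slow)):
        # A key missing from one file shows up as None on that side.
        if good.get(key) != slow.get(key):
            changed[key] = (good.get(key), slow.get(key))
    return changed


# Illustrative excerpts only (assumed format), not the real attachments.
good_example = "opencl=TRUE\nopencl_scheduling_profile=very fast GPU\n"
slow_example = "opencl=TRUE\nopencl_scheduling_profile=default\n"

for key, (good_value, slow_value) in diff_rc(good_example, slow_example).items():
    print(f"{key}: good={good_value!r} slow={slow_value!r}")
```

To run it on real files, read each one into a string first, for example with `pathlib.Path("darktablerc.good.txt").read_text()`, and pass the two strings to `diff_rc`.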

Edit: This is the output of -d opencl when running the older, speedy darktablerc file:

========================================
version: darktable 4.3.0+2363~gfe574c4996
start: 2023:05:22 19:31:09

     0.6899 [dt_get_sysresource_level] switched to 2 as `large'
     0.6900   total mem:       16340MB
     0.6900   mipmap cache:    2042MB
     0.6900   available mem:   11170MB
     0.6900   singlebuff:      255MB
     0.6901   OpenCL tune mem: OFF
     0.6901   OpenCL pinned:   OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'very fast GPU'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 200
[opencl_init] opencl library 'OpenCL.dll' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 1 device

[dt_opencl_device_init]
   DEVICE:                   0: 'NVIDIA GeForce GTX 1650'
   PLATFORM NAME & VENDOR:   NVIDIA CUDA, NVIDIA Corporation
   CANONICAL NAME:           nvidiacudanvidiageforcegtx1650
   DRIVER VERSION:           527.56
   DEVICE VERSION:           OpenCL 3.0 CUDA, SM_20 SUPPORT
   DEVICE_TYPE:              GPU
   GLOBAL MEM SIZE:          4096 MB
   MAX MEM ALLOC:            1024 MB
   MAX IMAGE SIZE:           32768 x 32768
   MAX WORK GROUP SIZE:      1024
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 64 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   MEMORY TUNING:            NO
   FORCED HEADROOM:          400
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH:            16
   ROUNDUP HEIGHT:           16
   CHECK EVENT HANDLES:      128
   PERFORMANCE:              10.503
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   C:\Program Files\darktable4.3+2363\share\darktable\kernels
   KERNEL DIRECTORY:         C:\Users\User\AppData\Local\Microsoft\Windows\INetCache\darktable\cached_v1_kernels_for_NVIDIACUDANVIDIAGeForceGTX1650_52756
   CL COMPILER OPTION:       -cl-fast-relaxed-math
   KERNEL LOADING TIME:       0.0594 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init]		0	'NVIDIA CUDA NVIDIA GeForce GTX 1650'
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	0	0	0	0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		1	1	1	1	1
[opencl_synchronization_timeout] synchronization timeout set to 0
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	0	0	0	0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		1	1	1	1	1
[opencl_synchronization_timeout] synchronization timeout set to 0
     6.3278 [dt_opencl_check_tuning] use 3248MB (tunemem=OFF, pinning=OFF) on device `NVIDIA CUDA NVIDIA GeForce GTX 1650' id=0
 [opencl_summary_statistics] device 'NVIDIA CUDA NVIDIA GeForce GTX 1650' (0): 792 out of 792 events were successful and 0 events lost. max event=172

end:   2023:05:22 19:31:09
========================================

And ditto when running the newer slow one:

version: darktable 4.3.0+2363~gfe574c4996
start: 2023:05:22 19:33:35

     0.7852 [dt_get_sysresource_level] switched to 2 as `large'
     0.7852   total mem:       16340MB
     0.7853   mipmap cache:    2042MB
     0.7853   available mem:   11170MB
     0.7853   singlebuff:      255MB
     0.7853   OpenCL tune mem: OFF
     0.7853   OpenCL pinned:   OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 200
[opencl_init] opencl library 'OpenCL.dll' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 1 device

[dt_opencl_device_init]
   DEVICE:                   0: 'NVIDIA GeForce GTX 1650'
   PLATFORM NAME & VENDOR:   NVIDIA CUDA, NVIDIA Corporation
   CANONICAL NAME:           nvidiacudanvidiageforcegtx1650
   DRIVER VERSION:           527.56
   DEVICE VERSION:           OpenCL 3.0 CUDA, SM_20 SUPPORT
   DEVICE_TYPE:              GPU
   GLOBAL MEM SIZE:          4096 MB
   MAX MEM ALLOC:            1024 MB
   MAX IMAGE SIZE:           32768 x 32768
   MAX WORK GROUP SIZE:      1024
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 64 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   MEMORY TUNING:            NO
   FORCED HEADROOM:          400
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH:            16
   ROUNDUP HEIGHT:           16
   CHECK EVENT HANDLES:      128
   PERFORMANCE:              1.368
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   C:\Program Files\darktable4.3+2363\share\darktable\kernels
   KERNEL DIRECTORY:         C:\Users\User\AppData\Local\Microsoft\Windows\INetCache\darktable\cached_v1_kernels_for_NVIDIACUDANVIDIAGeForceGTX1650_52756
   CL COMPILER OPTION:       -cl-fast-relaxed-math
   KERNEL LOADING TIME:       0.0486 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init]		0	'NVIDIA CUDA NVIDIA GeForce GTX 1650'
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	-1	0	0	-1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	0	0	0	0
[opencl_synchronization_timeout] synchronization timeout set to 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	-1	0	0	-1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	0	0	0	0
[opencl_synchronization_timeout] synchronization timeout set to 200
     6.3762 [dt_opencl_check_tuning] use 3248MB (tunemem=OFF, pinning=OFF) on device `NVIDIA CUDA NVIDIA GeForce GTX 1650' id=0
 [opencl_summary_statistics] device 'NVIDIA CUDA NVIDIA GeForce GTX 1650' (0): 471 out of 471 events were successful and 0 events lost. max event=172

end:   2023:05:22 19:33:35
========================================

EDIT again:
Well, I seem to have had the problem staring me in the face. With the older, speedy darktablerc file the scheduling profile is

[opencl_init] opencl_scheduling_profile: 'very fast GPU'

but for some reason the newer, slow one had it set to

[opencl_init] opencl_scheduling_profile: 'default'

And after changing it back to 'very fast GPU' I’m back up to speed… I think. Too tired to be 100% sure I’m not missing something else, but it looks about right. Sorry for the noise - I can delete the posts if that would be preferred!
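
The same difference can be picked out of two -d opencl logs automatically rather than by eye. The following is a rough sketch, assuming the option lines keep the `[opencl_init] name: value` shape seen in the logs above; the helper name `opencl_options` is hypothetical, and the two sample strings are short excerpts copied from those logs.

```python
import re

# Matches option lines such as:
#   [opencl_init] opencl_scheduling_profile: 'default'
# Other [opencl_init] status lines have no colon directly after the
# option name, so they do not match.
OPTION_RE = re.compile(r"^\[opencl_init\]\s+(opencl\w*):\s*(.+?)\s*$")


def opencl_options(log_text):
    """Collect the [opencl_init] configuration options from a log as a dict."""
    options = {}
    for line in log_text.splitlines():
        match = OPTION_RE.match(line.strip())
        if match:
            # Drop the surrounding single quotes the log puts around strings.
            options[match.group(1)] = match.group(2).strip("'")
    return options


fast_log = """[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'very fast GPU'
[opencl_init] opencl_mandatory_timeout: 200
[opencl_init] found 1 platform
"""

slow_log = """[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_mandatory_timeout: 200
[opencl_init] found 1 platform
"""

fast, slow = opencl_options(fast_log), opencl_options(slow_log)
differences = {
    name: (fast.get(name), slow.get(name))
    for name in sorted(set(fast) | set(slow))
    if fast.get(name) != slow.get(name)
}
print(differences)
```

This only covers the configuration options; differences in the device priority tables or the synchronization timeout lines would need their own parsing.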

What would be new in this version?

I have installed it and am finding darktable more responsive.

The second link in the original post has a list of all the changes.