Darktable - Windows performance

Stack? It is just a RAW file and an .XMP sidecar with a set of darktable edits, applied with darktable-cli using either the CPU only or CPU+GPU as processors.

I meant the processing stack, i.e. the history in the XMP.

Ah, those. I just looked through the XMP, and many of the modules in it are fairly current:
hazeremoval
exposure
flip
colorbalancergb
Still a good candidate for benchmarking.

diffuse or sharpen is one that's missing, and it is both heavyweight and part of my standard toolchain.


OK, I finally got a new Radeon Pro W6600 on sale for less than $250. Not much improvement: my benchmark takes slightly over 6 seconds to complete. And @g-man was right: it is better to upgrade the whole system, because the GPU is just one part of the processing path. The CPU still has its role, and its power should match the GPU's. I'll do that in a few months; I plan to build a Ryzen 9 7900-based system. At least my new card is the first element of this future setup, and it is much more energy efficient :slight_smile: Thanks all for the replies!

With OpenCL:
darktable-cli setubal.orf setubal.orf.xmp test.jpg --core -d perf -d opencl
[dev_process_export] pixel pipeline processing took 5.563 secs (13.421 CPU)

Without:
darktable-cli setubal.orf setubal.orf.xmp test.jpg --core -d perf -d opencl --disable-opencl
[dev_process_export] pixel pipeline processing took 13.078 secs (127.521 CPU)

In GPU-compute, your card should be about twice as fast as mine.

                 Radeon PRO W6600   GeForce GTX 1060
GPU Compute      9895 Ops/Sec       4322 Ops/Sec (-56.3%)

(Radeon PRO W6600 vs GeForce GTX 1060 [videocardbenchmark.net] by PassMark Software)
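The -56.3% is the relative difference of the two PassMark scores, taking the W6600 as the baseline. The arithmetic below reproduces it. It also shows the naive export time you would expect if darktable scaled purely with GPU compute. That assumption is deliberately simplistic: in the GTX 1060 log below, hotpixels and bilat run on the CPU, and host/device transfers take a noticeable share of the time. Treat the result as an optimistic bound, not a prediction.

```python
# PassMark "GPU Compute" scores from the table above (Ops/Sec).
w6600 = 9895
gtx1060 = 4322

# Relative difference as PassMark reports it, with the W6600 as the baseline.
diff_pct = (gtx1060 - w6600) / w6600 * 100
ratio = w6600 / gtx1060

# Naive expectation: scale the GTX 1060 export time (5.563 s) by the compute ratio.
# Assumption: every module runs on the GPU and is compute-bound, ignoring
# host<->device transfers and modules that fall back to the CPU.
gtx1060_export_s = 5.563
naive_w6600_export_s = gtx1060_export_s / ratio

print(f"{diff_pct:.1f}%  ratio {ratio:.2f}x  naive W6600 export {naive_w6600_export_s:.2f} s")
# prints "-56.3%  ratio 2.29x  naive W6600 export 2.43 s"
```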

My OpenCL logs
2.2212 [dt_dev_load_raw] loading the image. took 0.587 secs (0.563 CPU)
2.2789 [export] creating pixelpipe took 0.055 secs (0.398 CPU)
2.2790 [dt_opencl_check_tuning] use 4808MB (headroom=OFF, pinning=OFF) on device `NVIDIA CUDA NVIDIA GeForce GTX 1060 6GB' id=0
2.2793 [dev_pixelpipe] took 0.000 secs (0.000 CPU) initing base buffer [export]
2.2934 [dev_pixelpipe] took 0.014 secs (0.065 CPU) [export] processed `rawprepare' on GPU, blended on GPU
2.2990 [dev_pixelpipe] took 0.006 secs (0.002 CPU) [export] processed `temperature' on GPU, blended on GPU
2.3266 [dev_pixelpipe] took 0.028 secs (0.023 CPU) [export] processed `highlights' on GPU, blended on GPU
2.4592 [dev_pixelpipe] took 0.133 secs (0.127 CPU) [export] processed `hotpixels' on CPU, blended on CPU
2.5928 [dev_pixelpipe] took 0.134 secs (0.143 CPU) [export] processed `demosaic' on GPU, blended on GPU
3.9984 [dev_pixelpipe] took 1.406 secs (0.866 CPU) [export] processed `denoiseprofile' on GPU with tiling, blended on CPU
4.5705 [dev_pixelpipe] took 0.572 secs (1.564 CPU) [export] processed `lens' on GPU, blended on GPU
4.6047 [dev_pixelpipe] took 0.034 secs (0.029 CPU) [export] processed `ashift' on GPU, blended on GPU
4.6263 [dev_pixelpipe] took 0.022 secs (0.017 CPU) [export] processed `exposure' on GPU, blended on GPU
4.6620 [dev_pixelpipe] took 0.036 secs (0.027 CPU) [export] processed `colorin' on GPU, blended on GPU
4.6827 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.020 secs (0.013 GPU) [channelmixerrgb]
4.7277 [dev_pixelpipe] took 0.066 secs (0.046 CPU) [export] processed `channelmixerrgb' on GPU, blended on GPU
4.8807 [dt_ioppr_transform_image_colorspace] IOP_CS_RGB-->IOP_CS_LAB took 0.064 secs (0.657 CPU) [atrous]
5.9407 [dev_pixelpipe] took 1.213 secs (1.770 CPU) [export] processed `atrous' on GPU with tiling, blended on CPU
6.0632 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.012 secs (0.012 GPU) [colorbalancergb]
6.1078 [dev_pixelpipe] took 0.167 secs (0.146 CPU) [export] processed `colorbalancergb' on GPU, blended on GPU
6.1423 [dev_pixelpipe] took 0.034 secs (0.022 CPU) [export] processed `rgblevels' on GPU, blended on GPU
6.1713 [dev_pixelpipe] took 0.029 secs (0.020 CPU) [export] processed `sigmoid' on GPU, blended on GPU
6.3214 [dt_ioppr_transform_image_colorspace] IOP_CS_RGB-->IOP_CS_LAB took 0.059 secs (0.645 CPU) [bilat]
7.6203 [dev_pixelpipe] took 1.449 secs (8.338 CPU) [export] processed `bilat' on CPU, blended on CPU
7.7289 [dev_pixelpipe] took 0.108 secs (0.108 CPU) [export] processed `colorout' on GPU, blended on GPU
7.7331 [resample_cl] took 0.004 secs (0.000 CPU) 1:1 copy/crop of 8065x6046 pixels
7.7505 [dev_pixelpipe] took 0.022 secs (0.017 CPU) [export] processed `finalscale' on GPU, blended on GPU
7.8418 [opencl_profiling] profiling device 0 ('NVIDIA CUDA NVIDIA GeForce GTX 1060 6GB'):
7.8418 [opencl_profiling] spent  0.5348 seconds in [Write Image (from host to device)]
7.8418 [opencl_profiling] spent  0.0026 seconds in rawprepare_1f
7.8418 [opencl_profiling] spent  0.0031 seconds in whitebalance_1f
7.8418 [opencl_profiling] spent  0.0025 seconds in highlights_initmask
7.8418 [opencl_profiling] spent  0.0033 seconds in highlights_dilatemask
7.8418 [opencl_profiling] spent  0.1928 seconds in [Write Buffer (from host to device)]
7.8418 [opencl_profiling] spent  0.0075 seconds in highlights_chroma
7.8418 [opencl_profiling] spent  0.0000 seconds in [Read Buffer (from device to host)]
7.8418 [opencl_profiling] spent  0.0063 seconds in highlights_opposed
7.8418 [opencl_profiling] spent  1.0297 seconds in [Read Image (from device to host)]
7.8418 [opencl_profiling] spent  0.0008 seconds in border_interpolate
7.8418 [opencl_profiling] spent  0.0060 seconds in rcd_border_green
7.8418 [opencl_profiling] spent  0.0107 seconds in rcd_border_redblue
7.8418 [opencl_profiling] spent  0.0074 seconds in rcd_populate
7.8418 [opencl_profiling] spent  0.0052 seconds in rcd_step_1_1
7.8418 [opencl_profiling] spent  0.0040 seconds in rcd_step_1_2
7.8418 [opencl_profiling] spent  0.0025 seconds in rcd_step_2_1
7.8418 [opencl_profiling] spent  0.0065 seconds in rcd_step_3_1
7.8418 [opencl_profiling] spent  0.0037 seconds in rcd_step_4_1
7.8418 [opencl_profiling] spent  0.0020 seconds in rcd_step_4_2
7.8418 [opencl_profiling] spent  0.0058 seconds in rcd_step_5_1
7.8418 [opencl_profiling] spent  0.0093 seconds in rcd_step_5_2
7.8419 [opencl_profiling] spent  0.0099 seconds in rcd_write_output
7.8419 [opencl_profiling] spent  0.0118 seconds in denoiseprofile_precondition_Y0U0V0
7.8419 [opencl_profiling] spent  0.4297 seconds in denoiseprofile_decompose
7.8419 [opencl_profiling] spent  0.0418 seconds in denoiseprofile_reduce_first
7.8419 [opencl_profiling] spent  0.0002 seconds in denoiseprofile_reduce_second
7.8419 [opencl_profiling] spent  0.1217 seconds in denoiseprofile_synthesize
7.8419 [opencl_profiling] spent  0.0659 seconds in [Copy Image (on device)]
7.8419 [opencl_profiling] spent  0.0119 seconds in denoiseprofile_backtransform_Y0U0V0
7.8419 [opencl_profiling] spent  0.0176 seconds in lens_vignette
7.8419 [opencl_profiling] spent  0.0550 seconds in lens_distort_bicubic
7.8419 [opencl_profiling] spent  0.0261 seconds in ashift_bicubic
7.8419 [opencl_profiling] spent  0.0169 seconds in exposure
7.8419 [opencl_profiling] spent  0.0191 seconds in colorin_unbound
7.8419 [opencl_profiling] spent  0.0269 seconds in colorspaces_transform_lab_to_rgb_matrix
7.8419 [opencl_profiling] spent  0.0150 seconds in channelmixerrgb_CAT16
7.8419 [opencl_profiling] spent  0.6065 seconds in eaw_decompose
7.8419 [opencl_profiling] spent  0.1499 seconds in eaw_synthesize
7.8419 [opencl_profiling] spent  0.0180 seconds in colorbalancergb
7.8419 [opencl_profiling] spent  0.0147 seconds in rgblevels
7.8419 [opencl_profiling] spent  0.0215 seconds in sigmoid_loglogistic_per_channel
7.8419 [opencl_profiling] spent  0.0223 seconds in colorout
7.8419 [opencl_profiling] spent  3.5489 seconds totally in command queue (with 0 events missing)
7.8419 [dev_process_export] pixel pipeline processing took 5.563 secs (13.421 CPU)

Did you see excessive tiling or other issues in your logs? I had a little with my GPU (in denoiseprofile, i.e. denoise (profiled), and in atrous, i.e. contrast equalizer). What's your darktable resources setting? In another benchmark, there was quite a difference (4.4 vs 6 seconds) between large and normal on my machine.
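Tiling shows up directly in the `-d perf` output: the module line reads `processed `denoiseprofile' on GPU with tiling`. This rough Python sketch lists modules slowest first and flags tiled or CPU-processed ones. It assumes the `[dev_pixelpipe] took ... processed ...` format from the logs in this thread. The regex is deliberately loose about the quotes around the module name (forum rendering mangles them) and about decimal commas. `summarize` is a hypothetical helper name.

```python
import re

# Matches module lines from `darktable-cli ... -d perf`, e.g.
#   3.9984 [dev_pixelpipe] took 1.406 secs (0.866 CPU) [export] processed `denoiseprofile' on GPU with tiling, blended on CPU
# Quote characters around the module name are optional; decimals may use a point or a comma.
MODULE_RE = re.compile(
    r"\[dev_pixelpipe\] took (\d+[.,]\d+) secs .*?processed\s+\W?(\w+)\W?\s+"
    r"on (GPU|CPU)( with tiling)?"
)


def summarize(log: str) -> list[tuple[str, float, str, bool]]:
    """Return (module, seconds, device, tiled) rows, slowest first."""
    rows = []
    for line in log.splitlines():
        m = MODULE_RE.search(line)
        if m:
            secs = float(m.group(1).replace(",", "."))
            rows.append((m.group(2), secs, m.group(3), m.group(4) is not None))
    return sorted(rows, key=lambda row: row[1], reverse=True)


sample = """\
3.9984 [dev_pixelpipe] took 1.406 secs (0.866 CPU) [export] processed `denoiseprofile' on GPU with tiling, blended on CPU
5.9407 [dev_pixelpipe] took 1.213 secs (1.770 CPU) [export] processed `atrous' on GPU with tiling, blended on CPU
7.6203 [dev_pixelpipe] took 1.449 secs (8.338 CPU) [export] processed `bilat' on CPU, blended on CPU
"""

for module, secs, device, tiled in summarize(sample):
    note = " (tiled)" if tiled else ""
    print(f"{module:<16}{secs:6.3f} s  {device}{note}")
```

Feeding it a whole log gives a quick answer to "was there tiling, and where did the time go?" without scanning every line.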


This is my benchmark log. As you can see, my new GPU is slower than yours for some reason, even though in all (DirectX and OpenCL) benchmarks it is roughly twice as fast as a GTX 1060 or RX 570. Maybe I'm missing some OpenCL optimizations in darktablerc? Regarding resources, I set "very fast GPU" and "use all device memory" (the W6600 has 8 GB of VRAM). I don't know where to check whether tiling happened during processing. I'll look at the other benchmark you mentioned.

2,4995 [dt_dev_load_raw] loading the image. took 0,759 secs (0,719 CPU)
2,6760 [export] creating pixelpipe took 0,165 secs (0,156 CPU)
2,6762 [dt_opencl_check_tuning] use 7576MB (headroom=ON, pinning=OFF) on device `AMD Accelerated Parallel Processing gfx1032' id=0
2,6774 [dev_pixelpipe] took 0,000 secs (0,000 CPU) initing base buffer [export]
2,7627 [dev_pixelpipe] took 0,085 secs (0,000 CPU) [export] processed `rawprepare' on GPU, blended on GPU
2,7927 [dev_pixelpipe] took 0,030 secs (0,000 CPU) [export] processed `temperature' on GPU, blended on GPU
2,8077 [dev_pixelpipe] took 0,015 secs (0,000 CPU) [export] processed `highlights' on GPU, blended on GPU
2,9170 [dev_pixelpipe] took 0,109 secs (0,016 CPU) [export] processed `hotpixels' on CPU, blended on CPU
3,2210 [dev_pixelpipe] took 0,304 secs (0,000 CPU) [export] processed `demosaic' on GPU, blended on GPU
4,9699 [dev_pixelpipe] took 1,749 secs (0,000 CPU) [export] processed `denoiseprofile' on GPU, blended on GPU
5,9107 [dev_pixelpipe] took 0,941 secs (2,172 CPU) [export] processed `lens' on GPU, blended on GPU
5,9326 [dev_pixelpipe] took 0,022 secs (0,000 CPU) [export] processed `ashift' on GPU, blended on GPU
5,9471 [dev_pixelpipe] took 0,014 secs (0,000 CPU) [export] processed `exposure' on GPU, blended on GPU
5,9641 [dev_pixelpipe] took 0,017 secs (0,000 CPU) [export] processed `colorin' on GPU, blended on GPU
5,9714 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0,003 secs (0,000 GPU) [channelmixerrgb]
6,0511 [dev_pixelpipe] took 0,087 secs (0,000 CPU) [export] processed `channelmixerrgb' on GPU, blended on GPU
6,0648 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0,003 secs (0,000 GPU) [atrous]
7,8978 [dev_pixelpipe] took 1,847 secs (0,000 CPU) [export] processed `atrous' on GPU, blended on GPU
7,9117 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0,003 secs (0,000 GPU) [colorbalancergb]
8,0028 [dev_pixelpipe] took 0,105 secs (0,000 CPU) [export] processed `colorbalancergb' on GPU, blended on GPU
8,0159 [dev_pixelpipe] took 0,013 secs (0,000 CPU) [export] processed `rgblevels' on GPU, blended on GPU
8,0300 [dev_pixelpipe] took 0,014 secs (0,000 CPU) [export] processed `sigmoid' on GPU, blended on GPU
8,0392 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0,003 secs (0,000 GPU) [bilat]
8,7702 [dev_pixelpipe] took 0,740 secs (0,000 CPU) [export] processed `bilat' on GPU, blended on GPU
8,7967 [dev_pixelpipe] took 0,026 secs (0,000 CPU) [export] processed `colorout' on GPU, blended on GPU
8,8014 [resample_cl] took 0,000 secs (0,000 CPU) 1:1 copy/crop of 8065x6046 pixels
8,8115 [dev_pixelpipe] took 0,015 secs (0,000 CPU) [export] processed `finalscale' on GPU, blended on GPU
9,0524 [opencl_profiling] profiling device 0 ('AMD Accelerated Parallel Processing gfx1032'):
9,0525 [opencl_profiling] spent 0,0553 seconds in [Write Image (from host to device)]
9,0525 [opencl_profiling] spent 0,0029 seconds in rawprepare_1f
9,0526 [opencl_profiling] spent 0,0141 seconds in whitebalance_1f
9,0526 [opencl_profiling] spent 0,0019 seconds in highlights_initmask
9,0527 [opencl_profiling] spent 0,0005 seconds in highlights_dilatemask
9,0527 [opencl_profiling] spent 0,3848 seconds in [Write Buffer (from host to device)]
9,0528 [opencl_profiling] spent 0,0024 seconds in highlights_chroma
9,0528 [opencl_profiling] spent 0,0004 seconds in [Read Buffer (from device to host)]
9,0528 [opencl_profiling] spent 0,0029 seconds in highlights_opposed
9,0529 [opencl_profiling] spent 0,1666 seconds in [Read Image (from device to host)]
9,0529 [opencl_profiling] spent 0,0005 seconds in border_interpolate
9,0530 [opencl_profiling] spent 0,0020 seconds in rcd_border_green
9,0530 [opencl_profiling] spent 0,0039 seconds in rcd_border_redblue
9,0531 [opencl_profiling] spent 0,0049 seconds in rcd_populate
9,0531 [opencl_profiling] spent 0,0032 seconds in rcd_step_1_1
9,0531 [opencl_profiling] spent 0,0031 seconds in rcd_step_1_2
9,0532 [opencl_profiling] spent 0,0015 seconds in rcd_step_2_1
9,0532 [opencl_profiling] spent 0,0038 seconds in rcd_step_3_1
9,0533 [opencl_profiling] spent 0,0021 seconds in rcd_step_4_1
9,0533 [opencl_profiling] spent 0,0016 seconds in rcd_step_4_2
9,0533 [opencl_profiling] spent 0,0039 seconds in rcd_step_5_1
9,0534 [opencl_profiling] spent 0,0066 seconds in rcd_step_5_2
9,0534 [opencl_profiling] spent 0,0083 seconds in rcd_write_output
9,0535 [opencl_profiling] spent 0,0176 seconds in denoiseprofile_precondition_Y0U0V0
9,0535 [opencl_profiling] spent 0,3049 seconds in denoiseprofile_decompose
9,0535 [opencl_profiling] spent 0,2378 seconds in denoiseprofile_reduce_first
9,0536 [opencl_profiling] spent 0,0001 seconds in denoiseprofile_reduce_second
9,0536 [opencl_profiling] spent 0,2835 seconds in denoiseprofile_synthesize
9,0537 [opencl_profiling] spent 0,0790 seconds in [Copy Image (on device)]
9,0537 [opencl_profiling] spent 0,0089 seconds in denoiseprofile_backtransform_Y0U0V0
9,0537 [opencl_profiling] spent 0,0125 seconds in lens_vignette
9,0538 [opencl_profiling] spent 0,0272 seconds in lens_distort_bicubic
9,0538 [opencl_profiling] spent 0,0091 seconds in ashift_bicubic
9,0539 [opencl_profiling] spent 0,0087 seconds in exposure
9,0539 [opencl_profiling] spent 0,0087 seconds in colorin_unbound
9,0540 [opencl_profiling] spent 0,0175 seconds in colorspaces_transform_lab_to_rgb_matrix
9,0540 [opencl_profiling] spent 0,0087 seconds in channelmixerrgb_CAT16
9,0540 [opencl_profiling] spent 0,0208 seconds in colorspaces_transform_rgb_matrix_to_lab
9,0541 [opencl_profiling] spent 0,3545 seconds in eaw_decompose
9,0542 [opencl_profiling] spent 0,3404 seconds in eaw_synthesize
9,0542 [opencl_profiling] spent 0,0084 seconds in colorbalancergb
9,0542 [opencl_profiling] spent 0,0084 seconds in rgblevels
9,0543 [opencl_profiling] spent 0,0087 seconds in sigmoid_loglogistic_per_channel
9,0543 [opencl_profiling] spent 0,0065 seconds in pad_input
9,0544 [opencl_profiling] spent 0,0350 seconds in gauss_reduce
9,0544 [opencl_profiling] spent 0,0304 seconds in process_curve
9,0545 [opencl_profiling] spent 0,2108 seconds in laplacian_assemble
9,0545 [opencl_profiling] spent 0,0095 seconds in write_back
9,0545 [opencl_profiling] spent 0,0091 seconds in colorout
9,0546 [opencl_profiling] spent 2,7437 seconds totally in command queue (with 0 events missing)
9,0546 [dev_process_export] pixel pipeline processing took 6,379 secs (2,188 CPU)

And here is all the information printed before the log (sorry, I don't know how to create nice collapsible code sections with markup like you did).

darktable 4.6.1
Copyright (C) 2012-2024 Johannes Hanika and other contributors.
Compile options:
Bit depth → 64 bit
Debug → DISABLED
SSE2 optimizations → ENABLED
OpenMP → ENABLED
OpenCL → ENABLED
Lua → ENABLED - API version 9.2.0
Colord → DISABLED
gPhoto2 → ENABLED
GMIC → ENABLED - Compressed LUTs are supported
GraphicsMagick → ENABLED
ImageMagick → DISABLED
libavif → ENABLED
libheif → ENABLED
libjxl → ENABLED
OpenJPEG → ENABLED
OpenEXR → ENABLED
WebP → ENABLED
See resources | darktable for detailed documentation.
See Sign in to GitHub · GitHub to report bugs.
(darktable-cli.exe:47544): Gtk-WARNING **: 21:58:17.086: gtk_disable_setlocale() must be called before gtk_init()
0,0881 [dt_get_sysresource_level] switched to 3 as `unrestricted'
0,0894 total mem: 16319MB
0,0901 mipmap cache: 2039MB
0,0908 available mem: 261116MB
0,0914 singlebuff: 16319MB
0.0950 [opencl_init] opencl library 'OpenCL.dll' found on your system and loaded, preference 'default path'
0.1162 [opencl_init] found 1 platform
[opencl_init] found 1 device
[dt_opencl_device_init]
DEVICE: 0: 'gfx1032'
PLATFORM, VENDOR & ID: AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc., ID=4098
CANONICAL NAME: amdacceleratedparallelprocessinggfx1032
DRIVER VERSION: 3608.0 (PAL,LC)
DEVICE VERSION: OpenCL 2.0 AMD-APP (3608.0)
DEVICE_TYPE: GPU, dedicated mem
GLOBAL MEM SIZE: 8176 MB
MAX MEM ALLOC: 6732 MB
MAX IMAGE SIZE: 16384 x 16384
MAX WORK GROUP SIZE: 256
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 1024 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
USE HEADROOM: 600Mb
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH & HEIGHT 16x16
CHECK EVENT HANDLES: 128
TILING ADVANTAGE: 0.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: C:\Program Files\darktable\share\darktable\kernels
KERNEL DIRECTORY: C:\Users\sylwe\AppData\Local\Microsoft\Windows\INetCache\darktable\cached_v3_kernels_for_AMDAcceleratedParallelProcessinggfx1032_36080PALLC
CL COMPILER OPTION: -cl-fast-relaxed-math
CL COMPILER COMMAND: -w -cl-fast-relaxed-math -DAMD=1 -I"C:\Program Files\darktable\share\darktable\kernels"
KERNEL LOADING TIME: 0.0727 sec
[opencl_init] OpenCL successfully initialized. internal numbers and names of available devices:
[opencl_init] 0 'AMD Accelerated Parallel Processing gfx1032'
0.7908 [opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[opencl_init] opencl_scheduling_profile: 'very fast GPU'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 1 1 1 1 1
[opencl_synchronization_timeout] synchronization timeout set to 0
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 1 1 1 1 1
[opencl_synchronization_timeout] synchronization timeout set to 0

That’s what I also see, but I don’t think a faster CPU would solve that; those are all GPU-only timings.

For example, denoiseprofile was faster on my GPU, using tiling, than on yours, without it:

mine:  3.9984 [dev_pixelpipe] took 1.406 secs (0.866 CPU) [export] processed `denoiseprofile' on GPU with tiling, blended on CPU
yours: 4,9699 [dev_pixelpipe] took 1,749 secs (0,000 CPU) [export] processed `denoiseprofile' on GPU, blended on GPU

lens:

mine:  4.5705 [dev_pixelpipe] took 0.572 secs (1.564 CPU) [export] processed `lens' on GPU, blended on GPU
yours: 5,9107 [dev_pixelpipe] took 0,941 secs (2,172 CPU) [export] processed `lens' on GPU, blended on GPU

atrous:

mine:  5.9407 [dev_pixelpipe] took 1.213 secs (1.770 CPU) [export] processed `atrous' on GPU with tiling, blended on CPU
yours: 7,8978 [dev_pixelpipe] took 1,847 secs (0,000 CPU) [export] processed `atrous' on GPU, blended on GPU

Then a surprise: my bilat (local contrast) ran on the CPU, yours on the GPU:

mine:  7.6203 [dev_pixelpipe] took 1.449 secs (8.338 CPU) [export] processed `bilat' on CPU, blended on CPU
yours: 8,7702 [dev_pixelpipe] took 0,740 secs (0,000 CPU) [export] processed `bilat' on GPU, blended on GPU

Transfers and copies are a mixed bag: sometimes (as in the first and fourth pairs) yours is roughly 6-10x faster; sometimes (as in the second pair) it is twice as slow:

mine:  7.8418 [opencl_profiling] spent 0.5348 seconds in [Write Image (from host to device)]
yours: 9,0525 [opencl_profiling] spent 0,0553 seconds in [Write Image (from host to device)]

mine:  7.8418 [opencl_profiling] spent 0.1928 seconds in [Write Buffer (from host to device)]
yours: 9,0527 [opencl_profiling] spent 0,3848 seconds in [Write Buffer (from host to device)]

mine:  7.8418 [opencl_profiling] spent 0.0000 seconds in [Read Buffer (from device to host)]
yours: 9,0528 [opencl_profiling] spent 0,0004 seconds in [Read Buffer (from device to host)]

mine:  7.8418 [opencl_profiling] spent 1.0297 seconds in [Read Image (from device to host)]
yours: 9,0529 [opencl_profiling] spent 0,1666 seconds in [Read Image (from device to host)]

mine:  7.8419 [opencl_profiling] spent 0.0659 seconds in [Copy Image (on device)]
yours: 9,0537 [opencl_profiling] spent 0,0790 seconds in [Copy Image (on device)]
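Lining up two logs by hand, as above, is tedious and error-prone. This hedged Python sketch parses the `[opencl_profiling] spent X seconds in NAME` lines from each log into a dictionary, then prints the yours/mine ratio for every entry both logs share. It assumes only the line format seen in this thread (point or comma decimals). `profile_totals` and `compare` are hypothetical helper names. Entries whose baseline is 0.0000 (such as the GTX 1060's Read Buffer) are skipped, since no ratio exists for them.

```python
import re

# Matches e.g. "7.8418 [opencl_profiling] spent  0.5348 seconds in [Write Image (from host to device)]".
# The "spent ... seconds totally in command queue" summary line does not match.
PROF_RE = re.compile(r"\[opencl_profiling\] spent\s+(\d+[.,]\d+) seconds in (.+?)\s*$")


def profile_totals(log: str) -> dict[str, float]:
    """Sum the seconds reported per kernel or transfer in one log."""
    totals: dict[str, float] = {}
    for line in log.splitlines():
        m = PROF_RE.search(line)
        if m:
            name = m.group(2)
            totals[name] = totals.get(name, 0.0) + float(m.group(1).replace(",", "."))
    return totals


def compare(mine: dict[str, float], yours: dict[str, float]) -> dict[str, float]:
    """Ratio yours/mine for entries present in both logs; zero baselines are skipped."""
    return {
        name: yours[name] / mine[name]
        for name in sorted(mine.keys() & yours.keys())
        if mine[name] > 0
    }


mine = profile_totals("""\
7.8418 [opencl_profiling] spent  0.5348 seconds in [Write Image (from host to device)]
7.8418 [opencl_profiling] spent  0.1928 seconds in [Write Buffer (from host to device)]
7.8418 [opencl_profiling] spent  0.0000 seconds in [Read Buffer (from device to host)]
7.8418 [opencl_profiling] spent  1.0297 seconds in [Read Image (from device to host)]
""")
yours = profile_totals("""\
9,0525 [opencl_profiling] spent 0,0553 seconds in [Write Image (from host to device)]
9,0527 [opencl_profiling] spent 0,3848 seconds in [Write Buffer (from host to device)]
9,0528 [opencl_profiling] spent 0,0004 seconds in [Read Buffer (from device to host)]
9,0529 [opencl_profiling] spent 0,1666 seconds in [Read Image (from device to host)]
""")

for name, ratio in compare(mine, yours).items():
    print(f"{name:<40} yours/mine = {ratio:.2f}")
```

A ratio below 1 means the second log's device spent less time on that entry. Run on the full logs, it covers kernels such as eaw_decompose as well as transfers.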

OK, I did some tests on my son's recently built PC with a Ryzen 5 7600 and an RX 7600.
CPU only: 11.2 sec
CPU+GPU: 5.1 sec
The RX 7600 is a slightly faster card than my W6600, but this result is still far from the 2.8 seconds that the list I mentioned reports for a slower CPU (Ryzen 7 2700X) with the same GPU (RX 7600):
GPU benchmarks in darktable (dartmouth.edu)
I ran some OpenCL benchmarks using CompuBench, and it turned out that in most operations my card is about twice as fast as my previous RX 570. In darktable, the difference is maybe 15%.
I also tried some older AMD drivers, with little change. So to me it seems that, for some reason, darktable on Windows is slower with OpenCL than on Linux. Tests using only the CPU are on par with similar hardware running Linux. It could also be a problem with Windows 11, which both my PC and my son's run, but the OpenCL benchmarks mentioned above do not show this. In my free time I'll try installing Ubuntu on another disk in my PC to run the same benchmarks.

It’s probably a driver issue, then. In darktable, the code is written in OpenCL, which is device-independent. When you launch darktable for the first time, or after upgrading the graphics driver, it asks the driver to build the driver-specific ‘kernel’ from the OpenCL source.

So maybe the Windows OpenCL driver is off, then? But as I mentioned, I did a clean uninstall and then installed a few (much) older AMD drivers, with little to no impact. On the other hand, 6 seconds versus the 16 seconds I now get on my CPU is still a huge improvement. I'm curious, however, how a Ryzen 9 7900 would do, since it achieves results similar to my GPU in these darktable benchmarks. If the OpenCL results turned out the same as the CPU's, then what? :slight_smile:

This is old, but interesting.

Yes, interesting, although there the OpenCL benchmarks are either (nearly) the same on both OSes or clearly won by the Windows setup :slight_smile:

In the microphone scene, ROCm on Linux led to better performance than on Windows. The NVIDIA driver performance remained the same.

Not a dramatic difference, but a definite win. The important thing, though, is that the AMD drivers showed different performance, while Nvidia was even.

… in 2018.


Yep, that was 6 years ago; by now AMD should be even better, especially given their commitment to open-source driver support.

And now:


OK, no need to stay in 2018 :slight_smile: I just ran the current cross-platform OpenCL benchmarks, LuxMark 3.1 and Geekbench 6:
Geekbench 6 - Cross-Platform Benchmark
Releases · LuxCoreRender/LuxMark (github.com)
Here are my results. If you find the time to run them on Linux, that would be an interesting comparison, and we'd know whether the differences in darktable's OpenCL performance between the two platforms come from the driver or from darktable itself.


Screenshot 2024-06-28 141706
Screenshot 2024-06-28 160536
Screenshot 2024-06-28 161518

Do you mean I should run the tests on Linux and Nvidia?

Update:
geekbench: ASUS System Product Name - Geekbench

OpenCL Score: 38061

Maybe there’s a sub-score that’s more relevant, though.

OpenCL Performance

Test                 Score    Throughput
OpenCL Score         38061
Background Blur      24686    102.2 images/sec
Face Detection       14376    46.9 images/sec
Horizon Detection    48778    1.52 Gpixels/sec
Edge Detection       59283    2.20 Gpixels/sec
Gaussian Blur        56620    2.47 Gpixels/sec
Feature Matching     6084     239.8 Mpixels/sec
Stereo Matching      134798   128.1 Gpixels/sec
Particle Physics     92422    4067.6 FPS

Luxmark did not run:

kofa@eagle:/tmp/luxmark-v3.1$ ./luxmark
./luxmark.bin: error while loading shared libraries: libglut.so.3: cannot open shared object file: No such file or directory
kofa@eagle:/tmp/luxmark-v3.1$ ./luxmark.bin 
./luxmark.bin: error while loading shared libraries: libembree.so.2: cannot open shared object file: No such file or directory

I do have libglut installed:

kofa@eagle:/tmp/luxmark-v3.1$ dpkg -S libglut.so.3
libglut3.12:amd64: /usr/lib/x86_64-linux-gnu/libglut.so.3.12
libglut3.12:amd64: /usr/lib/x86_64-linux-gnu/libglut.so.3.12.0
kofa@eagle:/tmp/luxmark-v3.1$ sudo apt install libglut3.12
[sudo] password for kofa: 
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libglut3.12 is already the newest version (3.4.0-1).
libglut3.12 set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 17 not upgraded.
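The `dpkg -S` output only lists `libglut.so.3.12` files, while the loader is asking for the soname `libglut.so.3`. The direct run of `luxmark.bin` stops on `libembree.so.2` instead. Running `ldd ./luxmark.bin` shows every dependency the dynamic loader cannot resolve in one pass, rather than one error per launch attempt. This minimal Python sketch filters that output down to the missing names. `missing_libs` is a hypothetical helper, and the sample text is illustrative ldd output, not captured from this machine.

```python
def missing_libs(ldd_output: str) -> list[str]:
    """Return the sonames that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "\tlibfoo.so.1 => not found"
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


# Illustrative ldd output (not captured from the machine above).
sample = """\
\tlinux-vdso.so.1 (0x00007ffd5a9f2000)
\tlibglut.so.3 => not found
\tlibembree.so.2 => not found
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3c1a200000)
"""

print(missing_libs(sample))  # prints ['libglut.so.3', 'libembree.so.2']
```

On the affected machine, the input could come from `subprocess.run(["ldd", "./luxmark.bin"], capture_output=True, text=True).stdout`.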