Does opencl improve speed of setting parametric masks?

I guess I’ll re-run the AMD “detect and install everything appropriate” tool once more. Thing is, it assumes I’m gaming. So I have to go through all the options it wants to enable (color, etc.), try to figure out what does / doesn’t make sense for color-sensitive photo work, then disable / enable only what should be.

Oh well, at least it’s time to re-calibrate my displays so maybe I can just start over. :slight_smile:

And what Linux and what nvidia drivers…

I’ll endeavor to find a repeatable instance where opencl drops out, and will then provide the logs and other info. I’ll need to redo what I was doing.

(Unrelated) I’ve found a different scenario where opencl drops out: with certain export parameter options for some of the file formats.

Would it be better to create a GitHub issue and put the details there?

Here you go. How do you like the log? It’s in here:

KERNEL BUILD DIRECTORY:   /home/kirk/programs/share/darktable/kernels
   KERNEL DIRECTORY:         /home/kirk/.cache/darktable/cached_v2_kernels_for_NVIDIACUDANVIDIAGeForceRTX3060Ti_53510405
   CL COMPILER OPTION:       -cl-fast-relaxed-math
   KERNEL LOADING TIME:       0.0197 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init]           0       'NVIDIA CUDA NVIDIA GeForce RTX 3060 Ti'
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       0       0       0       0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           1       1       1       1       1
[opencl_synchronization_timeout] synchronization timeout set to 0
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       0       0       0       0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           1       1       1       1       1
[opencl_synchronization_timeout] synchronization timeout set to 0
     1.7375 pixelpipe starting CL      [thumbnail]                           (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 device=0 (nvidiacudanvidiageforcertx3060ti)
     1.7376 [dt_opencl_check_tuning] use 7573MB (headroom=ON, pinning=OFF) on device `NVIDIA CUDA NVIDIA GeForce RTX 3060 Ti' id=0
     1.7376 modify roi IN              [thumbnail]    flip                   (   0/   0)  900x 573 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 
     1.7376 modify roi IN              [thumbnail]    ashift                 (   0/   0)  928x 619 scale=0.6897 --> (   0/   0)  900x 573 scale=0.6897 
     1.7376 modify roi IN              [thumbnail]    demosaic               (   0/   0) 1345x 897 scale=1.0000 --> (   0/   0)  928x 619 scale=0.6897 
     1.7376 modify roi IN              [thumbnail]    highlights             (   0/   0) 1347x 898 scale=1.0000 --> (   0/   0) 1345x 897 scale=1.0000 
     1.7376 modify roi IN              [thumbnail]    rawprepare             (   0/   0) 1349x 900 scale=1.0000 --> (   0/   0) 1347x 898 scale=1.0000 
     1.7377 pixelpipe data: full       [thumbnail]                           (   0/   0) 1349x 900 scale=1.0000 --> (   0/   0) 1349x 900 scale=1.0000 
     1.7384 pixelpipe process CL       [thumbnail]    rawprepare             (   0/   0) 1349x 900 scale=1.0000 --> (   0/   0) 1347x 898 scale=1.0000 IOP_CS_RAW
     1.7391 pixelpipe process CL       [thumbnail]    temperature            (   0/   0) 1347x 898 scale=1.0000 --> (   0/   0) 1347x 898 scale=1.0000 IOP_CS_RAW
     1.7397 pixelpipe process CL       [thumbnail]    highlights             (   0/   0) 1347x 898 scale=1.0000 --> (   0/   0) 1345x 897 scale=1.0000 IOP_CS_RAW
     1.7412 pixelpipe process CL       [thumbnail]    demosaic               (   0/   0) 1345x 897 scale=1.0000 --> (   0/   0)  928x 619 scale=0.6897 IOP_CS_RAW -> IOP_CS_RGB
     1.7444 clip_and_zoom_roi_cl       [thumbnail]    demosaic               (   0/   0) 1345x 897 scale=1.0000 --> (   0/   0)  928x 619 scale=0.6897 
     1.7462 pixelpipe process CL       [thumbnail]    ashift                 (   0/   0)  928x 619 scale=0.6897 --> (   0/   0)  900x 573 scale=0.6897 IOP_CS_RGB
     1.7469 pixelpipe process CL       [thumbnail]    flip                   (   0/   0)  900x 573 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7474 pixelpipe process CL       [thumbnail]    exposure               (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7479 pixelpipe process CL       [thumbnail]    colorin                (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB -> IOP_CS_LAB
     1.7493 transform colorspace CL    [thumbnail]    channelmixerrgb        (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_LAB -> IOP_CS_RGB
     1.7517 pixelpipe process CL       [thumbnail]    channelmixerrgb        (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7523 pixelpipe process CL       [thumbnail]    diffuse                (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7562 pixelpipe process CL       [thumbnail]    colorbalancergb        (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7583 pixelpipe process CL       [thumbnail]    filmicrgb              (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7609 transform colorspace CL    [thumbnail]    bilat                  (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB -> IOP_CS_LAB
     1.7629 pixelpipe process CL       [thumbnail]    bilat                  (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_LAB
     1.7685 transform colorspace CL    [thumbnail]    velvia                 (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_LAB -> IOP_CS_RGB
     1.7705 pixelpipe process CL       [thumbnail]    velvia                 (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7742 transform colorspace CPU   [thumbnail]    colorout               (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB -> IOP_CS_LAB
     1.7747 pixelpipe process CPU      [thumbnail]    colorout               (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_LAB -> IOP_CS_RGB
     1.7820 pixelpipe process CPU      [thumbnail]    gamma                  (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7851 cache report               [thumbnail]                           2 lines (important=0, used=0, invalid=0). Using 729MB, limit=0MB. Hits/run=0.00. Hits/test=0.000
     1.7851 pixelpipe finished         [thumbnail]                           (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897
  1. dt version
    4.5.0+569~g4fa7a254a

2a)
Linux Fedora release 40 (Rawhide)

Oops, I was asking for a log showing the mentioned problem :slight_smile:

Lunchtime procrastination…

Used the source RAW and *.xmp from here: https://math.dartmouth.edu/~sarunas/darktable_bench.html

Old Desktop, 6-7 year-old Intel NUC:

Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz w/ 32GB RAM
/usr/bin/flatpak run --branch=stable --arch=x86_64 --command=/app/bin/darktable-cli --file-forwarding org.darktable.Darktable setubal.orf setubal.orf.xmp test.jpg --core --disable-opencl -d perf
44.8865 [dev_process_export] pixel pipeline processing took 42.476 secs (249.832 CPU)
45.3970 [dev_process_export] pixel pipeline processing took 42.565 secs (250.119 CPU)
46.9705 [dev_process_export] pixel pipeline processing took 44.434 secs (254.009 CPU)

AVERAGES:

  • pixel pipeline processing = 43.158 sec
  • CPU took = 251.320 sec
  • Baseline value.

New Desktop, spec as in post above. AMD Ryzen 7 5700G only w/ opencl disabled.

AMD Ryzen 7 5700G with Radeon Graphics w/ 64GB RAM
/usr/bin/flatpak run --branch=stable --arch=x86_64 --command=/app/bin/darktable-cli --file-forwarding org.darktable.Darktable setubal.orf setubal.orf.xmp test.jpg --core --disable-opencl -d perf
12.9354 [dev_process_export] pixel pipeline processing took 12.101 secs (160.745 CPU)
12.9902 [dev_process_export] pixel pipeline processing took 12.156 secs (161.078 CPU)
13.0178 [dev_process_export] pixel pipeline processing took 12.181 secs (162.105 CPU)

AVERAGES:

  • pixel pipeline processing = 12.146 sec
  • CPU took = 161.309 sec
  • Speedup = 3.55 times faster !

New Desktop, spec as in post above. AMD Ryzen 7 5700G w/ NVIDIA GeForce RTX 3060 opencl enabled.

AMD Ryzen 7 5700G with Radeon Graphics w/ 64GB RAM
/usr/bin/flatpak run --branch=stable --arch=x86_64 --command=/app/bin/darktable-cli --file-forwarding org.darktable.Darktable setubal.orf setubal.orf.xmp test.jpg --core -d opencl -d perf
3.0812 [dev_process_export] pixel pipeline processing took 2.135 secs (3.174 CPU)
3.0812 [dev_process_export] pixel pipeline processing took 2.137 secs (3.241 CPU)
3.0896 [dev_process_export] pixel pipeline processing took 2.132 secs (3.212 CPU)

[dt_opencl_device_init]
   DEVICE:                   0: 'NVIDIA GeForce RTX 3060'
   PLATFORM NAME & VENDOR:   NVIDIA CUDA, NVIDIA Corporation
   CANONICAL NAME:           nvidiacudanvidiageforcertx3060
   DRIVER VERSION:           535.86.05
   DEVICE VERSION:           OpenCL 3.0 CUDA, SM_20 SUPPORT
   DEVICE_TYPE:              GPU
   GLOBAL MEM SIZE:          12044 MB

AVERAGES:

  • pixel pipeline processing = 2.135 sec
  • CPU took = 3.209 sec
  • Speedup = 20.22 times faster !!!
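
For anyone who wants to repeat this with their own runs, here is a minimal sketch of how the averages could be computed from the `-d perf` output. It assumes the timing lines keep the exact wording shown in the logs above; the names `PERF_LINE`, `average_times` and `LOGS` are made up for this example and are not part of darktable.

```python
import re
from statistics import mean

# Matches the timing lines printed with `-d perf`, e.g.
# "... pixel pipeline processing took 42.476 secs (249.832 CPU)"
PERF_LINE = re.compile(
    r"pixel pipeline processing took\s+([\d.]+)\s+secs\s+\(([\d.]+)\s+CPU\)"
)


def average_times(log_text):
    """Return (mean wall-clock secs, mean CPU secs) over all timing lines."""
    runs = [(float(wall), float(cpu)) for wall, cpu in PERF_LINE.findall(log_text)]
    if not runs:
        raise ValueError("no pixel pipeline timing lines found")
    return mean(w for w, _ in runs), mean(c for _, c in runs)


# Timing lines copied from the three sets of runs above.
LOGS = {
    "NUC i7-6770HQ, CPU only": """
44.8865 [dev_process_export] pixel pipeline processing took 42.476 secs (249.832 CPU)
45.3970 [dev_process_export] pixel pipeline processing took 42.565 secs (250.119 CPU)
46.9705 [dev_process_export] pixel pipeline processing took 44.434 secs (254.009 CPU)
""",
    "Ryzen 7 5700G, CPU only": """
12.9354 [dev_process_export] pixel pipeline processing took 12.101 secs (160.745 CPU)
12.9902 [dev_process_export] pixel pipeline processing took 12.156 secs (161.078 CPU)
13.0178 [dev_process_export] pixel pipeline processing took 12.181 secs (162.105 CPU)
""",
    "Ryzen 7 5700G + RTX 3060 opencl": """
3.0812 [dev_process_export] pixel pipeline processing took 2.135 secs (3.174 CPU)
3.0812 [dev_process_export] pixel pipeline processing took 2.137 secs (3.241 CPU)
3.0896 [dev_process_export] pixel pipeline processing took 2.132 secs (3.212 CPU)
""",
}

# The old NUC is the baseline that the speedups are measured against.
baseline_wall, _ = average_times(LOGS["NUC i7-6770HQ, CPU only"])
for name, text in LOGS.items():
    wall, cpu = average_times(text)
    print(f"{name}: pipeline {wall:.3f} s, CPU {cpu:.3f} s, "
          f"speedup x{baseline_wall / wall:.2f}")
```

Pasting the saved output of your own runs into `LOGS` should give you figures that are directly comparable with the averages above.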

This is nowhere near scientific…

My setup has the NVIDIA GeForce RTX 3060 as a compute card only - no video responsibilities. Therefore, in theory the full 12G VRAM is available to opencl / darktable if needed. Display responsibilities are solely the domain of the AMD APU/iGPU integrated within the AMD Ryzen 7 5700G with Radeon Graphics.

The first screenshot is DT 4.4.2 bulk processing one of my Canon R6mk2 (24 megapixel) image directories. The green bumps are nvidia opencl usage. nvidia memory usage for these 24 megapixel images doesn’t seem to exceed 3G.

Therefore, if you are using a 4G video card for both display responsibilities and opencl compute, it may still fall back even if processing a 24 megapixel image only uses 3G of VRAM, because of the VRAM required for display responsibilities.

Second screenshot is DT 4.4.2 bulk processing my local_copy directory full of PlayRAW images. Various megapixel images. Some of the PlayRAW images exceed 6G of VRAM used!

So from this non-scientific Play(RAW) test, it would seem that 8G is the minimum for new video cards, as per @paperdigits’ recommendation.

That is not completely correct, I think, as darktable can apply tiling (breaking the images into smaller pieces, and processing each separately). It’s not as efficient as having enough RAM, of course.
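
To illustrate the general idea of tiling (this is a generic sketch, not darktable's actual tiling code), an image can be covered by smaller rectangles that each fit into the available memory. The function name `iter_tiles`, the tile size and the overlap are all assumptions picked for the example; real modules need an overlap that depends on how far their filters reach.

```python
def iter_tiles(width, height, tile_w, tile_h, overlap=0):
    """Yield (x, y, w, h) rectangles that together cover a width x height image.

    Neighbouring tiles share `overlap` pixels so that filters which look at
    nearby pixels still see valid data at the tile borders.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    if overlap < 0 or tile_w <= overlap or tile_h <= overlap:
        raise ValueError("tile size must be larger than the overlap")
    step_x = tile_w - overlap
    step_y = tile_h - overlap
    y = 0
    while True:
        h = min(tile_h, height - y)  # clip the last row of tiles
        x = 0
        while True:
            w = min(tile_w, width - x)  # clip the last column of tiles
            yield (x, y, w, h)
            if x + w >= width:
                break
            x += step_x
        if y + h >= height:
            break
        y += step_y


# Hypothetical 6000x4000 (24 MP) frame split into 2048x2048 tiles, 64 px overlap.
tiles = list(iter_tiles(6000, 4000, 2048, 2048, overlap=64))
print(len(tiles), "tiles, first:", tiles[0], "last:", tiles[-1])
```

Each tile is processed on its own and the results are stitched back together, which is why tiling costs extra time compared with fitting the whole image in memory at once.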

You can play with the tuning parameters. If the example given there is still valid, with the default resource allocation, from your card’s 12 GB of VRAM, only (12 GB - 400 MB) * 700 / 1024 ~= 8 GB would ever be used. Of course, since you only hit 3G of peak usage, it’s unlikely that you came near this limit.
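
The arithmetic can be spelled out as a small sketch. It only restates the example formula quoted above; the 400 MB headroom and the 700/1024 fraction are the figures from that example and may not match current darktable defaults, and `usable_vram_mb` is a made-up helper name.

```python
def usable_vram_mb(total_mb, headroom_mb=400, fraction_1024=700):
    """Estimate the VRAM (in MB) that would be used under the quoted example.

    (total - headroom) * fraction / 1024, where both defaults are taken from
    the example in the post above and are assumptions, not verified defaults.
    """
    if total_mb <= headroom_mb:
        raise ValueError("card has no memory left after the headroom")
    return (total_mb - headroom_mb) * fraction_1024 / 1024


# A nominal 12 GB card, and the 12044 MB that the driver reported in the log.
for total in (12 * 1024, 12044):
    print(f"{total} MB total -> about {usable_vram_mb(total):.0f} MB usable")
```

Either way the result lands just under 8 GB, well above the roughly 3G peak seen for the 24 megapixel images.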


If you’re shelling out money for a completely new rig, I’d agree that you should target at least 8GB of GPU.

The advice generally given in our chat room is “how much money do you have? The higher spec’d GPU you can get, the faster it’ll be and the longer it’ll last you as you move through camera bodies that will get more and more megapixels as time goes on.”

I’m too embarrassed to even post what my little laptop GPU did (or for that matter, CPU either). Geez… Good thing I don’t shoot too often! :grimacing:

The pixel processing times I posted above are only for 24MP Canon R6mk2 images.

hmm - got me thinking…

I found some RAW files from a 61MP Sony A7r IV review here: (Gallery of Sony A7R IV Sample Images (RAW & SOOC JPGs) - Mirrorless Comparison)

Some of these are 100MB RAW files, which I then edited the way I normally would for a typical edit: masks, vignettes, highlight recovery…

What I got as an average across 5 different large 61MP RAW files was:

AVERAGES:

  • pixel pipeline processing = 2.9316 sec
  • CPU took = 19.8548 sec

The pixel pipeline time for these 61MP images is not significantly different (<1sec) from my original test for 24MP images. However, the CPU took a while longer to spin up to speed. The max VRAM usage I saw for any of these 61MP images was only about 7GB.

 . [dev_process_export] pixel pipeline processing took 2.272 secs (16.462 CPU)
 . [export_job] exported to `/darktable_exported/DSC00703_01.jp2'
 
 . [dev_process_export] pixel pipeline processing took 3.218 secs (26.190 CPU)
 . [export_job] exported to `/darktable_exported/DSC01308_01.jp2'
 
 . [dev_process_export] pixel pipeline processing took 2.866 secs (21.270 CPU)
 . [export_job] exported to `/darktable_exported/DSC01364_01.jp2'
 
 . [dev_process_export] pixel pipeline processing took 4.236 secs (20.355 CPU)
 . [export_job] exported to `/darktable_exported/DSC01388_01.jp2'

 . [dev_process_export] pixel pipeline processing took 2.066 secs (14.997 CPU)
 . [export_job] exported to `/darktable_exported/DSC01466_01.jp2'

I suspect that, as the current crop of 50MP+ cameras is significantly out of my price range, the computing setup I now have will be enough for the foreseeable future.