Does opencl improve speed of setting parametric masks?

Thanks, everyone, for your input.

So it seems that even though OpenCL may not directly speed up the setting of a parametric mask itself, it still has an influence, because a parametric mask relies on the output of the preceding modules in the processing stack. Anecdotal evidence suggests that this alone has a significant impact on usefulness.

Anecdotal evidence suggests that, for 24 Mpx raw images, any video card with more than 6 GB of RAM should be OK.

Again - thanks for everyone’s input - much appreciated.

Not all cards are created equal, and beyond that, something often overlooked is OS support and the driver… a bad driver can really mess things up. So you can’t just go by the amount of RAM on the card; also, many cards have the same amount of RAM but are much faster, so there are lots of elements to consider when selecting a card…

2 Likes

Warning acknowledged - so the question then becomes: how to understand these elements prior to purchase and testing?

Would something with the following specs be OK?

  • AMD Ryzen 7 5700G (8C/16T) @ 3.8-4.6 GHz / 20 MB cache
  • AMD Radeon RX 6700 XT 12 GB @ 2620 MHz boost

Nvidia is usually better supported - or at least you find more problems reported on this forum from AMD Radeon users. But I could be wrong.

1 Like

Likely fine if you are running Windows, but for Linux I can’t say… drivers for video cards and OpenCL seem a bit less straightforward there. I also don’t have much experience dealing with drivers in Linux, so take that with a grain of salt…

The number of processors on the graphics card is very important. For example, mine has 1,920 cores, and some have many more than that.

And there are different core types too - all part of those “lots of elements”. I am not sure to what extent CUDA cores vs tensor cores vs RT cores contribute to processing in DT.

1 Like

Warning acknowledged - I’ll have to do some googling and try to make a decision.

At home I only run Linux - it does seem to be a corner case to use Linux & OpenCL for darktable/Blender/DaVinci Resolve etc… I’ve found some step-by-step instructions for OpenCL on Linux for Blender and DaVinci Resolve, so I’m fairly confident that with an OpenCL-capable card that is one to two years old there is a reasonable chance of getting it working - though I appreciate it probably won’t be plug-and-play.
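
For what it’s worth, here is a quick sanity-check of OpenCL on a Linux box before pointing darktable at it - a sketch, assuming the `clinfo` package is installed (`darktable-cltest` ships with darktable itself):

```shell
# Quick sanity checks before trusting OpenCL on a Linux install.
clinfo | grep -i 'device name'   # lists every OpenCL device the ICD loader sees
darktable-cltest                 # darktable's own OpenCL self-test and init log
```

If `clinfo` lists no devices, the problem is at the driver/ICD level, not in darktable.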

It would appear the only way is to test-it-&-see… unfortunately.

To answer my original inquiry:

Does opencl improve speed of setting parametric masks?

Well - after test-it-&-see I can now categorically say “YES” - having a working OpenCL-capable card does significantly improve the performance of creating and modifying parametric masks!

To complete the discussion - I ended up with the following:

Processor (CPU) AMD Ryzen 7 5700G Eight Core CPU with Radeon™ Graphics (3.8GHz-4.6GHz/20MB CACHE/AM4)
Motherboard GIGABYTE B550I AORUS PRO AX: DDR4, USB 3.2 - ARGB Ready
Memory (RAM) 64GB PCS PRO DDR4 3200MHz (2 x 32GB)
Graphics Card 12GB NVIDIA GEFORCE RTX 3060 - HDMI, DP, LHR
1st M.2 SSD Drive 2TB SOLIDIGM P41+ GEN 4 M.2 NVMe PCIe SSD (up to 4125MB/sR, 3325MB/sW)
2nd M.2 SSD Drive 2TB SOLIDIGM P41+ GEN 4 M.2 NVMe PCIe SSD (up to 4125MB/sR, 3325MB/sW)

Did a preliminary Linux + darktable install and had OpenCL working (almost) out of the box. It seems to work well enough for standard editing with default values. Though I have found that if I copy *.xmp history to a hundred or so images in lighttable, DT will complain about “inconsistent data” and error with something like “disabling opencl for this session”. If encountered again, I will need to dig further.
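
If the drop-out happens again, a terminal launch with the debug flags should capture it with context - a sketch (the log path is just an example):

```shell
# Launch darktable with OpenCL diagnostics and keep everything in a file,
# so the "disabling opencl for this session" event is captured with context.
darktable -d opencl -d pipe > /tmp/dt-opencl.log 2>&1
grep -in 'opencl' /tmp/dt-opencl.log | tail -20   # last OpenCL-related messages
```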

I realize that this is not a new revelation to many… but after working with RT for a while and DT for this year, this is the first time I’ve been able to actually see the change in the image in sync with the sliders, in real time! I had been editing by numbers previously, as the sliders were in many cases too painful. Wow - this changes everything!

1 Like

Looks like a decent bump. I wish I could use my “mini-GPU” for what little it’s worth. I’ve got an AMD Radeon adapter on my Ryzen 7 5700U 16 GB Windows 11 laptop [1]. It has a cheesy little integrated GPU with 0.5 GB dedicated RAM (plus system RAM).

I primarily use ART, so it’s not a factor there. However, Affinity Photo supports OpenCL, so I’d like to be able to use it for whatever minor bump it would provide - but I’ve seen almost identical fatal crashes in both darktable and AP with OpenCL enabled. Both happened as I made back-and-forth adjustments, e.g., pushing a slider repeatedly back and forth while looking at various parts of the image to watch the effect. The screen went blank and then both displays were covered in a small herringbone/checkerboard-type pattern with zero response to any input. The only way out was a button push.

Disabled OpenCL and it hasn’t done it again.

I’m not at all knowledgeable in terms of GPU drivers, etc. [2], but I ran a utility from AMD which upgraded the chipset drivers, among others. After that I experienced frequent USB disconnections with my dock. Removing the AMD drivers stopped that.

Now that I’ve canned that goofy WavLink dock, I might try again I guess. It can be confusing trying to extract “photo-useful” tidbits from all the gaming information, since GPUs are so game-centric.

  1. Acer laptop. Yeah, I know… :unamused: but to be fair I’m not a gamer so it’s more than plenty for everything except image processing and it’s actually been fine otherwise.

  2. I spent two+ decades in IT but desktop (particularly gaming) hardware was never an area of interest for me. :slight_smile:

1 Like

I’m not a gamer (except for a brief moment with Doom while at Uni). I have a small 3rd bedroom - which when I moved into the house I repurposed as a computer room/study/hobby room. It is small and with a reasonable desk, chair, bookcase - there really isn’t a lot more room. So I’ve been happy with a small Intel i7 (4 Core) NUC running Linux stuck under the desk for the last six years. Zero extra space used - everything compact.

It does everything well enough. It’s old enough to be fully supported in Linux, fast enough for financial analysis with R, and more than enough for spreadsheets and documents, casual web browsing, and watching videos on YouTube/Vimeo. It was also fast enough for RT with my old camera. I was adamant that I was not going to spend money until the old computer died completely.

I upgraded the camera last Christmas, and that encouraged me to upgrade my image-processing capability. The desktop was good enough for general editing - but the thing that pushed me over the edge was DT’s parametric masks. Waiting a minute to see whether a setting had selected the targeted part of the image, only to have to try again, was painful.

Exactly - I’ve been around telecoms (my mother would tell her friends “it’s something to do with computers”) & IT since school. I’ve built my own computers and servers, etc., but the effort required to figure out whether things were supported (and by which toolkit), compatible, available in stock, not end-of-life, etc. was more than the effort and time I had available. I’d rather be doing other things. So I found and spoke to a custom PC builder company - and went from there. I could have got slightly better quality/features by sourcing the parts myself and putting it together, but this way I got something they put their reputation behind, and that should just work.

The thinking behind an AMD APU with integrated graphics plus an Nvidia card was that AMD graphics are better supported under Linux for displaying stuff, whereas Nvidia has much better support for CUDA (and OpenCL) for calculating stuff. Time will tell if it was the right decision.
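
The split can be confirmed from a terminal - a sketch, assuming `glxinfo` (mesa-utils) and `clinfo` are installed:

```shell
# Confirm the display/compute split: the AMD iGPU should be the OpenGL
# renderer, while the NVIDIA card should appear as an OpenCL device.
glxinfo -B | grep 'OpenGL renderer'   # expect the AMD Radeon iGPU here
clinfo | grep -i 'device name'        # expect the NVIDIA GeForce listed here
```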

2 Likes

Interested in this a lot.

  1. Would appreciate a log with -d pipe -d opencl
  2. What dt version are you using?
  3. Did you do anything on opencl settings? scheduling? headroom?
1 Like

I guess I’ll re-run the AMD “detect and install everything appropriate” tool once more. Thing is, it assumes I’m gaming. So I have to go through all the options it wants to enable (color, etc.), try to figure out what does / doesn’t make sense for color-sensitive photo work, then disable / enable only what should be.

Oh well, at least it’s time to re-calibrate my displays so maybe I can just start over. :slight_smile:

And which Linux and which Nvidia drivers?

Will endeavor to find a repeatable instance where OpenCL drops out, and will then provide the logs and other info. I’ll need to redo what I was doing.

(Unrelated) I’ve found another (different) scenario where OpenCL drops out, with some export parameter options for some of the file formats.

Would it be better to create a GitHub issue and put the detail there?

Here you go - how do you like the log?

KERNEL BUILD DIRECTORY:   /home/kirk/programs/share/darktable/kernels
   KERNEL DIRECTORY:         /home/kirk/.cache/darktable/cached_v2_kernels_for_NVIDIACUDANVIDIAGeForceRTX3060Ti_53510405
   CL COMPILER OPTION:       -cl-fast-relaxed-math
   KERNEL LOADING TIME:       0.0197 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init]           0       'NVIDIA CUDA NVIDIA GeForce RTX 3060 Ti'
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       0       0       0       0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           1       1       1       1       1
[opencl_synchronization_timeout] synchronization timeout set to 0
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       0       0       0       0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           1       1       1       1       1
[opencl_synchronization_timeout] synchronization timeout set to 0
     1.7375 pixelpipe starting CL      [thumbnail]                           (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 device=0 (nvidiacudanvidiageforcertx3060ti)
     1.7376 [dt_opencl_check_tuning] use 7573MB (headroom=ON, pinning=OFF) on device `NVIDIA CUDA NVIDIA GeForce RTX 3060 Ti' id=0
     1.7376 modify roi IN              [thumbnail]    flip                   (   0/   0)  900x 573 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 
     1.7376 modify roi IN              [thumbnail]    ashift                 (   0/   0)  928x 619 scale=0.6897 --> (   0/   0)  900x 573 scale=0.6897 
     1.7376 modify roi IN              [thumbnail]    demosaic               (   0/   0) 1345x 897 scale=1.0000 --> (   0/   0)  928x 619 scale=0.6897 
     1.7376 modify roi IN              [thumbnail]    highlights             (   0/   0) 1347x 898 scale=1.0000 --> (   0/   0) 1345x 897 scale=1.0000 
     1.7376 modify roi IN              [thumbnail]    rawprepare             (   0/   0) 1349x 900 scale=1.0000 --> (   0/   0) 1347x 898 scale=1.0000 
     1.7377 pixelpipe data: full       [thumbnail]                           (   0/   0) 1349x 900 scale=1.0000 --> (   0/   0) 1349x 900 scale=1.0000 
     1.7384 pixelpipe process CL       [thumbnail]    rawprepare             (   0/   0) 1349x 900 scale=1.0000 --> (   0/   0) 1347x 898 scale=1.0000 IOP_CS_RAW
     1.7391 pixelpipe process CL       [thumbnail]    temperature            (   0/   0) 1347x 898 scale=1.0000 --> (   0/   0) 1347x 898 scale=1.0000 IOP_CS_RAW
     1.7397 pixelpipe process CL       [thumbnail]    highlights             (   0/   0) 1347x 898 scale=1.0000 --> (   0/   0) 1345x 897 scale=1.0000 IOP_CS_RAW
     1.7412 pixelpipe process CL       [thumbnail]    demosaic               (   0/   0) 1345x 897 scale=1.0000 --> (   0/   0)  928x 619 scale=0.6897 IOP_CS_RAW -> IOP_CS_RGB
     1.7444 clip_and_zoom_roi_cl       [thumbnail]    demosaic               (   0/   0) 1345x 897 scale=1.0000 --> (   0/   0)  928x 619 scale=0.6897 
     1.7462 pixelpipe process CL       [thumbnail]    ashift                 (   0/   0)  928x 619 scale=0.6897 --> (   0/   0)  900x 573 scale=0.6897 IOP_CS_RGB
     1.7469 pixelpipe process CL       [thumbnail]    flip                   (   0/   0)  900x 573 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7474 pixelpipe process CL       [thumbnail]    exposure               (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7479 pixelpipe process CL       [thumbnail]    colorin                (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB -> IOP_CS_LAB
     1.7493 transform colorspace CL    [thumbnail]    channelmixerrgb        (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_LAB -> IOP_CS_RGB
     1.7517 pixelpipe process CL       [thumbnail]    channelmixerrgb        (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7523 pixelpipe process CL       [thumbnail]    diffuse                (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7562 pixelpipe process CL       [thumbnail]    colorbalancergb        (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7583 pixelpipe process CL       [thumbnail]    filmicrgb              (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7609 transform colorspace CL    [thumbnail]    bilat                  (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB -> IOP_CS_LAB
     1.7629 pixelpipe process CL       [thumbnail]    bilat                  (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_LAB
     1.7685 transform colorspace CL    [thumbnail]    velvia                 (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_LAB -> IOP_CS_RGB
     1.7705 pixelpipe process CL       [thumbnail]    velvia                 (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7742 transform colorspace CPU   [thumbnail]    colorout               (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB -> IOP_CS_LAB
     1.7747 pixelpipe process CPU      [thumbnail]    colorout               (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_LAB -> IOP_CS_RGB
     1.7820 pixelpipe process CPU      [thumbnail]    gamma                  (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897 IOP_CS_RGB
     1.7851 cache report               [thumbnail]                           2 lines (important=0, used=0, invalid=0). Using 729MB, limit=0MB. Hits/run=0.00. Hits/test=0.000
     1.7851 pixelpipe finished         [thumbnail]                           (   0/   0)  573x 900 scale=0.6897 --> (   0/   0)  573x 900 scale=0.6897
  1. dt version: 4.5.0+569~g4fa7a254a

  2a. OS: Linux Fedora release 40 (Rawhide)

Oops, I was asking for a log showing the mentioned problem :slight_smile:

Lunchtime procrastination…

Used the source RAW and *.xmp from here: https://math.dartmouth.edu/~sarunas/darktable_bench.html

Old Desktop, 6-7 year-old Intel NUC:

Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz w/ 32GB RAM
/usr/bin/flatpak run --branch=stable --arch=x86_64 --command=/app/bin/darktable-cli --file-forwarding org.darktable.Darktable setubal.orf setubal.orf.xmp test.jpg --core --disable-opencl -d perf
44.8865 [dev_process_export] pixel pipeline processing took 42.476 secs (249.832 CPU)
45.3970 [dev_process_export] pixel pipeline processing took 42.565 secs (250.119 CPU)
46.9705 [dev_process_export] pixel pipeline processing took 44.434 secs (254.009 CPU)

AVERAGES:

  • pixel pipeline processing = 43.158 sec
  • CPU took = 251.320 sec
  • Baseline value.

New Desktop, spec as in post above. AMD Ryzen 7 5700G only w/ opencl disabled.

AMD Ryzen 7 5700G with Radeon Graphics w/ 64GB RAM
/usr/bin/flatpak run --branch=stable --arch=x86_64 --command=/app/bin/darktable-cli --file-forwarding org.darktable.Darktable setubal.orf setubal.orf.xmp test.jpg --core --disable-opencl -d perf
12.9354 [dev_process_export] pixel pipeline processing took 12.101 secs (160.745 CPU)
12.9902 [dev_process_export] pixel pipeline processing took 12.156 secs (161.078 CPU)
13.0178 [dev_process_export] pixel pipeline processing took 12.181 secs (162.105 CPU)

AVERAGES:

  • pixel pipeline processing = 12.146 sec
  • CPU took = 161.309 sec
  • Speedup = 3.56 times faster !

New Desktop, spec as in post above. AMD Ryzen 7 5700G w/ NVIDIA GeForce RTX 3060 opencl enabled.

AMD Ryzen 7 5700G with Radeon Graphics w/ 64GB RAM
/usr/bin/flatpak run --branch=stable --arch=x86_64 --command=/app/bin/darktable-cli --file-forwarding org.darktable.Darktable setubal.orf setubal.orf.xmp test.jpg --core -d opencl -d perf
3.0812 [dev_process_export] pixel pipeline processing took 2.135 secs (3.174 CPU)
3.0812 [dev_process_export] pixel pipeline processing took 2.137 secs (3.241 CPU)
3.0896 [dev_process_export] pixel pipeline processing took 2.132 secs (3.212 CPU)

[dt_opencl_device_init]
   DEVICE:                   0: 'NVIDIA GeForce RTX 3060'
   PLATFORM NAME & VENDOR:   NVIDIA CUDA, NVIDIA Corporation
   CANONICAL NAME:           nvidiacudanvidiageforcertx3060
   DRIVER VERSION:           535.86.05
   DEVICE VERSION:           OpenCL 3.0 CUDA, SM_20 SUPPORT
   DEVICE_TYPE:              GPU
   GLOBAL MEM SIZE:          12044 MB

AVERAGES:

  • pixel pipeline processing = 2.135 sec
  • CPU took = 3.209 sec
  • Speedup = 20.22 times faster !!!
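
For reference, the averages above can be pulled straight out of a `-d perf` log - a small awk sketch, assuming the log lines look exactly like those quoted above and are saved in a hypothetical file `perf.log`:

```shell
# Average the wall-clock times from a darktable-cli '-d perf' log.
# Field 7 of each matching line is the seconds value.
awk '/pixel pipeline processing took/ { sum += $7; n++ }
     END { if (n) printf "mean = %.3f sec over %d runs\n", sum / n, n }' perf.log
```

Run on the three Ryzen-only lines above, this reports a mean of 12.146 sec, matching the average quoted.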
2 Likes

This is nowhere near scientific…

My setup has the NVIDIA GeForce RTX 3060 as a compute card only - no display responsibilities. Therefore, in theory, the full 12 GB of VRAM is available to OpenCL/darktable if needed. Display is solely the domain of the AMD APU/iGPU integrated within the AMD Ryzen 7 5700G with Radeon Graphics.

First screenshot is DT 4.4.2 bulk-processing one of my Canon R6 Mk2 (24 megapixel) image directories. The green bumps are Nvidia OpenCL usage. Nvidia memory usage for these 24-megapixel images doesn’t seem to exceed 3 GB.

Therefore, if you are using a 4 GB video card for both display and OpenCL compute - even if you are processing 24-megapixel images and image processing only uses 3 GB of VRAM - it may still fall back to the CPU because of the VRAM required for display.

Second screenshot is DT 4.4.2 bulk-processing my local_copy directory full of PlayRAW images of various sizes. Some of the PlayRAW images exceed 6 GB of VRAM used!
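
One way to read off peak VRAM numbers like these without screenshots - a sketch, assuming the NVIDIA driver’s `nvidia-smi` tool and GNU coreutils `timeout` are available (the 60 s window is arbitrary):

```shell
# Sample NVIDIA VRAM usage once a second while an export runs,
# then report the peak in MiB.
timeout 60 nvidia-smi --query-gpu=memory.used \
    --format=csv,noheader,nounits -l 1 |
  awk '{ if ($1 + 0 > max) max = $1 + 0 }
       END { print "peak VRAM:", max, "MiB" }'
```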

So from this non-scientific Play(RAW), it would seem that 8 GB is the minimum for new video cards, as per @paperdigits’ recommendation.

That is not completely correct, I think, as darktable can apply tiling (breaking the image into smaller pieces and processing each separately). It’s not as efficient as having enough RAM, of course.

You can play with the tuning parameters. If the example given there is still valid, with the default resource allocation, only (12 GB - 400 MB) * 700 / 1024 ~= 8 GB of your card’s 12 GB of VRAM would ever be used. Of course, since you only hit 3 GB of peak usage, it’s unlikely that you came near this limit.
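
The arithmetic can be checked directly - note that the 400 MB headroom and the 700/1024 fraction are just the values quoted above, and may not match current darktable defaults:

```shell
# Sanity-check the headroom arithmetic:
# usable = (total VRAM - headroom) * resource fraction
awk 'BEGIN { total = 12 * 1024; headroom = 400; frac = 700 / 1024
             printf "usable ~ %.0f MB\n", (total - headroom) * frac }'
```

That works out to roughly 8.1 GB of usable VRAM, i.e. about 8 GB as stated.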

1 Like