GPU acceleration for AI features in Darktable - help needed testing install scripts

How much RAM does your 3050 have, i.e. VRAM? On my system the CPU is faster than my 3060 Ti, which has only 8 GB of VRAM. Maybe I still have something not set quite right. Before I added the CUDA ORT, I feel the module was actually faster. Again, I have to do everything systematically and log it to confirm that.

Which model are you talking about? 8 GB of VRAM is more than enough for all current AI tasks.

The only challenging model right now is raw denoise for Bayer sensors. For some execution providers it slows down processing on GPU significantly. Work on this is in progress.

I’ll run and log all the methods and see. It felt much quicker before I installed the CUDA runtime than with just the default install. I will confirm.

EDIT: see attached files.
Logs for a test run of raw denoise and denoise, run one after the other, under 3 conditions:
- CPU selected, using the CUDA ORT
- GPU selected, using the CUDA ORT
- DirectML, using the default ORT provided by DT

DirectML started almost instantly and ran smoothly and fast. CUDA was, I believe, installed as required, but it was much slower.
darktable-log_cpu-CUDAORT.txt (8.0 KB)
darktable-log_DTdefault-DirectML.txt (7.2 KB)
darktable-log_gpu-CUDAORT.txt (6.4 KB)
times.txt (227 Bytes)

OK, so I did that, and I still get the same output. NB: your script says CUDA 13 is required.

I had a fight with the horrible Ubuntu “broken package” error along the way. I restored a Timeshift image with NO CUDA installed and took everything from there.

This is as far as I get 20 minutes into a raw denoise:

167.3427 [restore_raw_bayer] 7168x5120 sensor (CFA origin 0,0), working 3584x2560 packed, tile T=2048, 2x2 grid (4 tiles)
167.3486 [restore_raw_bayer] raw CFA range [0.0, 16383.0], black=[512,512,512,512] white=15360 wb_coeffs=[2645.000,1024.000,1731.000,0.000] wb_norm=[1.000,1.000,1.000]
167.5078 [restore_raw_bayer] tile0 model_input range R=[-0.034,0.941] G1=[-0.034,1.069] G2=[-0.034,1.069] B=[-0.034,1.069]
551.6473 wait time 0.504564s
554.4355 try- wait time 1.347845s
554.4355 wait time 2.277599s
556.6894 wait time 1.494845s

After five minutes, nothing had moved. The progress bar is showing in dt, but it indicates no progress.

dt is now darktable 5.5.0+1165~ged6e487df5

NB: if mine is a weird, outlying case, I can live with it!
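For anyone trying to read the log above: its numbers are consistent with a simple scheme, sketched below in Python under assumptions that are mine, not taken from darktable's source. The 7168x5120 sensor is packed 2x2 into four half-resolution planes (R, G1, G2, B), which gives the 3584x2560 working size, and each raw value appears to be scaled as (value - black) / (white - black). With black=512 and white=15360, a raw 0 maps to about -0.034 and the raw maximum 16383 to about 1.069, matching the bounds in the tile0 line. The plane order assumes an RGGB layout at "CFA origin 0,0", and the function names are hypothetical.

```python
def pack_bayer(cfa):
    """Split a 2D CFA mosaic (list of rows) into four half-resolution planes.

    Planes are taken from quad positions (0,0), (0,1), (1,0), (1,1), which
    corresponds to R, G1, G2, B only if the pattern is RGGB (an assumption).
    """
    return [[row[dx::2] for row in cfa[dy::2]] for dy in (0, 1) for dx in (0, 1)]


def packed_size(width, height):
    """Each packed plane is half the sensor size in both dimensions."""
    return width // 2, height // 2


def normalize(value, black, white):
    """Assumed scaling: the black level maps to 0.0, the white level to 1.0."""
    return (value - black) / (white - black)


black, white = 512, 15360  # values from the log line
print(packed_size(7168, 5120))                     # (3584, 2560)
print(round(normalize(0.0, black, white), 3))      # -0.034
print(round(normalize(16383.0, black, white), 3))  # 1.069
```

Under this reading, inputs above 1.0 come from raw values above the white level, and the negative floor from raw values below the black level.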

8 or 12 GB? I have forgotten. I’ll try to find out.

… only 8 GB.

Please don’t test GPU acceleration on the raw denoise model; it has some known problems on GPUs. Try the “denoise” task instead.

The CUDA EP on Windows has problems that I haven’t fully analysed yet. I can confirm that I also see very slow processing on Windows 11 when using the CUDA EP. DirectML on an NVIDIA GPU, however, works well and fast. That is the default bundled setup, and I recommend it for now.
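For anyone reproducing these comparisons outside darktable, the same choice can be expressed with the ONNX Runtime Python package, where an InferenceSession takes an ordered list of execution providers. This is only an illustration: darktable makes the choice in its AI preferences, not through this code. The preference order (DirectML, then CUDA, then CPU) mirrors the recommendation above and is an assumption about what you want.

```python
# Provider names as reported by onnxruntime.get_available_providers().
PREFERRED = [
    "DmlExecutionProvider",   # DirectML (Windows builds such as onnxruntime-directml)
    "CUDAExecutionProvider",  # CUDA (GPU builds)
    "CPUExecutionProvider",   # always present
]


def choose_providers(available, preferred=PREFERRED):
    """Keep the available GPU providers in preference order,
    then append the CPU provider as the final fallback."""
    chosen = [p for p in preferred if p in available and p != "CPUExecutionProvider"]
    chosen.append("CPUExecutionProvider")
    return chosen


# Usage with onnxruntime installed ("model.onnx" is a placeholder path):
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx",
#                                providers=choose_providers(ort.get_available_providers()))
# print(session.get_providers())  # providers actually in use
```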

What does this command return?

nvcc --version

https://docs.nvidia.com/deeplearning/cudnn/backend/v9.21.1/reference/support-matrix.html

Pairings and requirements are listed here. It seems your driver is new enough, but the newest one might work best to get all the NVIDIA components in sync and performing at their best.

The link has this note:

But this might not apply to the current module in DT at this time. Maybe v12 is better?

On my system I tried v12 with cuDNN 9.2.1. On my hardware it was not nearly as fast as the DirectML supplied with a default install, so I may revisit the CUDA drivers, but for now I don’t see how they could be much faster than what I am getting.

Thanks for taking the time to comment and confirm what I saw. I appreciate your feedback and all the time you are giving to the project. I don’t know the core differences between DirectML and CUDA on an NVIDIA card, but from some quick internet searches, CUDA should usually be equal or somewhat better. It seems it can be hardware specific, though, and the way DirectML works on my hardware would be hard to improve on. Could it be that, with so many driver and hardware combinations, performance varies greatly?

One thing I noticed was the difference in the cache/tiling settings. (I found that the cuDNN path for the CUDA ORT was sticky in the darktablerc file, and I had to delete it manually to get back to the defaults and for DT to use DirectML.) This might be required, but DirectML was 1024 and CUDA was 2048. I think you mentioned this setting in an earlier comment, but I don’t recall what you said. I don’t know whether those settings can be tweaked for CUDA, or whether that value is needed as a default or minimum to work.

For now I will happily leave it on the default setup that you provided :slight_smile:

If you are talking about the ONNX Runtime library path, you can just delete it in the AI preferences tab, or double-click the label to revert to the default.

Yes, this could be. The tile size is determined when the model starts and is cached in the config. You should not change it manually.
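To get a feel for what the cached tile size means, here is a rough tile count for the 3584x2560 working size from the raw denoise log earlier in the thread, which reported T=2048 and a 2x2 grid. The sketch assumes square, non-overlapping tiles; darktable may overlap tiles at their borders, so treat the counts as approximate. Larger tiles mean fewer model runs but more VRAM per run, which is presumably why the value is chosen at model start rather than set by hand.

```python
import math


def tile_grid(width, height, tile):
    """Columns, rows and total tiles needed to cover width x height
    with square tiles of side `tile` (no overlap assumed)."""
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    return cols, rows, cols * rows


print(tile_grid(3584, 2560, 2048))  # (2, 2, 4), matching "2x2 grid (4 tiles)"
print(tile_grid(3584, 2560, 1024))  # (4, 3, 12)
```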

Thanks. Yes, I was just thinking out loud about the tile size; I didn’t change anything.

As for the path, it came back each time I restarted DT, even though I had deleted it manually and removed all the NVIDIA components from my system.

It seems I had to go in and manually delete that text from my darktablerc file.

I do run it from a different config directory using --configdir, but I wouldn’t think that would cause an issue?

If I circle back and try the NVIDIA CUDA setup again, I will reset it as you suggest.

In any event, it’s all working fine. Thanks again for integrating this into DT!

Are you editing darktablerc manually? While DT is open? DT saves its config on exit, which could be the reason.

But really you don’t need to edit darktablerc to change GPU acceleration. Everything can be done in AI preferences.
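The AI preferences are the intended way to change this. For the case described above, where a value kept coming back, here is a sketch of removing one entry from darktablerc, which stores one key=value setting per line. Run it only while darktable is closed, because DT writes its config on exit. When darktable is started with --configdir, the file to edit is the darktablerc inside that directory. The function name is hypothetical, and the key in the usage line is a placeholder; copy the real one from your own file.

```python
from pathlib import Path


def remove_rc_key(rc_path, key):
    """Delete every `key=value` line for `key` from a darktablerc-style file.

    Only call this while darktable is closed; otherwise DT rewrites the file
    on exit and the removed entry reappears. Returns the number of lines removed.
    """
    path = Path(rc_path)
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    # Match "key=" exactly so that keys sharing a prefix are left alone.
    kept = [line for line in lines if not line.startswith(key + "=")]
    path.write_text("".join(kept), encoding="utf-8")
    return len(lines) - len(kept)


# Hypothetical usage; replace both the directory and the key with real values:
# remove_rc_key(Path("/path/to/configdir") / "darktablerc", "some/placeholder/key")
```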

$ nvcc --version
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
$ sudo apt install nvidia-cuda-toolkit
... ... ... ...

Well, that installed a heap of stuff! Ok, here you go…

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

I started with 13. Andrii said to install 12.

You have CUDA 12 installed. Script should detect it and install correct version of ONNX Runtime.
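As a sketch of the kind of check involved (not the actual install script), the CUDA major version can be read from the release line of nvcc --version, such as "release 12.0, V12.0.140" in the output above. Note that nvcc reports the installed toolkit; the CUDA version the driver supports is shown separately by nvidia-smi, and the two can differ. Which ONNX Runtime build to pick for each major version is left out, since that mapping belongs to the script. Function names are hypothetical.

```python
import re
import subprocess


def cuda_major_from_nvcc(output):
    """Parse the major version from the 'release X.Y' part of nvcc output."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return int(match.group(1)) if match else None


def detect_cuda_major():
    """Run `nvcc --version`; return None if nvcc is missing or prints no version."""
    try:
        result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    except FileNotFoundError:
        # Same situation as "Command 'nvcc' not found" above.
        return None
    return cuda_major_from_nvcc(result.stdout)
```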

Yes, I only edited it while DT was closed, and in my case it was the only way to make the setting go away. But as I said, all good now. Thanks for your comments!

I knew that, but it seems that dt doesn’t. The output still says 13 and tells me to update my NVIDIA driver.

Just to see what would happen, I ran the system update program. Since I had installed the toolkit, it set out to update/install a heap of NVIDIA packages, and it failed to install the NVIDIA drivers. Rebooting took me into strange-error land.

It’s been interesting trying to sort this out, and thank you for your help and suggestions. However, I’ve reverted to my no-CUDA Timeshift image, and having spent several hours going round in circles, I’m giving up on this for now. Sorry not to be able to contribute something useful.

I own an AMD Radeon RX 6700 XT (gfx1031). As I’ve had problems installing ROCm in the past, I’m currently using Mesa’s RustiCL. Would an implementation on that basis also be possible? (Alternatively, can someone point me to a working ROCm installation for this card?)

Thanks!

What Topaz does is take the part of the image that is visible in the viewport (actually a bit bigger) and use that as the denoise preview, so you can compare 1:1 on your screen. If you zoom in or out, the preview is recomputed.
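A minimal sketch of that preview strategy: take the visible rectangle in image coordinates, grow it by a margin, and clamp it to the image bounds. The 10% margin and the function name are assumptions for illustration; this is neither Topaz's nor darktable's code. Only this crop would be denoised for the preview, and it would be recomputed whenever zooming or panning changes the visible rectangle.

```python
def preview_region(view, image_w, image_h, margin=0.1):
    """Grow the visible rectangle (x, y, w, h in image pixels) by `margin`
    of its size on every side, clamped to the image bounds."""
    x, y, w, h = view
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(image_w, x + w + dx), min(image_h, y + h + dy)
    return x0, y0, x1 - x0, y1 - y0


# A 1920x1080 view at (1000, 800) on a 7168x5120 image:
print(preview_region((1000, 800, 1920, 1080), 7168, 5120))  # (808, 692, 2304, 1296)
```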

Officially, this GPU is not supported by ROCm. See AMD docs here.

You can still try to install ROCm, and there is a pretty good chance it will work, but expect a very long first run for each model (about 30-40 minutes). Subsequent runs will take just seconds if the model graph compilation was cached successfully.