Help me understand GPU use in darktable

Ah yes, but highlight reconstruction is disabled by default.

My best guess is that while processing one of the images, a module hits a fault/error and the system stops using the GPU. Until we can see the fault in a log, it will be hard to troubleshoot.

I know there were some recent fixes for ROI on current master. An option for you is to try current master.


I think I may have found something. I ran darktable from a terminal window last night and forgot to stop it. When I looked at it again hours later, the terminal was filled with hundreds of these messages:

29780.5142 [dt_imageio_large_thumbnail] error: The thumbnail image is not in JPEG format, and DT was built without neither GraphicsMagick or ImageMagick. Please rebuild DT with GraphicsMagick or ImageMagick support enabled.
29780.5143 [dt_imageio_large_thumbnail] error: Not a supported thumbnail image format or broken thumbnail: image/tiff

Ya, maybe not, but I think the handles one is… especially when you see tiling. I'm sure this was a suggestion made in one of the original rounds of discussion on optimizing. I agree micronap might be an issue if set that way; that's why I said I get away with it, and it makes a measurable performance impact on my system, with no crashes ever since changing that one. As for the async suggestion, it was just something to try. All of these are easy to reverse.

Seems like an alternate issue might be the problem…

@Dave_G, where did you get this version of darktable, that it has neither GraphicsMagick nor ImageMagick?

I ran into a similar issue lately and found out that one of my diffuse or sharpen presets was set to auto-apply and was making culling mode awfully slow even though I was mainly running on the GPU: 27 seconds for 2 Sony A7RV images (60 MP each). Without the D&S module, the two previews are processed in 9 seconds.

Since nothing I tried helped, I decided to start with a clean slate:

I now had a clean system with only a few settings changed (OpenCL scheduling profile=very fast GPU, auto-apply pixel workflow changed from filmic to sigmoid, deactivated all except nVidia OpenCL drivers). I loaded the same batch of files, and…

No difference. It’s still using the CPU and not the GPU, which I can see with real-time monitoring. Running preview zoom 100% in culling mode is a little quicker now, about 30 seconds instead of 45-60, which I attribute to the update and reboot, but it’s still not hitting the GPU.

At this point I’m completely baffled and not sure what to try next.

We likely need a -d common log that captures the transition from the GPU to the CPU path, or you could try master.

Did you try this? It helped me in the past and it is not limited to Windows environments.

I also changed this:

opencl_device_priority=+0/+0/+0/+0/+0
opencl_disable_drivers_blacklist=TRUE
opencl_mandatory_timeout=20000

I would not recommend these settings. His system starts using the GPU and then stops using it. Most likely there is a fault going on.

Here’s the latest output from -d common with the latest install. I loaded a folder with 86 images, and thumbnail generation in lighttable used the CPU. I then opened culling mode and did preview zoom 100%, which again used the CPU.

This is with version 4.6.0. I had tried current master, but it would not install from the dnf repo, so to go that way I would need to build the GitHub version.
darktable.dcommon2.txt (928.3 KB)

It was using the GPU until 86.6767 seconds. There was a timeout error, and then it switched everything to the CPU.

I was just reading it and saw 2 entries about failing to lock the GPU, so it times out… could this be a driver issue? @Dave_G, is your video card driver the latest one?

I was using a 5-month old driver (NVIDIA-Linux-x86_64-535.104.05). I just tried the latest version (NVIDIA-Linux-x86_64-535.154.05), but there’s no difference. I still see

71.8849 [opencl_lock_device] reached opencl_mandatory_timeout trying to lock mandatory device, fallback to CPU

in the logs from -d common (attached). I tried adjusting some of the nVidia settings in the control app, but those had no effect either.
darktable.dcommon3.txt (937.9 KB)
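If you want to spot those fallbacks quickly in a saved log, something like this works (a minimal sketch; sample.log here stands in for the attached log file):

```shell
# Create a small stand-in log (replace with your real -d common log file)
cat > sample.log <<'EOF'
71.8849 [opencl_lock_device] reached opencl_mandatory_timeout trying to lock mandatory device, fallback to CPU
72.0101 [dev_pixelpipe] took 0.122 secs (0.852 CPU) [full] processed `demosaic' on CPU, blended on CPU
EOF
# Print each fallback event with its line number
grep -n 'reached opencl_mandatory_timeout' sample.log
```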

Progress. The timeout message suggested to me that maybe bumping opencl_mandatory_timeout from 400 to 1024 wasn't enough, so as a test I increased it 10x to 4000. Now I'm no longer getting timeout errors, and zooming in culling mode is down to about 10 seconds, which I can live with. Running with -d common, it looks like it's no longer falling back to the CPU, though when I load a big batch of images into lighttable, like 500+, I can see the CPU meter hit 100% and loading thumbnails takes a long time. But once they're loaded, I think I'm in a place where I can actually work.
darktable.dcommon5.txt (4.0 MB)
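For reference, the corresponding line in darktablerc (on Linux typically under ~/.config/darktable/; edit it only while darktable is closed, and the key name is as quoted earlier in the thread):

```
opencl_mandatory_timeout=4000
```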


I know I suggested bumping the timeout to avoid the fault, and in the last file it looks like the GPU remains working. It is still not solving the root cause, though (why it faults and why it stays faulted).

FYI, the current drivers for nvidia in rpm fusion are:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+

You mentioned adjusting some NVIDIA settings. I also noticed you have 3 other platform drivers installed. I wonder if your OpenCL platform setup is causing issues.

     0.1591 [check platform] platform 'Portable Computing Language' with key 'clplatform_portablecomputinglanguage' is NOT active
     0.1592 [check platform] platform 'Clover' with key 'clplatform_clover' is NOT active
     0.1592 [check platform] platform 'rusticl' with key 'clplatform_rusticl' is NOT active

The CPU will still get used by the system to complete other tasks, like writing the XMP files, saving thumbnails to disk, and running the modules that only have a CPU path (e.g. tone equalizer).

Lastly, I use these settings to have the system generate thumbnails in the background, up to the size of the 2K monitor I use. This can speed up your culling (by using the background-generated thumbnails).

[screenshot: preferences for generating thumbnails in the background]


I’m about ready to give up, but I don’t know where to go from here. I was using the v535 driver downloaded from NVIDIA, so I installed the latest 545.29.06 driver from rpmfusion. I’ve played around with the driver settings, opting for max performance over quality, but still no change; nothing I’ve done so far has made any difference. Today I tried to edit a batch of 819 files and it was painful to the point of being impossible to work. It’s still just using the CPU while the GPU sits there untouched.

Maybe I will try to install the current master.
darktable.dcommon6.txt (562.7 KB)
darktablerc.txt (28.3 KB)

Try this line…

cldevice_v5_nvidiacudanvidiageforcegtx1660super=0 250 0 16 16 128 0 0 0.000 0.000 0.250

Where it says 128, try 1024. It’s actually a suggestion right from the manual, as 128 is called very conservative… the worst that can happen is that it won’t help…
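If you’d rather not edit that line by hand, a sed one-liner can bump the 6th value (a hypothetical sketch run on a scratch copy here; back up your real darktablerc first and only edit it while darktable is closed):

```shell
# Scratch copy standing in for the real darktablerc
cat > darktablerc.sample <<'EOF'
cldevice_v5_nvidiacudanvidiageforcegtx1660super=0 250 0 16 16 128 0 0 0.000 0.000 0.250
EOF
# On the cldevice line, skip the first five numbers after '=' and
# replace the 6th value, 128, with 1024
sed -E '/^cldevice_v5_/ s/(=([0-9]+ ){5})128 /\11024 /' darktablerc.sample
```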

I see in your config you set the scheduling profile to ‘multiple GPUs’. Try ‘very fast GPU’ or ‘default’.
For what it’s worth, with a single NVidia 1060, these are my settings (note that I use a very large value for mandatory_timeout):

opencl=TRUE
opencl_device_priority=+0/*/+0/*/*
opencl_disable_drivers_blacklist=false
opencl_library=
opencl_mandatory_timeout=20000
opencl_scheduling_profile=very fast GPU
opencl_synch_cache=active module
opencl_tune_headroom=FALSE
opencl_tuning_mode=nothing
opencl_use_cpu_devices=false

My resource settings:

resource_default=512 32 128 700
resource_large=700 64 128 900
resource_small=128 16 64 400
resource_unrestricted=16384 1024 128 1024
resourcelevel=large

The clplatform settings (your card name will be different):

cldevice_v5_nvidiacudanvidiageforcegtx10606gb=0 250 0 64 64 1024 1 0 0.000 0.000 0.250
cldevice_v5_nvidiacudanvidiageforcegtx10606gb_building=-cl-fast-relaxed-math
cldevice_v5_nvidiacudanvidiageforcegtx10606gb_id0=600
clplatform_amdacceleratedparallelprocessing=FALSE
clplatform_apple=FALSE
clplatform_intelropenclhdgraphics=FALSE
clplatform_nvidiacuda=TRUE
clplatform_openclon12=FALSE
clplatform_other=FALSE
clplatform_rusticl=FALSE

You can force compiled ‘kernels’ (processing code) to be regenerated by deleting them:

rm -rf ~/.cache/darktable/*kernel*

Could you try running with -d opencl -d verbose -d perf instead of -d common? (Edit: I now see you have tried that already; still, I think it’s worth having it on at least while you’re troubleshooting, and you can combine it with -d common.) Certain warnings are only shown if OpenCL logging is on (for example, reaching the mandatory_timeout and falling back to CPU, or ‘memory allocation failed’). It may even be a bit less verbose overall while giving you more OpenCL logs. I get output like this (notice lines like [preview] processed 'colorout' on CPU, blended on CPU and processed 'channelmixerrgb' on GPU, blended on GPU; the words in square brackets, like [full] and [preview], are the names of the pipelines):

[opencl_init] opencl_scheduling_profile: 'very fast GPU'
[opencl_init] opencl_device_priority: '+0/*/+0/*/*'
[opencl_init] opencl_mandatory_timeout: 20000
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       0       0       0       0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           1       1       1       1       1
[opencl_synchronization_timeout] synchronization timeout set to 0
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       0       0       0       0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           1       1       1       1       1
[opencl_synchronization_timeout] synchronization timeout set to 0
     1.1096 [opencl_update_settings] scheduling profile set to very fast GPU
    23.2498 [dt_dev_load_raw] loading the image. took 0.171 secs (0.139 CPU)
    23.7243 [histogram] took 0.001 secs (0.001 CPU) scope draw
    23.7671 [dt_dev_process_image_job] loading image. took 0.000 secs (0.000 CPU)
    23.7846 [dt_opencl_check_tuning] use 4808MB (headroom=OFF, pinning=OFF) on device `NVIDIA CUDA NVIDIA GeForce GTX 1060 6GB' id=0
    23.7849 [dev_pixelpipe] took 0.000 secs (0.000 CPU) initing base buffer [full]
    23.7922 [dev_pixelpipe] took 0.007 secs (0.007 CPU) [full] processed `rawprepare' on GPU, blended on GPU
    23.7936 [dev_pixelpipe] took 0.001 secs (0.002 CPU) [full] processed `temperature' on GPU, blended on GPU
    23.8003 [dev_pixelpipe] took 0.007 secs (0.005 CPU) [full] processed `highlights' on GPU, blended on GPU
    23.8294 [resample_cl] plan 0.000 secs (0.000 CPU) resample 0.001 secs (0.001 CPU)
    23.8302 [histogram] took 0.000 secs (0.000 CPU) scope draw
    23.8461 [dev_pixelpipe] took 0.046 secs (0.044 CPU) [full] processed `demosaic' on GPU, blended on GPU
    23.8989 [dev_pixelpipe] took 0.053 secs (0.101 CPU) [full] processed `lens' on GPU, blended on GPU
    23.8992 [histogram] took 0.004 secs (0.001 CPU) scope draw
    23.9019 [dev_pixelpipe] took 0.003 secs (0.005 CPU) [full] processed `exposure' on GPU, blended on GPU
    23.9059 [dev_pixelpipe] took 0.004 secs (0.003 CPU) [full] processed `colorin' on GPU, blended on GPU
    23.9092 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.003 secs (0.000 GPU) [channelmixerrgb]
    23.9139 [dev_pixelpipe] took 0.008 secs (0.002 CPU) [full] processed `channelmixerrgb' on GPU, blended on GPU
    23.9197 [dev_pixelpipe] took 0.006 secs (0.007 CPU) [full] processed `filmicrgb' on GPU, blended on GPU
    23.9602 [dt_ioppr_transform_image_colorspace] IOP_CS_RGB-->IOP_CS_LAB took 0.005 secs (0.053 CPU) [colorout]
    24.0420 [dev_pixelpipe] took 0.122 secs (0.852 CPU) [full] processed `colorout' on CPU, blended on CPU
    24.0495 [dev_pixelpipe] took 0.007 secs (0.072 CPU) [full] processed `gamma' on CPU, blended on CPU
    24.0495 [opencl_profiling] profiling device 0 ('NVIDIA CUDA NVIDIA GeForce GTX 1060 6GB'):
    24.0495 [opencl_profiling] spent  0.0026 seconds in [Write Image (from host to device)]
    ....
    24.0496 [opencl_profiling] spent  0.0015 seconds in filmicrgb_chroma
    24.0496 [opencl_profiling] spent  0.0350 seconds in [Read Image (from device to host)]
    24.0496 [opencl_profiling] spent  0.1068 seconds totally in command queue (with 0 events missing)
    24.0562 [dev_process_image] pixel pipeline took 0.272 secs (1.102 CPU) processing `2024-02-04_17-22-10_P1030065.RW2'
    24.0587 [histogram] took 0.000 secs (0.000 CPU) scope draw
    24.0670 [dt_dev_process_image_job] loading image. took 0.000 secs (0.000 CPU)
    24.0670 [dt_dev_process_image_job] loading image. took 0.000 secs (0.001 CPU)
    24.0782 [dev_process_image] pixel pipeline took 0.011 secs (0.045 CPU) processing `2024-02-04_17-22-10_P1030065.RW2'
    24.0900 [histogram] took 0.000 secs (0.000 CPU) scope draw
    24.0932 [dt_dev_process_image_job] loading image. took 0.000 secs (0.000 CPU)
    24.1138 [dev_pixelpipe] took 0.000 secs (0.000 CPU) initing base buffer [preview]
    24.1153 [dev_pixelpipe] took 0.001 secs (0.001 CPU) [preview] processed `rawprepare' on GPU, blended on GPU
    24.1160 [dev_pixelpipe] took 0.001 secs (0.000 CPU) [preview] processed `temperature' on GPU, blended on GPU
    24.1168 [dev_pixelpipe] took 0.001 secs (0.000 CPU) [preview] processed `highlights' on GPU, blended on GPU
    24.1212 [dev_pixelpipe] took 0.004 secs (0.000 CPU) [preview] processed `demosaic' on GPU, blended on GPU
    24.1330 [dev_pixelpipe] took 0.012 secs (0.000 CPU) [preview] processed `lens' on GPU, blended on GPU
    24.1338 [dev_pixelpipe] took 0.001 secs (0.000 CPU) [preview] processed `exposure' on GPU, blended on GPU
    24.1355 [dev_pixelpipe] took 0.002 secs (0.000 CPU) [preview] processed `colorin' on GPU, blended on GPU
    24.1373 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.001 secs (0.000 GPU) [channelmixerrgb]
    24.1390 [dev_pixelpipe] took 0.003 secs (0.000 CPU) [preview] processed `channelmixerrgb' on GPU, blended on GPU
    24.1432 [dev_pixelpipe] took 0.004 secs (0.007 CPU) [preview] processed `filmicrgb' on GPU, blended on GPU
    24.1559 [dt_ioppr_transform_image_colorspace] IOP_CS_RGB-->IOP_CS_LAB took 0.001 secs (0.010 CPU) [colorout]
    24.1781 [dev_pixelpipe] took 0.035 secs (0.237 CPU) [preview] processed `colorout' on CPU, blended on CPU
    24.1808 [dev_pixelpipe] took 0.003 secs (0.019 CPU) [preview] processed `gamma' on CPU, blended on CPU
    24.2185 [dt_ioppr_transform_image_colorspace_rgb] RGB-->RGB took 0.038 secs (0.386 lcms2) [final histogram]
    24.2722 [histogram] took 0.091 secs (0.452 CPU) final RGB parade
    24.2722 [opencl_profiling] profiling device 0 ('NVIDIA CUDA NVIDIA GeForce GTX 1060 6GB'):
    24.2722 [opencl_profiling] spent  0.0003 seconds in [Write Image (from host to device)]
    24.2722 [opencl_profiling] spent  0.0001 seconds in rawprepare_1f
    ....
    24.2723 [opencl_profiling] spent  0.0206 seconds totally in command queue (with 0 events missing)
    24.2741 [dev_process_image] pixel pipeline took 0.161 secs (0.721 CPU) processing `2024-02-04_17-22-10_P1030065.RW2'
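A quick way to skim a long log like this for the modules that ran on the CPU (a small sketch; dt.log is a stand-in name for a captured log file):

```shell
# Stand-in log with a few pixelpipe lines (replace with your real log)
cat > dt.log <<'EOF'
24.0420 [dev_pixelpipe] took 0.122 secs (0.852 CPU) [full] processed `colorout' on CPU, blended on CPU
24.0495 [dev_pixelpipe] took 0.007 secs (0.072 CPU) [full] processed `gamma' on CPU, blended on CPU
23.9139 [dev_pixelpipe] took 0.008 secs (0.002 CPU) [full] processed `channelmixerrgb' on GPU, blended on GPU
EOF
# Show only the modules that were processed on the CPU
grep "processed .* on CPU" dt.log
```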

I’m not exactly sure what fixed it, but it seems to be working now. The ‘multiple GPUs’ setting was apparently a leftover mistake from resetting the defaults, which I had missed. I changed that back to ‘very fast GPU’, and changed the cldevice_v5_nvidiacudanvidiageforcegtx1660super setting from 128 to 1024.

Also the newer driver may have something to do with it.

Anyway I really appreciate all the help. I hope I can do the same for someone sometime.
