Darktable 3.9 OpenCL params

kofa · April 2, 2022, 1:28pm

https://docs.darktable.org/usermanual/3.8/en/special-topics/opencl/performance/

opencl_micro_nap
In an ideal case you will keep your GPU busy at 100% when reprocessing the pixelpipe. That’s good. On the other hand your GPU may also be needed to do regular GUI updates. It might happen that there is insufficient time left for this task. The consequence would be a jerky reaction of your GUI on panning, zooming or when moving sliders. To resolve this issue darktable can add small naps into its pixelpipe processing to have the GPU catch some breath and perform GUI related activities. The opencl_micro_nap parameter controls the duration of these naps in microseconds. You will need to experiment in order to find an optimum value for your system. Values of 0, 100, 500 and 1000 are good starting points to try. The default is 1000.

I used to use 0 with my NVidia 1060 and experienced no issues. However, the current code (rev 5872ca) restricts the value to 101 … 1,000,000, and 0 (or any other value less than 101) will be changed to 1000:

  if((cl->dev[devid].micro_nap <= 100) || (cl->dev[devid].micro_nap > 1000000))
    cl->dev[devid].micro_nap = 1000;

https://github.com/darktable-org/darktable/blob/2b96c991702921042119e18aac7f8703aa274a8b/src/common/opencl.c#L187-L188

What is the reason for this?
I think it’d be more reasonable to set ‘too low’ values to the minimum, ‘too high’ values to the maximum. And 101 as the minimum allowed value is weird.
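
A minimal sketch of the clamping proposed here, for illustration only. It is not the actual darktable patch: the function name `clamp_micro_nap` and the bounds `MICRO_NAP_MIN` and `MICRO_NAP_MAX` are assumptions (0 as the lower bound because the manual lists 0 as a value to try, 1000000 as the upper bound taken from the existing check).

```c
/* Hypothetical bounds, not taken from darktable's source. */
#define MICRO_NAP_MIN 0
#define MICRO_NAP_MAX 1000000

/* Clamp a configured micro_nap (in microseconds) to the allowed range
   instead of resetting every out-of-range value to the default of 1000. */
int clamp_micro_nap(int micro_nap)
{
  if(micro_nap < MICRO_NAP_MIN) return MICRO_NAP_MIN; /* too low: use the minimum */
  if(micro_nap > MICRO_NAP_MAX) return MICRO_NAP_MAX; /* too high: use the maximum */
  return micro_nap;                                   /* in range: keep as configured */
}
```

With these bounds a configured value of 0 or 50 would be kept as is, and only values above 1000000 would be pulled down to the maximum.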

hannoschwalm · April 2, 2022, 6:52pm

Agreed! Will fix

priort · April 2, 2022, 9:07pm

I had been doing some testing in those old threads we had running about optimization. I have stopped with all the recent work to let it mature…I am interested in how my new PC will respond. It has a 3060TI so it should be quite fast but it will be nice to have it running at top speed. I had tried 0 as well with no issue and I think it benched a bit faster with that setting…

hannoschwalm · April 3, 2022, 4:39am

Just a comment about micro_nap. I didn’t follow any thread about optimization and thus can’t comment on that, but this can’t affect dt performance significantly. It is more about overall system stability and GUI responsiveness.

This is technically just a way to allow the OS scheduler to switch context at a defined state. So - at least on non-cpu CL devices we can be less conservative by default.

About having no issues, i also never had any except in situations with heavy system load …

guille2306 · April 3, 2022, 12:07pm

I would say that on systems where the GPU is being used only for OpenCL, a setting of 0 is the more correct choice, as you actually do not have to share the GPU with any other process.

hannoschwalm · April 3, 2022, 2:47pm

If you have two CL devices, yes. Anyway, a) the performance gain is barely measurable (compared with a setting of 1000 you gain something like 10-20 ms, depending on the number of modules and the size of memory) and b) dt can’t detect this in a safe way.
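
As a rough back-of-the-envelope check, the figure above can be read as the number of naps per pixelpipe run multiplied by the nap length. Treating the 10-20 ms as pure nap time is an assumption, and so are the function name and the nap count used below; the thread does not say how many naps darktable actually takes.

```c
/* Total time spent napping during one pixelpipe run, in microseconds.
   nap_count is a hypothetical number of naps per run. */
long total_nap_us(long nap_count, long micro_nap_us)
{
  return nap_count * micro_nap_us;
}
```

Under that assumption, 15 naps at the default of 1000 µs add up to 15000 µs (15 ms), and the same 15 naps at a setting of 0 add up to nothing.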

elstoc · April 3, 2022, 2:56pm

Before I upgraded, I only used my old GPU for darktable/openCL - everything else was driven from the onboard graphics (not a CL device). So this (two CL devices) isn’t the only scenario where micro_nap isn’t required.

guille2306 · April 3, 2022, 3:50pm

This is the same situation I’m in: the GPU is the only OpenCL device, but all graphics are routed through the non-OpenCL CPU.

guille2306 · April 3, 2022, 3:52pm

I agree that, if darktable can’t detect it in a safe way, the default should be non-zero. What I don’t agree with is that the 0 value is not useful.

rvietor · April 3, 2022, 5:20pm

But even a web browser can use hardware acceleration these days. That means it’ll use a/the GPU, probably not openCL though…
So that makes it difficult to be sure you don’t have any other programs using the GPU, so a micronap of 0 could give some unexpected results at an unexpected time.

So while a 0 value can be useful, using it as a default would probably give some nasty surprises for some users.

hannoschwalm · April 3, 2022, 5:53pm

Firefox does - each instance takes here ~100MB and certainly is slowed down if darktable takes it all.

In fact - a micro_nap of 0 does not do a usleep(), so the OS scheduler is not called. Of course you can do that - the latest merged PR allows it - but I don’t think it is a good idea. Still, it is possible and you can do so …

elstoc · April 3, 2022, 6:04pm

Again, usually only if you’re driving graphics from the GPU and X knows about its existence. When I was driving my screen from my onboard motherboard graphics, nvidia-smi showed darktable was the only thing using it.

guille2306 · April 3, 2022, 6:19pm

Not if you explicitly tell the system not to use the GPU (which you can do with NVidia Prime)

kofa · April 3, 2022, 6:22pm

darktable-cli . tiffs --out-ext tiff --core --conf plugins/imageio/format/tiff/bpp=16 created 16-bit tiffs for me.

$ identify *tif
IMG_20220212_132536.tif TIFF 3000x4000 3000x4000+0+0 16-bit sRGB 56.0528MiB 0.000u 0:00.001
IMG_20220305_154358.tif TIFF 1997x3550 1997x3550+0+0 16-bit sRGB 31.7147MiB 0.000u 0:00.000
IMG_20220311_215404.tif TIFF 3000x4000 3000x4000+0+0 16-bit sRGB 54.3994MiB 0.000u 0:00.000
IMG_20220312_172515.tif TIFF 3000x4000 3000x4000+0+0 16-bit sRGB 63.0289MiB 0.000u 0:00.000

Interestingly, even though I specified the output extension as --out-ext tiff, they were created with .tif.

rvietor · April 3, 2022, 6:29pm

Which is why I also said:

Yeah, well, the problem is with the “explicitly” there. If a user knows enough to do that, he probably knows enough to decide whether 0 micronap works for him. Not so your average user, perhaps…

It’s easy to just take a partial quote to deform what someone is trying to say.

Bye

guille2306 · April 3, 2022, 6:39pm

Yeah, about taking partial quotes:

priort · April 4, 2022, 6:12am

I can only seem to get the first file in a directory to output… and as you say it’s .tif; at least it is 16-bit

I put 4 files in a folder called c:\test2

Can’t seem to specify the input correctly …anything other than *.dng crashes

EDIT: As a side note your syntax works perfectly in WSL I just can’t seem to tweak it for windows… we may be on the wrong thread of discussion for this topic…