darktable and multiple graphics cards

Hi, I asked this on the darktable mailing list some time ago but did not get any useful answers, so I am trying again here.
I have a new laptop with dual graphics: one of the devices is an Nvidia MX250 and the other is the integrated Intel GPU. The CPU is an Intel Core i7-1065G7 (4x 1.30 GHz) with 16 GB RAM.
Under Manjaro and Ubuntu, darktable can use both devices because the Intel NEO OpenCL driver is available. If the driver is installed, darktable automatically sets the OpenCL scheduling profile to “multiple GPUs”.
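For what it's worth (package names from memory, so double-check them): the NEO driver ships as intel-compute-runtime on Manjaro and intel-opencl-icd on Ubuntu. To check that both devices are visible to OpenCL you can run

clinfo -l

which lists all platforms and devices the installed runtimes expose.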
However, I am not sure whether darktable is actually faster with dual graphics. I have done some measurements with darktable -d perf, but the results are not clear-cut. For example, I measured region-of-interest processing with several module instances plus denoise (profiled) non-local means active: the result was around 4 seconds, both with dual graphics (Nvidia+Intel) and without (CPU+Nvidia); the difference was maybe 0.2 secs. In other cases there seems to be no difference at all, or sometimes up to 0.5 secs when processing the entire region-of-interest pixelpipe takes about 1.5 secs (and between 1.2 and 1.7 secs the difference would be significant from my point of view).
Now I do not understand the whole technical background. Which device does what? Should there be a significant difference with multiple graphics (in theory)? Are there any special settings in darktable that I should use?
Btw, I read the opencl/multiple graphics chapter in the manual but did not understand much.
Thanks in advance for the feedback
b

When you are in darkroom, you always compute 2 pipes:

  1. the main preview (in the center)
  2. the thumbnail preview (shown in the navigation panel, and the “blurry” picture you see when you pan/scroll/zoom the main preview before the full resolution is recomputed; it is also used for histograms).

The deal, with multiple OpenCL devices, is to send the heaviest pipe (the main preview) to your most powerful GPU, and the other one to your least powerful GPU. The point is to have both pipes run in parallel; otherwise, with only one GPU, one pipe starts only when the previous one has finished.

To distribute pipes among GPUs, set the OpenCL scheduling profile to “default” in preferences, and add this line to your darktablerc file:

opencl_device_priority=0,1,*/1,0,*/0,1,*/1,0,*
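
To spell out the syntax (this is my reading of the manual page linked below): the four groups separated by “/” are the priority lists for the four pipe types, in the order image (center view) / preview / export / thumbnail. Within a group, devices are tried in the order given, and “*” stands for any device not listed explicitly. So the line above means:

  0,1,* : center-view image pipe, device 0 first, then 1, then anything else
  1,0,* : preview pipe, device 1 first
  0,1,* : export pipe, device 0 first
  1,0,* : thumbnail pipe, device 1 first

i.e. the heavy pipes (center view, export) prefer the big GPU and the light pipes (preview, thumbnails) prefer the small one.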

The numbers refer to the devices as listed in the OpenCL debug output (from starting darktable -d opencl):

0.633303 [opencl_init] discarding device 2 `pthread-Intel(R) Xeon(R) CPU E3-1505M v6 @ 3.00GHz' because the driver `OpenCL 1.2 pocl HSTR: pthread-x86_64-unknown-linux-gnu-skylake' is blacklisted.
0.633310 [opencl_init] OpenCL successfully initialized.
0.633313 [opencl_init] here are the internal numbers and names of OpenCL devices available to darktable:
0.633314 [opencl_init]		0	'Quadro M2200'
0.633317 [opencl_init]		1	'Intel(R) Gen9 HD Graphics NEO'
0.633320 [opencl_init] FINALLY: opencl is AVAILABLE on this system.
0.633322 [opencl_init] initial status of opencl enabled flag is ON.

See https://www.darktable.org/usermanual/en/darktable_and_opencl_multiple_devices.html for more details.

Notice that this will not make your individual pipes run faster, but the app as a whole (because the pipes run in parallel instead of serially). That is not something -d perf will measure.
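
If you want to verify the scheduling rather than the timings, you can start darktable with both debug flags and watch which device each pipe grabs while you work (the exact log wording differs between versions):

darktable -d opencl -d perf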

Ok. Thanks. It is still complicated. So basically I need to make sure that the center view is always processed by the Nvidia. Darktable detects the OpenCL devices but it does not know which one is faster, does it?
Actually I do not “use” that thumbnail in the top left corner at all, I usually hide the left panel.
If I understand it correctly, multiple OpenCL devices are important if I batch process several/many raw files at once, which I hardly ever do. If I always export just one at a time the device priority order is not so important.
Ok, to sum up, in my case there is basically no difference between one and two GPUs. Since the Nvidia is detected as dev 0, it is always used for the center view. Different settings are useful if the faster GPU is detected as dev 1.

darktable will probably always number the Nvidia GPU 0, but indeed, it doesn’t know its specs. Just use the config line I gave you; it should be fine.

Doesn’t matter, the thumbnail is computed all the time since it’s needed for histograms and colour pickers.

They are quite important in lighttable too. Each thumbnail starts a new pipe, so you can only benefit from having several devices. Exports are not really important, but UI responsiveness is key.

Why do you recommend Intel first for processing the thumbnails/lighttable? Why not Nvidia?

It’s not very important; it just makes it less likely that the Nvidia GPU is busy whenever you switch between lighttable and darkroom, so the transition is faster.

It’s just a kind of load balancing: defaulting the less performance-critical work to the slower GPU, so more power is available for the performance-critical stuff.
I found (on macOS) that when explicitly prioritizing GPUs it’s better to set the preference to “default” instead of “multiple GPUs”.
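
In darktablerc terms (assuming a darktable version recent enough to have the setting), that is:

opencl_scheduling_profile=default

As far as I understand it, the opencl_device_priority line only takes effect with the “default” profile; the “multiple GPUs” profile overrides it with its own built-in priorities.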

What is the CPU doing in darktable if there is more than one GPU? Is it not used at all, or only for modules which do not support OpenCL?

Only some modules are GPU accelerated. The CPU takes care of the rest.

The CPU fetches the file from disk, saves it, executes modules that are not GPU-accelerated, handles XMP sidecars and database reads/writes (editing history, tags and other metadata), and paints the whole UI through GTK.

Ok. I just switched to Fedora and darktable actually sees my Nvidia as device 1 and my Intel as device 0.
Edit: I think I understand it now: I excluded device 0 from processing the center view and device 1 from processing the preview. It seems to be a bit faster.
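
For completeness, with that numbering (device 0 = Intel, device 1 = Nvidia) the exclusions can be written with the “!” syntax from the manual, something like:

opencl_device_priority=!0,*/!1,*/*/*

i.e. the center view never runs on device 0, the preview never runs on device 1, and exports/thumbnails take whatever is available.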
