Running multiple darktable-cli instances in parallel

Is there any way to run darktable-cli in parallel?
Even if I specify a new database for every CLI instance, I get an error that the database is locked.

The goal is to run 8–18 parallel processes.

You can use an in-memory database.
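For example (file names are placeholders), pointing the instance at an in-memory library skips the on-disk database entirely:

```shell
# Keep the library DB in RAM for this instance; nothing is written to disk,
# so instances started like this cannot fight over a database lock.
darktable-cli IMG_0001.NEF IMG_0001.jpg --core --library ':memory:'
```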

But what is the point? darktable already uses multiple threads to process the images, so you are unlikely to speed up processing.

Did you try placing the different databases in different folders?
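Something along these lines (an untested sketch; the directory and file names are placeholders): giving every instance a private --configdir means each one opens its own library database, so the sqlite files should not lock each other:

```shell
# Two exports side by side; each instance gets its own config/database
# directory, so their library files cannot collide.
darktable-cli a.NEF out/a.jpg --core --configdir /tmp/dt-0 &
darktable-cli b.NEF out/b.jpg --core --configdir /tmp/dt-1 &
wait   # block until both exports finish
```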

But as @kofa already said: a single instance of darktable tries to use as many resources as it can get (depending on your preference settings). I do not think that multiple instances bring an advantage, rather the opposite.

Hm. I tried the :memory: directive but couldn’t get it to work.
I also tried pointing the new DB path at a RAM drive (Windows).
I always got an sqlite3 error.

The goal is speedup.
Even though DT tries to multi-thread, it is slow as I don’t know what.

At the moment the fastest is Lightroom. It processes 5 files in parallel and gives the best speed for complicated batch-processing scenarios.
With RawTherapee running 18 parallel tasks I can reach about 90% of Lightroom’s speed on similar presets.

With darktable I see only a single file being processed, slooooowly.

Don’t use a RAM drive. Use the in-memory DB setting.

Read --library here:
https://docs.darktable.org/usermanual/development/en/special-topics/program-invocation/darktable/

What you should do instead of running multiple darktable instances in parallel is, I think, find out what is slow. Some modules (for example, diffuse or sharpen, or highlight reconstruction in guided laplacians mode) are computationally very intensive.
At least try -d perf.
Do you have a GPU? Is it set up properly, and can darktable use OpenCL? If you do have a GPU and OpenCL is enabled, is it well-tuned? What does -d opencl say? Do you have enough memory (main RAM and GPU RAM)? What does -d tiling report?

So, run darktable with
-d perf -d opencl -d tiling
and report the result here (it’d be nice to know what size of images you process, and what your computer’s specs are, hardware- and software-wise).
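A full invocation might look like this (input/output names are placeholders); the -d flags go after --core:

```shell
darktable-cli input.NEF output.jpg --core -d perf -d opencl -d tiling
```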

Windows 10, 128 GB RAM, i9-7980XE, 4x 1080 Ti, NVMe disks.
Not so new but still pretty usable hardware.

I will check tomorrow. I want to give DT a chance :slight_smile:

OK. As far as I know, darktable won’t use multiple GPUs for the export pipeline.
Here is the code that tries to find a card:
https://github.com/darktable-org/darktable/blob/master/src/common/opencl.c#L1767-L1776

I’m not a C developer, though, and I cannot figure out if the lock is an instance-level one, or shared between the instances. If the latter, I think you could use 4 instances, each with a different GPU.

@hanatos or @hannoschwalm would know that, I’m sure.

Just a single pipe. Doesn’t make much sense otherwise as system memory would be the bottleneck.

Vlad uses quite a beefy machine. 4 cards, 128 GB of RAM.

How much memory does the 1080 Ti have? How large is the raw image? Do you have the latest NVIDIA drivers?

And what version of dt?

Right. If that were mine I would work on the export pipe. :grinning: Currently I have no chance to test…


11 GB VRAM.
Latest drivers.

I’m not talking about simple raw-to-TIFF conversion; LibRaw can do that blazingly fast. I’m comparing fairly heavy editing with similar Lightroom or RawTherapee presets.

Each card has 11 GB of VRAM?

Post a -d perf log.

I have a 12 GB 3060 card and my export takes less than 1 sec.


it’s not super easy to include multiple gpus in the processing of one image and gain something. too much communication overhead.

but exporting this many images at the same time it would probably make sense to overschedule some, to be able to interleave disk io and processing. also dt doesn’t necessarily scale perfectly, so starting a few cli at the same time likely makes sense.

if you only want to use the gpus, vulkan supports headless mode that allows us to pick the device used for processing (regardless whether it’s driving the screen or not). it would probably make sense to write a script that dispatches every one out of four jobs to a specific gpu. vkdt-cli supports --device. there’s probably something in dt’s magic opencl_device_priority config option that could achieve the same.

i’ve never attempted anything like this, dunno if any of these options work if you have 4x the identical device (same name). probably requires a commandline option to pass the device id instead (edit: there’s --device-id in vkdt-cli now too. completely untested of course).
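a minimal shell sketch of that round-robin dispatch (untested; the dispatch function, the /tmp config dirs and the DT override are made up here for illustration; actually pinning an instance to one card would still need opencl_device_priority set per configdir, and i won’t guess at its value syntax):

```shell
#!/bin/sh
# Round-robin export jobs across NGPUS cards, one darktable-cli instance
# per card. Each instance gets its own --configdir (so the library
# databases don't lock each other) plus an in-memory library.
# GPU pinning itself would go through darktable's opencl_device_priority
# config option in each configdir; its syntax is deliberately not shown.
DT=${DT:-darktable-cli}   # override for dry runs, e.g. DT=echo
NGPUS=4

dispatch() {
  i=0
  for f in "$@"; do
    gpu=$(( i % NGPUS ))
    "$DT" "$f" "out/${f%.*}.jpg" \
        --core --configdir "/tmp/dt-gpu$gpu" --library ':memory:' &
    i=$(( i + 1 ))
    # keep at most NGPUS jobs in flight
    [ $(( i % NGPUS )) -eq 0 ] && wait
  done
  wait
}
```

running it once with DT=echo shows which card each file would land on without touching darktable at all.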


Pretty much all SLI for gaming is dead. Chunking the work and then letting each GPU do a chunk might still work (a chunk could be one DT instance per GPU in this case).

If you start multiple threads exporting, you also have to make the system memory settings aware of that to allow proper tiling for CPU fallbacks.

Not really sure about comparing benchmarks across completely different software. Unless you can get a pipeline processing log from each; and even then, how can you know how much processing is actually going on?