Use rawtherapee-cli in parallel

Hi,

I need to convert a huge number of images, all with the same pp3. Currently I’m using rawtherapee-cli (it’s amazing :D) on a Linux machine. I would like to run multiple instances of my script so that it finishes faster. Is that possible? Do you think I will get a better execution time?

Hi and welcome,

Just try it out. Put one half of the files into folder A and the other half into folder B. Then process both folders at the same time, each with its own rawtherapee-cli instance.
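If you want to script the split instead of moving files by hand, here is a minimal Python sketch. The function names, the folder names and the `*.dng` pattern are only assumptions for illustration; adjust them to your files.

```python
from pathlib import Path
import shutil


def split_in_half(files):
    """Sort the files and return them as two lists (first half, second half)."""
    files = sorted(files)
    middle = (len(files) + 1) // 2
    return files[:middle], files[middle:]


def distribute(source_dir, dir_a, dir_b, pattern="*.dng"):
    """Move the files matching `pattern` from source_dir into dir_a and dir_b.

    The pattern is a hypothetical default. Returns how many files went
    into each of the two folders.
    """
    half_a, half_b = split_in_half(Path(source_dir).glob(pattern))
    for target, half in ((Path(dir_a), half_a), (Path(dir_b), half_b)):
        target.mkdir(parents=True, exist_ok=True)
        for raw_file in half:
            shutil.move(str(raw_file), str(target / raw_file.name))
    return len(half_a), len(half_b)
```

After calling something like `distribute("raw", "A", "B")`, each of the two folders can be handed to its own rawtherapee-cli instance.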

It may or may not be faster depending on what you do. If you export to compressed TIFF, I would expect it to be faster.

If your input files are, for example, Nikon NEF (decoding them uses only one core), I would also expect it to be faster. In contrast, with Sony ARW files, for example (where all available cores are used during decoding), I would not expect it to be faster, except when you export to compressed TIFF files as mentioned above.

It all depends on your processing.

You should check out GNU Parallel, which is for exactly your use case (see the GNU Parallel 20210122 documentation).

Some more details:

Assume you process a Nikon NEF to a compressed TIFF.

Then you will have these steps internally in RT:

  1. load and decode the NEF (Limited by IO-speed and decoding uses only 1 thread)
  2. raw preprocessing, demosaic and a lot of other stuff before you save to compressed tiff (most of this is using all cores of your machine already).
  3. save to compressed tiff (this uses only 1 thread and is quite slow)

So, by running multiple rawtherapee-cli instances in parallel you could get a speedup in this scenario, because you may get a better load on your CPU while one instance is busy with the single-threaded steps 1) and 3).
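To make that reasoning a bit more concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under strong assumptions: the load and save steps use exactly one thread, the processing step scales perfectly over the cores, the cores are shared evenly between instances, and disk and memory limits are ignored. The function name and all numbers in the usage note are made up for illustration, not measured.

```python
def seconds_per_image(t_load, work_process, t_save, cores, instances):
    """Idealised average wall time per image for parallel instances.

    t_load, t_save: seconds spent in the single-threaded steps 1 and 3.
    work_process:   core-seconds needed by the multi-threaded step 2.
    Assumption: every instance gets an equal share of the cores.
    """
    if cores < 1 or not 1 <= instances <= cores:
        raise ValueError("need 1 <= instances <= cores")
    cores_per_instance = cores / instances
    one_image = t_load + work_process / cores_per_instance + t_save
    # `instances` images are finished during every `one_image` interval.
    return one_image / instances
```

With hypothetical values of 2 s load, 16 core-seconds of processing and 6 s save on 4 cores, the model gives 12 s per image for one instance and 8 s for two. If processing dominates (1 s load, 80 core-seconds, 1 s save), it gives 22 s versus 21 s, so there is almost nothing to gain from a second instance.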

Thanks a lot for your clear explanation. My scenario is:

  1. Load a DNG (taken from a DJI drone)
  2. process it with a pp3 (sharpening, noise reduction and contrast adjustment)
  3. Save to a JPG (quality 90 and cs 2)
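For reference, a scenario like this could be expressed as a single rawtherapee-cli call, sketched here in Python. I am assuming that "cs 2" means the chroma subsampling option `-js2`; the function name and the paths in the usage note are hypothetical.

```python
import shlex


def build_jpeg_command(input_dir, output_dir, profile, quality=90, subsampling=2):
    """Build a rawtherapee-cli call that writes JPEGs using one pp3 profile.

    Assumption: "cs 2" in the post corresponds to -js2 (chroma subsampling).
    """
    if not 1 <= quality <= 100:
        raise ValueError("JPEG quality must be between 1 and 100")
    if subsampling not in (1, 2, 3):
        raise ValueError("subsampling must be 1, 2 or 3")
    return [
        "rawtherapee-cli",
        "-o", str(output_dir),   # output folder
        "-p", str(profile),      # processing profile (.pp3)
        f"-j{quality}",          # JPEG output with the given quality
        f"-js{subsampling}",     # JPEG chroma subsampling
        "-c", str(input_dir),    # input; -c has to be the last option
    ]


# Print the command instead of running it, so it can be checked first.
print(shlex.join(build_jpeg_command("dng_in", "jpg_out", "drone.pp3")))
```

The resulting list can be passed to `subprocess.run` once the printed command looks right.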

My first try was not so good (small server with two cores): a single instance on the full directory took around 1 hour, while splitting the data into two directories and running rawtherapee on both took around 2 hours. What do you think? Would using more cores help?

Yes, more cores and only one instance of rawtherapee-cli should be best for your scenario

You are right, using 4 cores instead of 2 is much better.

For me the best trade-off is probably using 4 cores and processing sequentially.

Thanks for the help!

May I ask which demosaicer you use?

I think the default one: AMaZe (border 4).

Would you suggest a different one?

RCD is faster than AMaZe and works better in combination with Capture Sharpening (less artefacts)

Thanks for the suggestion. I just tried the RCD method on the same dataset. The execution time is 39 minutes instead of 40. It’s a gain anyway :smiley:

In your scenario, most of the processing time most likely goes to denoising (though that is hard to say without knowing the exact content of your .pp3).

I’ve done this before… If you need to speed things up, spin up a VPS with lots of cores on a provider where you can pay by the minute, rsync your files up, kick off the processing job, rsync the results down, and delete the VM.

This is something that I am currently doing using the multiprocessing module in Python.
Specifically:
from multiprocessing.pool import ThreadPool

I set my thread count to 32 workers and can saturate the cores.

For my process I am converting RAW to .tiff format specifically to apply a .dcp for color correction.
I found that the biggest time sinks were the demosaicing and the image compression.

When exporting to tiff via CLI one of the missing features from RT is the ability to choose LZW compression instead of ZIP. LZW compression is only available via the UI and not via a CLI parameter.

Here are some of the timings I documented:
On single images:
Zip compression, AMaZE, Sharpening = 36 seconds
Uncompressed, AMaZE, Sharpening = 16.6 seconds
Uncompressed, AMaZE, No sharpening, minimal pp3 settings = 6.1 seconds
Uncompressed, No demosaicing (I had to hack the pp3 to achieve this), minimal pp3 settings = 1.9 seconds
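Subtracting neighbouring timings gives a rough idea of what each step costs. The small Python sketch below does that arithmetic with the numbers from the list; it assumes the steps simply add up, and the second difference mixes sharpening with the other pp3 settings that were removed at the same time. The dictionary keys and the function name are just labels chosen for this example.

```python
# Timings in seconds for a single image, copied from the list above.
timings = {
    "zip_amaze_sharpen": 36.0,
    "uncompressed_amaze_sharpen": 16.6,
    "uncompressed_amaze_minimal": 6.1,
    "uncompressed_no_demosaic_minimal": 1.9,
}


def breakdown(t):
    """Estimate per-step cost as the difference between neighbouring runs."""
    return {
        "zip compression": round(
            t["zip_amaze_sharpen"] - t["uncompressed_amaze_sharpen"], 1),
        "sharpening and other pp3 settings": round(
            t["uncompressed_amaze_sharpen"] - t["uncompressed_amaze_minimal"], 1),
        "demosaicing (AMaZE)": round(
            t["uncompressed_amaze_minimal"] - t["uncompressed_no_demosaic_minimal"], 1),
        "everything else": t["uncompressed_no_demosaic_minimal"],
    }


for step, seconds in breakdown(timings).items():
    share = 100 * seconds / timings["zip_amaze_sharpen"]
    print(f"{step}: {seconds:.1f} s ({share:.0f}% of the slowest run)")
```

By this estimate ZIP compression alone accounts for 19.4 of the 36 seconds, with 10.5 seconds for sharpening and the other pp3 settings and 4.2 seconds for demosaicing.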

This will help you with utilizing the multiprocessing module and apply_async:
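A minimal sketch of that pattern, for anyone who wants a starting point. This is not my actual script: the function names, the default worker count and the `runner` parameter (there so the command can be swapped out for testing) are assumptions. `-t` asks rawtherapee-cli for uncompressed TIFF output.

```python
import subprocess
from multiprocessing.pool import ThreadPool


def convert_one(raw_file, output_dir, profile, runner=subprocess.run):
    """Convert a single raw file to TIFF and return (file, exit code)."""
    command = [
        "rawtherapee-cli",
        "-o", str(output_dir),
        "-p", str(profile),
        "-t",                  # TIFF output, uncompressed
        "-c", str(raw_file),   # -c has to be the last option
    ]
    completed = runner(command)
    return raw_file, completed.returncode


def convert_all(raw_files, output_dir, profile, workers=4, runner=subprocess.run):
    """Queue one conversion per file with apply_async and collect the results."""
    with ThreadPool(processes=workers) as pool:
        pending = [
            pool.apply_async(convert_one, (raw_file, output_dir, profile, runner))
            for raw_file in raw_files
        ]
        # get() blocks until the job is done; results keep the input order.
        return [job.get() for job in pending]
```

Threads are enough here because the heavy work happens in the rawtherapee-cli child processes; the Python threads only wait for them to finish.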

Cool, are you doing raw video processing with Rawtherapee? I do too, and it’s just amazing, the video quality I get.

On a machine with 4 cores you could also try to run two rawtherapee-cli instances, but restrict each of them to 2 threads like this:

OMP_NUM_THREADS=2 rawtherapee-cli
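The same idea can be scripted, for example from Python. This is only a sketch: the helper names are made up, and the commands in the usage note use hypothetical folder and profile names.

```python
import os
import subprocess


def thread_limited_env(threads, base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(threads)
    return env


def run_side_by_side(commands, threads_per_instance):
    """Start all commands at once, each limited to the given thread count.

    Returns the exit codes in the same order as the commands.
    """
    env = thread_limited_env(threads_per_instance)
    processes = [subprocess.Popen(command, env=env) for command in commands]
    return [process.wait() for process in processes]
```

For two folders this could be called as `run_side_by_side([["rawtherapee-cli", "-o", "out_a", "-p", "profile.pp3", "-c", "A"], ["rawtherapee-cli", "-o", "out_b", "-p", "profile.pp3", "-c", "B"]], 2)`.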