PhotoFlow optimizations and benchmarks

Here is the first update on my PhotoFlow optimization work. This time I have included the most recent code for automatic chromatic aberration (CA) correction from RawTherapee (RT), and enabled the SSE2 optimizations.

Preamble: the improvements I am showing here are by no means the result of my own ideas. Instead, they come from the hard work done by @heckflosse and other RT developers! I have just taken the state-of-the-art RT code and plugged it into PhotoFlow, with a few modifications to adapt it to the PhotoFlow processing pipeline.

Regarding the processing model, a big difference between RT and PhotoFlow (PF) is that RT bases its parallel processing on OpenMP, while PF processes image tiles in parallel using normal threads.
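
To make the tile-based model more concrete, here is a minimal Python sketch of the general idea: split the image into rectangular tiles, hand each tile to a worker thread, and paste the results back. It is not PhotoFlow's actual code; the function names, the tile size and the per-tile operation (a simple inversion) are all made up for illustration. Also note that in CPython pure-Python work does not really run in parallel across threads, so this only shows the structure, not the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(width, height, tile_size):
    """Return (x, y, w, h) rectangles that cover a width x height image."""
    tiles = []
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            w = min(tile_size, width - x)    # edge tiles may be narrower
            h = min(tile_size, height - y)   # ... or shorter
            tiles.append((x, y, w, h))
    return tiles


def process_tile(image, tile):
    """Hypothetical per-tile operation: invert 8-bit pixel values."""
    x, y, w, h = tile
    pixels = [[255 - image[row][col] for col in range(x, x + w)]
              for row in range(y, y + h)]
    return tile, pixels


def process_image(image, tile_size=2, workers=2):
    """Process every tile on a thread pool and assemble the output image."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    tiles = split_into_tiles(width, height, tile_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda t: process_tile(image, t), tiles)
        for (x, y, w, h), pixels in results:
            for dy, row in enumerate(pixels):
                out[y + dy][x:x + w] = row   # paste the tile back in place
    return out
```

Each tile only reads from the input and writes to its own output region, so the workers need no locking in this sketch.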

In the specific case of the CA correction, there is also another difference: in PF the analysis phase to derive the CA correction parameters is only performed once when the image is opened, while in RT it is AFAIK repeated each time the image is processed.
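
The difference between the two strategies can be sketched as follows. This is only an illustration under my own assumptions: the class names are hypothetical, and `analyse` / `apply_correction` are trivial placeholders standing in for the real (and much more expensive) CA analysis and correction steps.

```python
def analyse(raw):
    """Placeholder for the expensive analysis: derive one 'shift' parameter."""
    return sum(raw) / len(raw)


def apply_correction(raw, shift):
    """Placeholder for the correction step that uses the analysis result."""
    return [value - shift for value in raw]


class AnalyseOnOpen:
    """Run the analysis once when the image is opened, then reuse it."""

    def __init__(self, raw):
        self.raw = raw
        self.params = analyse(raw)   # done once, at open time
        self.analysis_runs = 1

    def process(self):
        return apply_correction(self.raw, self.params)


class AnalyseOnEveryRun:
    """Repeat the analysis each time the image is processed."""

    def __init__(self, raw):
        self.raw = raw
        self.analysis_runs = 0

    def process(self):
        self.analysis_runs += 1
        params = analyse(self.raw)   # repeated on every call
        return apply_correction(self.raw, params)
```

Both variants produce the same output; they differ only in how often the analysis cost is paid, which matters when the timing covers the processing phase alone.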

As usual, the benchmark runs on an Ubuntu VM with 2 cores and 4GB of RAM, hosted on an OSX machine with 4 cores and 8GB of memory.

Here are the results:

  • amsterdam.pef processed with PhotoFlow, Amaze demosaicing and Jpeg sRGB output:
    no CA correction: 1470 ms
    old CA correction: 1700 ms
    new CA correction, no SSE2: 1690 ms
    new CA correction, with SSE2: 1630 ms (difference with/without CA: 160 ms)

    The improvement is not dramatic, but it is measurable: the new code with SSE2 is 70 ms faster than the old CA correction (1630 ms vs. 1700 ms).

  • amsterdam.pef processed with RawTherapee, Amaze demosaicing and Jpeg sRGB output:
    no CA correction: 1490 ms
    with CA correction: 1700 ms

RT and PF are very close here.

Differences become more prominent when processing bigger images like Nikon D810 RAWs:

  • D810 processed with PhotoFlow, Amaze demosaicing and Jpeg sRGB output:
    no CA correction: 4670 ms
    new CA correction, with SSE2: 5190 ms (difference with/without CA: 520 ms)

  • D810 processed with RawTherapee, Amaze demosaicing and Jpeg sRGB output:
    no CA correction: 5000 ms
    new CA correction, with SSE2: 5850 ms (difference with/without CA: 850 ms)

Since the code used in the two programs is basically the same, I assume that the differences come from the fact that RT is repeating the CA analysis during the processing phase…
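
For comparison, the CA overhead can also be expressed relative to the time without CA correction. The short Python snippet below just redoes the arithmetic on the timings listed above; the dictionary layout and the `ca_overhead` helper are my own, and for PhotoFlow on amsterdam.pef I use the "new CA correction, with SSE2" figure.

```python
# (time without CA, time with CA) in ms, copied from the results above
timings = {
    ("amsterdam.pef", "PhotoFlow"): (1470, 1630),
    ("amsterdam.pef", "RawTherapee"): (1490, 1700),
    ("D810", "PhotoFlow"): (4670, 5190),
    ("D810", "RawTherapee"): (5000, 5850),
}


def ca_overhead(no_ca_ms, with_ca_ms):
    """Return (absolute overhead in ms, overhead in % of the no-CA time)."""
    delta = with_ca_ms - no_ca_ms
    return delta, 100.0 * delta / no_ca_ms


for (image, program), (no_ca, with_ca) in timings.items():
    delta, percent = ca_overhead(no_ca, with_ca)
    print(f"{image:14s} {program:12s} +{delta:4d} ms ({percent:.1f}%)")
```

These are single measurements on a VM, so the percentages should be read as rough indications rather than precise figures.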