After fixing quite a few bugs and making the code more stable, I have started looking into speed improvements, and I’ve set up some simple benchmarks.
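For readers who want to time their own code in a similar spirit, here is a minimal sketch of a wall-clock timing helper. It is not the benchmark code used for this post: the name `best_time_ms` and the choice of keeping the fastest of several runs are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper (not PhotoFlow code): call `work` `runs` times and
// return the fastest wall-clock time of a single call, in milliseconds.
template <typename Work>
double best_time_ms(int runs, Work&& work) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();
        const std::chrono::duration<double, std::milli> elapsed = stop - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

A call such as `best_time_ms(5, [] { /* code under test */ })` would then report the quickest of five runs. Keeping the minimum is one common way to reduce noise from other processes, which can matter inside a virtual machine.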
The first item I have worked on is the Amaze demosaicing, which is derived from the corresponding RawTherapee code. Until now the SSE2 optimizations were disabled, so I went ahead and enabled the SSE2 code for Amaze. This gives a nice 2x speed improvement for the demosaicing phase, which is really quite cool!
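To give an idea of what this kind of optimization looks like, here is a small sketch that processes four floats per instruction using SSE intrinsics. It is emphatically not the Amaze code: the function `average_rows` and the operation it performs are hypothetical and chosen only because they are short. The `__SSE2__` guard falls back to a plain loop when the compiler does not target SSE2.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics; also provides the SSE float ops
#endif

// Illustrative only, not the Amaze code: out[i] = (a[i] + b[i]) / 2.
void average_rows(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a && b && out));
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128 half = _mm_set1_ps(0.5f);
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_mul_ps(_mm_add_ps(va, vb), half));
    }
#endif
    for (; i < n; ++i) {  // scalar tail, or the whole loop without SSE2
        out[i] = (a[i] + b[i]) * 0.5f;
    }
}
```

The vector loop handles four pixels at a time and the scalar loop picks up whatever is left over, so the result is the same with or without SSE2; only the speed differs.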
I’m running my benchmarks on a VirtualBox machine with an Ubuntu 17.04 guest and two cores.
Photoflow is compiled with the following flags:
-std=gnu++11 -march=nocona -mno-sse3 -mtune=generic -g -O3
My reference for comparison is RawTherapee 5.2, compiled from source on the same virtual machine, with the following configuration:
Version: 5.2-190-gf0acd239
Branch: dev
Commit: f0acd239
Commit date: 2017-09-21
Compiler: cc 6.3.0
Processor: generic x86
System: Linux
Bit depth: 64 bits
Gtkmm: V3.22.0
Lensfun: V0.3.2.0
Build type: Release
Build flags: -std=c++11 -mtune=generic -Werror=unused-label -fopenmp -Werror=unknown-pragmas -Wall -Wno-unused-result -Wno-deprecated-declarations -O3 -DNDEBUG
Link flags: -mtune=generic
OpenMP support: ON
MMAP support: ON
Saving a Nikon D810 RAW file to JPEG using the standard camera color matrix gives the following figures:
- PhF, Amaze without SSE2: 6300 ms
- PhF, Amaze with SSE2: 4800 ms
- RT with neutral profile: 5300 ms
When the ICC conversion from the camera colorspace to sRGB is disabled, the time for saving a D810 RAW to JPEG goes down to 2300 ms.
This already indicates that the next item worth optimizing is the ICC conversion, which is currently handled entirely by LCMS2.
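The figures above can be turned into a rough breakdown with a few lines of arithmetic. The sketch below rests on two assumptions that the measurements themselves do not confirm: that the 2300 ms figure was taken with the SSE2 build, and that enabling SSE2 changed nothing except the demosaicing time. All function names are made up for this example.

```cpp
#include <cassert>

// Figures quoted above, in milliseconds.
const double kNoSse2Ms = 6300.0;  // PhF, Amaze without SSE2
const double kSse2Ms   = 4800.0;  // PhF, Amaze with SSE2
const double kNoIccMs  = 2300.0;  // PhF, ICC conversion disabled
                                  // (ASSUMPTION: measured with the SSE2 build)

// Speedup of the whole RAW-to-JPEG save, not of the demosaicing alone.
double overall_speedup() { return kNoSse2Ms / kSse2Ms; }

// Demosaicing time before SSE2, ASSUMING the stage became `stage_speedup`
// times faster and nothing else changed:
//   t - t / s = saved   =>   t = saved * s / (s - 1)
double implied_demosaic_ms_before(double stage_speedup) {
    assert(stage_speedup > 1.0);
    const double saved = kNoSse2Ms - kSse2Ms;
    return saved * stage_speedup / (stage_speedup - 1.0);
}

// Fraction of the SSE2 total that disappears when ICC conversion is off.
double icc_share() { return (kSse2Ms - kNoIccMs) / kSse2Ms; }
```

Under those assumptions the whole save is about 1.31x faster, a 2x demosaicing speedup would correspond to roughly 3000 ms before and 1500 ms after, and the ICC conversion would account for about 52% of the 4800 ms total.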
For comparison, saving the same D810 RAW file to JPEG with the neutral profile in RT takes about 5300 ms, slightly more than PhF with SSE2 enabled. However, this is not a completely fair comparison, because RT goes through a chain of colorspace conversions (camera → working RGB → Lab → output RGB) even when the neutral profile is used, while PhF does only camera → output RGB.
This is just the tip of the iceberg, as lots of tools in PhF badly need SSE and other code optimizations. I will post progress and benchmarks here whenever I have some nice new results… meanwhile, the optimized Amaze code will be committed in the next few days, after a few more checks.