PhotoFlow optimizations and benchmarks

I have finally started to work on the optimization of ICC conversions. The first and most obvious case are RGB -> RGB conversions from and to linear matrix profiles, with relative colorimetric intent and no black point compensation.
In this specific case, the conversion can be reduced to the product of a 3x3 matrix and a 3-element RGB vector.
For the moment I implemented this fast path in straight C code, without SSE optimizations (see here). Even with this simple code, the gain is HUGE: on my test machine and with one single thread, the conversion of a Nikon D810 TIFF file in linear Rec.2020 colorspace into a linear sRGB Jpeg goes from 10s with LCMS2 to 900ms with the fast path, i.e. more than 10x faster!

The code is committed to github, and the most recent packages from today already take advantage of this enhancement.

I am not really sure if SSE2 optimizations would provide a large gain in this case, and if they are worth the effort. I started to read some examples of SSE2 optimizations for the vector dot product, and got the feeling that explicit optimizations might even result in slower code than what is generated by the compiler… maybe @heckflosse has some good advice on this?

2 Likes