Converting back and forth Lab <-> ProphotoRGB produces slight magenta shift

anon41087856 · September 26, 2018, 7:40pm

I have a RAW picture in 32-bits floats unnormalized I convert back and forth Lab <-> XYZ <-> prophotoRGB in C with SSE2 instructions:

      // transform the pixel to sRGB:
      // Lab -> XYZ
      __m128 Lab = _mm_load_ps(in);
      __m128 XYZ = dt_Lab_to_XYZ_sse2(Lab);
      // XYZ -> sRGB
      __m128 rgb = dt_XYZ_to_sRGB_sse2(XYZ);

      // transform the result back to Lab
      // sRGB -> XYZ
      XYZ = dt_sRGB_to_XYZ_sse2(rgb);
      // XYZ -> Lab
      __m128 outv = dt_XYZ_to_Lab_sse2(XYZ);
      _mm_store_ps(out, outv);

The conversions are like that :

static inline __m128 dt_XYZ_to_prophotoRGB_sse2(__m128 XYZ)
{
  // XYZ -> prophotoRGB matrix, D50
  const __m128 xyz_to_rgb_0 = _mm_setr_ps( 1.3459433f, -0.5445989f, 0.0000000f, 0.0f);
  const __m128 xyz_to_rgb_1 = _mm_setr_ps(-0.2556075f,  1.5081673f, 0.0000000f, 0.0f);
  const __m128 xyz_to_rgb_2 = _mm_setr_ps(-0.0511118f, -0.0205351f, 1.2118128f, 0.0f);

  __m128 rgb
      = _mm_add_ps(_mm_mul_ps(xyz_to_rgb_0, _mm_shuffle_ps(XYZ, XYZ, _MM_SHUFFLE(0, 0, 0, 0))),
                   _mm_add_ps(_mm_mul_ps(xyz_to_rgb_1, _mm_shuffle_ps(XYZ, XYZ, _MM_SHUFFLE(1, 1, 1, 1))),
                              _mm_mul_ps(xyz_to_rgb_2, _mm_shuffle_ps(XYZ, XYZ, _MM_SHUFFLE(2, 2, 2, 2)))));
  return rgb;
}

static inline __m128 dt_prophotoRGB_to_XYZ_sse2(__m128 rgb)
{
  // prophotoRGB -> XYZ matrix, D50
  const __m128 rgb_to_xyz_0 = _mm_setr_ps(0.7976749f, 0.2880402f, 0.0000000f, 0.0f);
  const __m128 rgb_to_xyz_1 = _mm_setr_ps(0.1351917f, 0.7118741f, 0.0000000f, 0.0f);
  const __m128 rgb_to_xyz_2 = _mm_setr_ps(0.0313534f, 0.0000857f, 0.8252100f, 0.0f);

  __m128 XYZ
      = _mm_add_ps(_mm_mul_ps(rgb_to_xyz_0, _mm_shuffle_ps(rgb, rgb, _MM_SHUFFLE(0, 0, 0, 0))),
                   _mm_add_ps(_mm_mul_ps(rgb_to_xyz_1, _mm_shuffle_ps(rgb, rgb, _MM_SHUFFLE(1, 1, 1, 1))),
                              _mm_mul_ps(rgb_to_xyz_2, _mm_shuffle_ps(rgb, rgb, _MM_SHUFFLE(2, 2, 2, 2)))));
  return XYZ;
}

The output gives slight magenta cast (just converting back and forth, nothing happens in-between).

The same code in OpenCL instructions doesn’t shift the colors, neither does a conversion to gamma-corrected sRGB (through the same Lab <-> XYZ function, so only the XYZ <-> prophotoRGB is faulty).

Here is the OpenCL variant:

float4 prophotorgb_to_XYZ(float4 rgb)
{
  const float rgb_to_xyz[3][3] = { // prophoto rgb
    {0.7976749f, 0.1351917f, 0.0313534f},
    {0.2880402f, 0.7118741f, 0.0000857f},
    {0.0000000f, 0.0000000f, 0.8252100f},
  };
  float4 XYZ = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
  XYZ.x += rgb_to_xyz[0][0] * rgb.x;
  XYZ.x += rgb_to_xyz[0][1] * rgb.y;
  XYZ.x += rgb_to_xyz[0][2] * rgb.z;
  XYZ.y += rgb_to_xyz[1][0] * rgb.x;
  XYZ.y += rgb_to_xyz[1][1] * rgb.y;
  XYZ.y += rgb_to_xyz[1][2] * rgb.z;
  XYZ.z += rgb_to_xyz[2][0] * rgb.x;
  XYZ.z += rgb_to_xyz[2][1] * rgb.y;
  XYZ.z += rgb_to_xyz[2][2] * rgb.z;
  return XYZ;
}

float4 XYZ_to_prophotorgb(float4 XYZ)
{
  const float xyz_to_rgb[3][3] = { // prophoto rgb d50
    { 1.3459433f, -0.2556075f, -0.0511118f},
    {-0.5445989f,  1.5081673f,  0.0205351f},
    { 0.0000000f,  0.0000000f,  1.2118128f},
  };
  float4 rgb = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
  rgb.x += xyz_to_rgb[0][0] * XYZ.x;
  rgb.x += xyz_to_rgb[0][1] * XYZ.y;
  rgb.x += xyz_to_rgb[0][2] * XYZ.z;
  rgb.y += xyz_to_rgb[1][0] * XYZ.x;
  rgb.y += xyz_to_rgb[1][1] * XYZ.y;
  rgb.y += xyz_to_rgb[1][2] * XYZ.z;
  rgb.z += xyz_to_rgb[2][0] * XYZ.x;
  rgb.z += xyz_to_rgb[2][1] * XYZ.y;
  rgb.z += xyz_to_rgb[2][2] * XYZ.z;
  return rgb;
}

What am I missing here ?

heckflosse · September 26, 2018, 8:18pm

I don’t know what you miss there but want to give you a hint:
For better readability (which may also help to let you know what you are missing) you can write (at least in gcc and in clang)

__m128 r = __mm_setr_ps(...);
__m128 g = __mm_setr_ps(...);
__m128 b = __mm_setr_ps(...);
__m128 c = __mm_setr_ps(...);
__m128 result = (r + g + b ) * c;

instead of writing it in SSE notation which would be

__m128 r = __mm_setr_ps(...);
__m128 g = __mm_setr_ps(...);
__m128 b = __mm_setr_ps(...);
__m128 c = __mm_setr_ps(...);
__m128 result = _mm_mul_ps(c, _mm_add_ps(_mm_add_ps(r, g), b));

I use this notation almost always in RawTherapee code. It improves the readability of SSE code while also allowing further optimizations by the compiler.

@anon41087856 Glad to see that I’m not the only SSE coder here

anon41087856 · September 26, 2018, 8:47pm

Thanks ! I have just taken the template from another function.

heckflosse · September 26, 2018, 9:05pm

If you need to dive into SSE coding, you may also have a look at this files:

github.com

Beep6581/RawTherapee/blob/dev/rtengine/helpersse2.h

////////////////////////////////////////////////////////////////
//
//  this code was taken from http://shibatch.sourceforge.net/
//  Many thanks to the author of original version: Naoki Shibata
//
//   Copyright Naoki Shibata and contributors 2010 - 2021.
// Distributed under the Boost Software License, Version 1.0.
//    (See accompanying file sleef_LICENSE.txt or copy at
//          http://www.boost.org/LICENSE_1_0.txt)
//
//  This version contains modifications made by Ingo Weyrich
//
////////////////////////////////////////////////////////////////

#ifndef __SSE2__
#error Please specify -msse2.
#endif

#ifdef __GNUC__
#define INLINE __inline

This file has been truncated. show original

https://github.com/Beep6581/RawTherapee/blob/dev/rtengine/sleefsseavx.c

anon41087856 · September 26, 2018, 9:29pm

Found it ! I’m a big moron… EDIT : nope, still stuck

Thanks for your insights, I’m hacking darktable, so I guess everything is taking care of. (I hope).

What do these files do ? I’m new to SSE (but not to vectorial computing, so I find SSE/OpenCL actually easier than reular C).

heckflosse · September 26, 2018, 9:50pm

These files provide a lot of usefull SSE stuff

1.) very fast SSE versions of log/exp/sin/cos/sincos/atan2 …
2.) some highly optimized convinient functions (for example load and store from raw data where only every second value is used LC2VFU and STC2VFU)
3.) selection functions to avoid branches in SSE code (also optimized for machines with SSE4)

I picked up that code some years ago and optimized it further for speedups used in RawTherapee

paulmiller · October 1, 2018, 12:46pm

I think you have a sign error in dt_XYZ_to_prophotoRGB_sse2:

const __m128 xyz_to_rgb_2 = _mm_setr_ps(-0.0511118f, -0.0205351f, 1.2118128f, 0.0f);

should be:

const __m128 xyz_to_rgb_2 = _mm_setr_ps(-0.0511118f, 0.0205351f, 1.2118128f, 0.0f);

anon41087856 · October 1, 2018, 7:05pm

You’re right ! Thank you very much ! I will test that asap.

anon41087856 · October 1, 2018, 8:50pm

Ok that solves it ! Thanks again, I checked everything for hours, I have never been able to see it.

paulmiller · October 2, 2018, 1:20pm

Been there, done that – sometimes you can look straight at the error and not see it because you know that you typed that bit right…

Your 2 experimental darktable modules are very interesting - I’ve been going though some of my more ‘difficult’ files and in most cases I can get excellent results just using exposure, fix input profile (logarithmic) and colour balance. I don’t need multiple tone curves and highlights/shadows any more!

anon41087856 · October 2, 2018, 9:28pm

Good to hear ! I am heavily testing them currently, I get better results with less effort as well.