So, the code took me 36 h; LaTeX-ifying it will take another 12 h…
Premises
- We are before demosaicing in the pipeline. Since color doesn’t exist in its perceptual form before full chromatic adaptation and input profiling, and since both rely on fully-formed 3D vectors, we are forbidden to use any kind of colorimetric concept (saturation/chroma, hue, luminance) or assumption so early in the pipeline, meaning R = G = B at this stage gives no guarantee of being a neutral color at the end.
- From the previous point, we can derive two consequences:
- at this place in the pipe, the latent image is to be considered solely as a sparse collection of gradients,
- an additional step of highlights reconstruction needs to happen later in the pipe over the colorimetric signal, to deal with remaining color artifacts (such as magenta) and gamut mapping issues (because clipped highlights very often have high chroma too, and will likely be out of any display space gamut).
Since the development of filmic reconstruction, I have noticed that trying to nuke magenta highlights too soon in the pipe leads to unnecessary losses of detail, while simply forcing magenta highlights to degrade to pure white retains more texture without creating any artifact in the process.
Proposal
In clipped regions:
- Transfer gradients between channels without any further assumption,
- Smoothen brutal transitions between valid and clipped regions, both for aesthetic reasons and to prevent demosaicing algorithms from producing overshoots and fringes,
- Restore the low-frequency curvature of clipped regions from their valid neighbourhood.
Tooling
Laplacian
The laplacian is the divergence of the gradient. It is used to encode texture, create bump maps and transfer textures. It has the nice property of modelling the oscillations around a local average, meaning it is itself fairly invariant to the magnitude of the signal. This will prove useful to copy-paste gradients between channels that have different scalings and magnitudes.
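For reference, here is a minimal numpy sketch of the discrete operator (illustration only, not the darktable C code); the assertion demonstrates the offset invariance that makes it suitable for copy-pasting gradients between channels of different magnitudes:

```python
import numpy as np

def laplacian(u):
    """5-point discrete laplacian: divergence of the gradient on a 2D grid.
    Edges are padded by replication so the output keeps the input shape."""
    p = np.pad(u, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

# The laplacian encodes oscillations around the local average, so adding a
# constant offset to the signal does not change it at all:
img = np.random.rand(8, 8)
assert np.allclose(laplacian(img + 100.0), laplacian(img))
```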
Bspline wavelets
I have shown in Rotation-invariant Laplacian for 2D grids - Aurélien PIERRE Engineering that @hanatos 's a-trous wavelets (https://jo.dreggn.org/home/2010_atrous.pdf), using a cardinal B-spline, are very close to a laplacian pyramid:
- the B-spline is numerically close to a gaussian function G(\sigma, \mu) of \sigma = 1.055 and \mu = 0 while having finite support,
- the difference between the gaussian blur of a signal and the signal itself is proportional to a laplacian, \Delta u = K(u - u * Gauss(\sigma, \mu)), with K = \dfrac{2 \pi}{\sqrt{\pi}\sigma^2}, \mu = 0
- the iterative scheme of wavelets, applying blurs of radius 2^n\sigma on top of each other, leads the n-th scale of the gaussian wavelet (which therefore forms a laplacian pyramid) to be equivalent to a gaussian blur of \sigma_n = \sqrt{(\sigma_{n-1})^2 + (2^n \sigma)^2}.
The B-spline wavelet will thus allow us to perform a multi-scale reconstruction of laplacians. It is already used in filmic reconstruction, contrast equalizer and diffuse/sharpen modules under its fast “à trous” variant.
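As an illustration, here is a numpy sketch of that scheme, assuming the usual separable [1, 4, 6, 4, 1]/16 cardinal B-spline taps and periodic edge handling for brevity (the darktable implementation differs in the details):

```python
import numpy as np

# 5-tap cardinal B-spline, the usual "à trous" low-pass filter
BSPLINE = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def blur_atrous(u, scale):
    """Separable B-spline blur where taps are spaced 2**scale pixels apart."""
    step = 2 ** scale
    out = u.copy()
    for axis in (0, 1):
        acc = np.zeros_like(out)
        for k, w in enumerate(BSPLINE):
            acc += w * np.roll(out, (k - 2) * step, axis=axis)  # cheap periodic edges
        out = acc
    return out

def wavelet_decompose(u, n_scales):
    """Return ([HF_0 .. HF_{n-1}], LF_{n-1}); summing everything gives the
    input back, so resynthesis is exact by construction."""
    hf, lf = [], u
    for i in range(n_scales):
        lf_next = blur_atrous(lf, i)
        hf.append(lf - lf_next)   # high frequency ~ laplacian at scale i
        lf = lf_next
    return hf, lf

img = np.random.rand(64, 64)
hf, lf = wavelet_decompose(img, 4)
assert np.allclose(sum(hf) + lf, img)  # LF_{i-1} = HF_i + LF_i all the way up
```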
Guided filter
The famous work of Kaiming He (https://arxiv.org/pdf/1505.00996.pdf). The core idea is absolutely brilliant in its simplicity: it’s a piece-wise linear least-squares fitting of some image (the input) against another image (the guide).
out = a × in + b, where a = \dfrac{covariance(in, guide)}{variance(in) + \epsilon} and b = average(guide) - a × average(in). \epsilon is the feathering value: as it increases, we force out to be more piece-wise constant (the b offset increases); as it decreases, we allow a better support of fractal edges (hair, feathers, etc.).
@rawfiner and I have shown that the guided filter is not magnitude-invariant, which leads to the EIGF. However, since we are going to guide the laplacians here, which are magnitude-invariant and should always have a local average close to 0, we are good.
The principle of guiding laplacians has been already used in the MLRI demosaicing method with great success (http://www.ok.sc.e.titech.ac.jp/res/DM/MLRI.pdf).
The window width used for the piece-wise linear fit will be twice the blur radius of each wavelet scale. We also use an “à trous” variance and covariance estimation for speed, meaning we only sample 9 values per window: the corners and edge midpoints of the square window, plus its center.
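To make the fit and the sparse stats concrete, here is a hypothetical numpy sketch following the formula above (the function name and the periodic edge handling via np.roll are mine, not darktable’s):

```python
import numpy as np

def guide_laplacian(hf_chan, hf_guide, radius, eps):
    """Per-pixel linear fit out = a * in + b of a channel against the guide,
    using only 9 samples of the (2*radius + 1)^2 window: the 4 corners, the
    4 edge midpoints and the center ("à trous" stats estimation)."""
    offsets = [(dy, dx) for dy in (-radius, 0, radius)
                        for dx in (-radius, 0, radius)]
    samples_in = np.stack([np.roll(hf_chan, (dy, dx), axis=(0, 1))
                           for dy, dx in offsets])
    samples_gd = np.stack([np.roll(hf_guide, (dy, dx), axis=(0, 1))
                           for dy, dx in offsets])
    mean_in = samples_in.mean(axis=0)
    mean_gd = samples_gd.mean(axis=0)
    var_in = (samples_in ** 2).mean(axis=0) - mean_in ** 2
    cov = (samples_in * samples_gd).mean(axis=0) - mean_in * mean_gd
    a = cov / (var_in + eps)          # larger eps -> flatter, more constant fit
    b = mean_gd - a * mean_in
    return a * hf_chan + b
```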
(An)Isotropic diffusion
The idea of applying the Fourier heat transfer PDE to image structure and texture reconstruction comes from https://www.researchgate.net/publication/220663968. It has been successfully used in the diffuse and sharpen module to stiffen gradients in order to revert blur, or smoothen them to revert noise, and for all the artistic purposes in-between.
It is again based on a second-order PDE, meaning that, in practice, we iteratively add a fraction of the laplacian to the initial image to diffuse it. In diffuse and sharpen, to sharpen, we actually subtract a fraction of the laplacian. We also do it multi-scale, in a wavelets setup.
We will use this here to smoothen high frequencies at the valid/clipped boundary, but also to restore the curvature of clipped regions from their neighbourhood, on low frequencies (@Iain ’s gradient extrapolation for texture synthesis gave me the idea yesterday, although this method should be less prone to overshoots and smoother).
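In numpy terms, the basic isotropic step looks like the sketch below (an illustrative explicit scheme with a made-up strength parameter, not the actual diffuse and sharpen code, which is anisotropic and multi-scale):

```python
import numpy as np
from scipy.ndimage import laplace

def diffuse(u, strength=0.2, iterations=10):
    """Explicit heat-equation steps: u <- u + strength * laplacian(u).
    A positive strength smooths gradients (diffusion, reverts noise);
    a negative one stiffens them (sharpening, reverts blur).
    |strength| must stay below 0.25 for the explicit scheme to be stable."""
    for _ in range(iterations):
        u = u + strength * laplace(u)
    return u
```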
Demodulation
Since designing the tone equalizer, I have noticed that the euclidean norm of an RGB vector always produces a smooth image even when one or several channels are clipped. I feel it could be a maximum-likelihood estimator or some metric of energy, I’m not sure. In any case, the euclidean norm is preserved across any linear map, so it also has the nice property of being invariant whatever the RGB color space we are in (… provided it’s white-balanced already).
By dividing the RGB channels by the euclidean norm, we get ratios that present an amazingly smooth depiction of the picture chromaticity, although disconnected from any perceptual representation of color. For example, for this image:
The demodulated ratios are:
This gives us an opportunity for heavy chrominance diffusion without losing details. This principle is already used in filmic’s “high quality” reconstruction for color in-painting but has the undesirable property of propagating color from unrelated surfaces (witness clouds being inpainted with blue sky or green leaves depending on which one is sitting next to them…).
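A sketch of the demodulation round-trip, assuming a white-balanced (H, W, 3) float image (the helper names are mine):

```python
import numpy as np

def demodulate(rgb, eps=1e-12):
    """Split a white-balanced RGB image (H, W, 3) into a pixel-wise euclidean
    norm and smooth chromaticity ratios such that rgb == ratios * norm."""
    norm = np.sqrt(np.sum(rgb ** 2, axis=-1, keepdims=True)) + eps
    return rgb / norm, norm

def remodulate(ratios, norm):
    """Re-multiply the (possibly diffused) ratios by the norm."""
    return ratios * norm

rgb = np.random.rand(32, 32, 3)
ratios, norm = demodulate(rgb)
assert np.allclose(remodulate(ratios, norm), rgb)
```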
Method
Test-demosaicing
- Demosaic the Bayer CFA into RGB by bilinear interpolation,
- Create a 4-channel image: R, G, B, euclidean norm(R, G, B),
- Create a boolean clipping mask for R, G, B and the norm:
- R, G, and B are set to 1 if they are clipped or if they were interpolated from a clipped pixel, else 0
- norm is set to 1 if any of the R, G, B channels are clipped, else 0,
- Blur the boolean clipping mask with a 5×5 box blur, so it’s not really boolean anymore (failure to do this leads to chromatic aberrations and overshoots in the reconstruction).
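As a sketch, assuming the bilinear demosaicing and the detection of pixels interpolated from clipped photosites are done elsewhere (the helper names and the scalar white point are assumptions):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def build_working_image(rgb):
    """Stack R, G, B and their pixel-wise euclidean norm into 4 channels."""
    norm = np.sqrt(np.sum(rgb ** 2, axis=-1, keepdims=True))
    return np.concatenate([rgb, norm], axis=-1)

def build_clipping_mask(rgb, interpolated_from_clipped, white_point=1.0):
    """rgb: (H, W, 3) bilinearly demosaiced image.
    interpolated_from_clipped: (H, W, 3) boolean, True where the value was
    interpolated from at least one clipped photosite.
    Returns a feathered (H, W, 4) mask for R, G, B and the euclidean norm."""
    clipped = (rgb >= white_point) | interpolated_from_clipped
    mask = np.empty(rgb.shape[:2] + (4,), dtype=float)
    mask[..., :3] = clipped.astype(float)
    mask[..., 3] = np.any(clipped, axis=-1).astype(float)   # norm channel
    # 5x5 box blur: the mask is not boolean anymore, which feathers the
    # valid/clipped boundary and avoids overshoots in the reconstruction.
    for c in range(4):
        mask[..., c] = uniform_filter(mask[..., c], size=5)
    return mask
```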
First iteration
- Wavelet-decompose the 4-channel image following the process below, where Bspline|_i skips 2^i pixels between each element (see the “à-trous” wavelets paper above):
- LF_0 = Bspline|_0 * in (convolve by the B-spline filter, aka blur)
- HF_0 = in - LF_0
- LF_1 = Bspline|_1 * LF_0
- HF_1 = LF_0 - LF_1
- …
- LF_n = Bspline|_n * LF_{n - 1}
- HF_n = LF_{n - 1} - LF_n
- Starting from coarse (HF_n) to fine (HF_0), for each pixel, at each scale i (a sketch of one such scale is given after this list):
- Save a copy of HF_i(R, G, B, norm) as copy,
- Get the opacity of the clipping mask \alpha for all channels,
- Compute the isotropic bi-laplacian \Delta HF_i on all (R, G, B, norm) channels,
- Compute the anisotropic laplacian \Delta LF_i on the norm only,
- HF_i(R, G, B) \gets HF_i(R, G, B) + \frac{\Delta HF_i}{2i^2} (diffusion)
- HF_i(norm) \gets HF_i(norm) - \frac{\Delta HF_i}{2i^2} (texture enhancement)
- LF_i(R, G, B) \gets LF_i(R, G, B) - \frac{\Delta LF_i}{2i^2} (texture synthesis)
- Compute the piece-wise stats in the (2^{i + 1} + 1)×(2^{i + 1} + 1) window centered on the current pixel:
- average(HF_i(R, G, B)),
- average(HF_i(norm))
- covariance(HF_i(R, G, B), HF_i(norm))
- variance(HF_i(R, G, B))
- Compute the guided-filter parameters a and b (formula above) of each R, G, B channel against the norm,
- HF_i(R, G, B) \gets \alpha (a × HF_i(R, G, B) + b) + (1 - \alpha) × copy (guided filter)
- HF_i(norm) \gets \alpha HF_i(norm) + (1 - \alpha) × copy (reconstruct the curvature of the guide for the next step)
- Resynthesize LF_{i - 1} = HF_i + LF_i and go to the finer scale,
- Iterate until i = 0
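Below is a rough numpy sketch of one such scale, for illustration only: it uses isotropic laplacians everywhere, reuses the guide_laplacian() helper from the guided-filter sketch above, and clamps the 1/(2i^2) weight at i = 0 (which the text leaves unspecified), so treat it as an assumption-laden outline rather than the actual implementation:

```python
import numpy as np
from scipy.ndimage import laplace

def reconstruct_scale(hf, lf, alpha, scale, eps):
    """One scale i of the first iteration. hf, lf and alpha are dicts of 2D
    arrays keyed by 'R', 'G', 'B', 'norm' (HF_i, LF_i and the feathered
    clipping mask)."""
    radius = 2 ** scale                                   # half-width of the stats window
    strength = 1.0 / (2.0 * max(scale, 1) ** 2)           # assumed clamp at i = 0
    copy = {c: hf[c].copy() for c in hf}

    d_lf_norm = laplace(lf["norm"])                       # anisotropic in the proposal
    for c in ("R", "G", "B"):
        hf[c] = hf[c] + strength * laplace(hf[c])         # diffusion
        lf[c] = lf[c] - strength * d_lf_norm              # texture synthesis
    hf["norm"] = hf["norm"] - strength * laplace(hf["norm"])  # texture enhancement

    for c in ("R", "G", "B"):
        guided = guide_laplacian(hf[c], hf["norm"], radius, eps)  # fit against the norm
        hf[c] = alpha[c] * guided + (1.0 - alpha[c]) * copy[c]
    hf["norm"] = alpha["norm"] * hf["norm"] + (1.0 - alpha["norm"]) * copy["norm"]
    return hf, lf   # then resynthesize LF_{i-1} = HF_i + LF_i and recurse to scale i - 1
```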
Second iteration
It’s mostly the same, but this time we divide each R, G, B channel by the euclidean norm beforehand, wavelet-decompose the ratios and the norm, and re-multiply the ratios by the norm afterwards.
The first difference is the intensity of the diffusion step over the ratios (step 6.5 above): HF_i(R, G, B) \gets HF_i(R, G, B) + \Delta HF_i. The diffusion is therefore stronger.
The second difference is that we don’t use the norm to guide the ratios; instead, for each pixel and scale, we test which channel has the lowest intensity (by probing the LF intensity) and use that one as the guide.
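A sketch of that per-pixel guide selection (hypothetical helpers, assuming the LF and HF planes are available as separate 2D arrays):

```python
import numpy as np

def pick_guide(lf_r, lf_g, lf_b):
    """For each pixel, pick the channel whose low-frequency intensity is the
    lowest (it is the least likely to be clipped). Returns an index map."""
    return np.argmin(np.stack([lf_r, lf_g, lf_b]), axis=0)   # (H, W) in {0, 1, 2}

def gather_guide(hf_r, hf_g, hf_b, guide_idx):
    """Gather the HF values of the selected channel, pixel per pixel, to be
    used as the guide of the guided filter over the ratios."""
    hf = np.stack([hf_r, hf_g, hf_b])                         # (3, H, W)
    return np.take_along_axis(hf, guide_idx[None], axis=0)[0]
```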
Remosaicing
Easy-peasy.
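For completeness, a sketch of what re-mosaicing amounts to, assuming an RGGB Bayer pattern for illustration (the real CFA layout depends on the camera):

```python
import numpy as np

def remosaic_rggb(rgb):
    """Write the reconstructed image back to the Bayer CFA: each photosite
    simply takes back the value of its own color channel."""
    h, w, _ = rgb.shape
    cfa = np.empty((h, w), dtype=rgb.dtype)
    cfa[0::2, 0::2] = rgb[0::2, 0::2, 0]   # R
    cfa[0::2, 1::2] = rgb[0::2, 1::2, 1]   # G1
    cfa[1::2, 0::2] = rgb[1::2, 0::2, 1]   # G2
    cfa[1::2, 1::2] = rgb[1::2, 1::2, 2]   # B
    return cfa
```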
Results
Ough. Too tired. Pull the branch and test it.
Why I love it
- We make zero assumptions about color, we just copy-paste gradients from the valid data we have onto the channels that are clipped.
- We diffuse and try to restore by propagating not only gradients but also curvature, with smooth operators that won’t overshoot. We do it multi-scale to milk as much of the neighbourhood as we can.
- It doesn’t care about white balance at all. Right or wrong WB, it should behave the same.
- The theory is nice and clean. No black-magic heuristics or made-up goals to decide what a pixel value should be.
Why it’s not enough
It doesn’t care about color and may not tackle all the magenta stuff. It needs chromatic aberration correction later and another desaturation step.
Where it sucks
- Two wavelet decompositions are heavy and slow.
- It needs two user parameters for the guided filter featherings (details and color).