Denoising: How variance stabilization transforms work

While I was working on noise profile creation for darktable, I had to learn how variance stabilization transforms work.
So, here is a summary of what I learnt about this :slight_smile:

What are variance stabilization transforms?
Noise variance is not constant: the variance depends on the light intensity.
Typically, variance is higher in bright areas than in dark areas.

Plenty of denoising algorithms are designed to work on images that have constant noise variance.
To use these algorithms, we can transform the data so that variance becomes constant.
Such a transform is called a variance stabilizing transform.

Why and how does it work?
First thing to know is how variance evolves depending on the mean.
Let’s say we have empirically: V(X)=g(E[X])
We can try fitting a model to these empirical data.
For instance, for creating darktable’s noise profiles, we use the gaussian-poissonian model, where the variance is assumed to evolve linearly with the mean: V(X)=a*E[X]+b
So, we try to find a and b such that the model fits reality as well as possible (in practice, in darktable, these a and b are exactly what your noise profiles are).
At this step, the “only” thing we have to do is to find a model that fits well the reality :slight_smile:
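As a sketch of this fitting step (with made-up numbers standing in for real profiling measurements), an ordinary least-squares line fit is all the linear model needs:

```python
import numpy as np

# Hypothetical measurements: per-brightness-bin mean and noise variance.
# These values are made up; real ones come from the profiling shots.
means = np.array([0.01, 0.05, 0.10, 0.20, 0.40, 0.80])
variances = 0.02 * means + 1e-4   # pretend this was measured

# Fit the linear model V(X) = a*E[X] + b by least squares.
a, b = np.polyfit(means, variances, deg=1)
print(a, b)   # should recover ~0.02 and ~1e-4
```

In darktable the fit is done per channel, and the resulting pair (a, b) is exactly the noise profile discussed above.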

Then, we can design the transform.
Let f be our variance stabilization transform.
We want V(f(X))=c, and we know V(X)=g(E[X]).
Let’s write m = E[X].
Using Taylor expansion (see Taylor expansions for the moments of functions of random variables - Wikipedia), we get:
V(f(X)) \approx V(f(m)+f'(m)(X-m))
Which we can simplify using the properties of variance:
V(f(X)) \approx V(f'(m)(X-m))
V(f(X)) \approx f'(m)^2*V(X-m)
V(f(X)) \approx f'(m)^2*V(X)

So our condition V(f(X))=c gives us the following condition:
f'(m)^2*V(X)=c
We know that V(X)=g(E[X]), so we get:
f'(m)^2*g(m)=c
For all values of m such that g(m)\neq 0, we have:
f'(m)=\sqrt{\frac{c}{g(m)}}

We usually choose c=1:
f'(m)=\frac{1}{\sqrt{g(m)}}

From this, we can integrate to get a function that stabilizes variance.
For instance, with the gaussian-poissonian model, g(m)=a*m+b.
We can choose the following function:
f(x)=\frac{2}{a}\sqrt{a*x+b}

If we transform our data by applying this function, we will get noise variance of 1.
We can then denoise the data using our favorite denoising algorithm, and perform an inverse transformation after that.
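Here is a minimal numerical sketch of the whole chain (synthetic data, made-up a and b, and the simple algebraic inverse rather than a proper unbiased one): after applying f, the noise standard deviation should be close to 1 at every brightness.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 0.01, 1e-4                 # made-up noise-profile parameters

# Synthetic data following the gaussian-poissonian model: V(X) = a*E[X] + b
truth = rng.uniform(0.1, 0.9, size=100_000)
noisy = truth + rng.normal(0.0, np.sqrt(a * truth + b))

def f(x):
    # Stabilizing transform for V(X) = a*x + b (clamped so that extreme
    # negative noise excursions do not produce sqrt of a negative number).
    return 2.0 / a * np.sqrt(np.maximum(a * x + b, 0.0))

def f_inv(y):
    # Naive algebraic inverse; a proper "unbiased inverse" is more involved.
    return ((a * y / 2.0) ** 2 - b) / a

residual = f(noisy) - f(truth)
print(residual.std())             # ~1.0: the variance is stabilized
```

In a real pipeline, a denoiser would run between `f` and `f_inv`; here the round trip only illustrates that the transform does stabilize the noise to unit variance.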

In practice
Finding a model that fits reality perfectly and that is easy to compute is hard.
At least we can get as close as possible.

Here is an example of a fit using the gaussian-poissonian model (see how variance varies with the mean):

Here is the variance of the transformed data:

Here is another example of a fit using a different model, where we assume V(X)=a*(E[X]+0.0005)^b:

Here is the variance of the transformed data:

Here, the second model achieves better variance stabilization in the dark areas, which can make denoising of very noisy images easier.

In darktable, what we currently have is the first model.
In RawTherapee, while AFAIK there are no noise profiles, a variance stabilization transform is still applied: the gamma parameter in the noise reduction module lets users find the transform that stabilizes the variance themselves.
A gamma equal to 2 will give roughly the same results as the variance stabilization we currently use in darktable.
If you find the right gamma value, you will get something similar to what we have with the second model shown here.
Note that depending on the gamma value chosen, the variance will not always be stabilized, but it can be :wink:


Two newbie questions: 1 What are V(X) and E[X] exactly and 2 how did you graph them?

I don’t have a math or coding background so I learn by doing. If you could provide an image that I could try graphing, that would be fantastic.

V(X) is the variance (Variance - Wikipedia)
E[X] is the expected value (Expected value - Wikipedia)
X is the random variable whose probability law gives the probability of a pixel having a particular brightness value.

You can compute E[X] and V(X) by taking a defocused shot I_n of a linear gradient, and then computing a mean filter (or gaussian blur if you prefer) on this image to get a reference image I_r of what the image would be without noise.
Then, the expected value E[X] at each pixel is approximated by the pixel value on I_r.
The variance is computed by gathering all the pixels that have the same value in I_r, and computing the average of (I_n[pixel]-E[X])^2.
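This can be sketched as follows (a synthetic gradient stands in for the real shot, and a simple box blur stands in for the mean filter):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the defocused gradient shot, with constant
# noise sigma = 0.02 so we know what variance to expect.
truth = np.tile(np.linspace(0.1, 0.9, 512), (512, 1))
I_n = truth + rng.normal(0.0, 0.02, truth.shape)

def mean_filter(img, size=15):
    # Separable box blur: a simple mean filter.
    k = np.ones(size) / size
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)

I_r = mean_filter(I_n)            # approximates E[X] at each pixel

# Crop the border (where the blur is biased), group pixels by their value
# in I_r, and average (I_n - E[X])^2 within each group.
c = np.s_[8:-8, 8:-8]
bins = np.clip((I_r[c] * 100).astype(int), 0, 99)
sq = (I_n[c] - I_r[c]) ** 2
sums = np.bincount(bins.ravel(), weights=sq.ravel(), minlength=100)
counts = np.bincount(bins.ravel(), minlength=100)
variance = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
# Populated bins should sit near the true variance 0.02**2 = 4e-4.
```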

I used shots like these to create the graphs:
These shots are 2 shots of a BW linear gradient, at +3EV and -3EV, to capture the full dynamic range of the camera.

If you don’t mind, I want to work through this with the two raw files that you have provided. As said, I learn by doing. Is there a tutorial that I can follow? If not, here is my first question: What raw processing should be done before applying the mean or Gaussian filter?

Second question: Is this correct? (Used Wolfram Alpha to assist me. :blush:)


If I substitute V(X)=a*(m+0.0005)^b into f'(m)=\frac{1}{\sqrt{V(X)}} and integrate,

I get
f(m)=\frac{- 2m-0.001}{(b-2)\sqrt{a(m+0.0005)^b}}
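A quick numerical check of this result (with arbitrary made-up values for a and b, since only the derivative identity needs to hold):

```python
import math

a, b = 0.01, 1.5   # arbitrary test values; the real ones come from fitting

def g(m):
    # The variance model: V(X) = a*(m + 0.0005)^b
    return a * (m + 0.0005) ** b

def f(m):
    # The candidate stabilizing transform derived above.
    return (-2 * m - 0.001) / ((b - 2) * math.sqrt(g(m)))

# f'(m) should equal 1/sqrt(g(m)); compare against a central difference.
h = 1e-6
for m in (0.01, 0.1, 0.5):
    deriv = (f(m + h) - f(m - h)) / (2 * h)
    target = 1 / math.sqrt(g(m))
    assert abs(deriv - target) < 1e-4 * target
print("f'(m) matches 1/sqrt(V(X))")
```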

At this point, all I need is a and b.

There is not yet a tutorial. However, there is some code for computing the variance here: darktable/noiseprofiles.c at c7017c9d2484588e47bdee7cb547c184f95551f6 · darktable-org/darktable · GitHub

I will go through the full procedure, I hope it will be clearer :slight_smile:

First, from the raw files, you have to decide at which point of the pipeline of your software you want to get profiles. In darktable, we do that after demosaic, but prior to any other processing.
For profiling, you need to disable white balance, as white balance is equivalent to a channel dependent exposure: it changes the noise characteristics.

Then, once you have found where in the pipe you want your denoising, you have to get the data of your gradient images as it is at this point of the pipe.

From the data, we can compute the profile as follows:

  • we need to find the expected value first: we perform a mean filter on the image. Note that while I was using a classic mean filter at the time I wrote this post, I now use a mean filter which computes a mean only vertically; as the gradient is horizontal, this is more accurate. The computed mean gives the expected value at each pixel.
  • we choose the precision we want on the x axis (how many points we want in our curves), i.e. we get a discrete set of points for which we will compute the variance. I used 250, 500 and 1000 and got similar results.
  • for each pixel, compute (pixel-m)^2, and accumulate the results of all pixels whose means fall on the same point of our discrete set of points. For instance, if we choose 1000 points, a pixel for which m=0.0001 and a pixel for which m=0.00011 will contribute to the same variance.
  • this allows us to compute a variance for each of our point, knowing the sum of (pixel-m)^2 and the number of pixels that were used for this point
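The steps above can be sketched as follows (synthetic data here, as a rough stand-in for what the profiling tool does on real shots):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000                                    # number of points on the x axis

# Synthetic stand-in for the profiling data: horizontal gradient + noise
# with known constant sigma = 0.03, so the expected variance is 9e-4.
truth = np.tile(np.linspace(0.0, 1.0, 4000), (200, 1))
img = truth + rng.normal(0.0, 0.03, truth.shape)

# Vertical-only mean: the gradient is horizontal, so averaging each column
# estimates the expected value without mixing brightness levels.
m = np.broadcast_to(img.mean(axis=0, keepdims=True), img.shape)

# Quantize the means onto N discrete points, accumulate (pixel - m)^2,
# and divide by the number of contributing pixels per point.
idx = np.clip((m * (N - 1)).round().astype(int), 0, N - 1)
sums = np.bincount(idx.ravel(), weights=((img - m) ** 2).ravel(), minlength=N)
counts = np.bincount(idx.ravel(), minlength=N)
variance = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
# variance[i] estimates V(X) at mean value i/(N-1); here it should be ~9e-4.
```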

Now, we have a curve of variance as a function of the mean.
We just have to fit a model to this curve, and get f from this :wink: What you did for this in your second question is perfectly correct :wink:

Note that the model V(X)=a*(m+0.0005)^b was one of my trials; I now use a more generic form V(X)=a*m^{2(d-1)}*(m^d+b)^p. But as the form gets more generic, the fitting part can become very empirical when optimizing the parameters to find the best model possible (with default fitting of 4 parameters with gnuplot, I get results that are not good enough near 0). For this model, f(x)=\frac{2}{\sqrt{a}*d*(2-p)}*(x^d+b)^{1-p/2}.
But you can start with a simpler model like V(X)=a*m^b: it seems to work quite nicely, and it is much easier to fit only 2 parameters than 4.
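As a sketch (made-up numbers again): V(X)=a*m^b becomes a straight line in log-log space, so the 2-parameter fit reduces to a line fit:

```python
import numpy as np

# Hypothetical (mean, variance) curve from the profiling step.
m = np.array([0.02, 0.05, 0.1, 0.2, 0.4, 0.8])
v = 0.01 * m ** 1.4          # pretend this was measured

# log V = log a + b*log m, so fit a line in log-log space.
b_fit, log_a = np.polyfit(np.log(m), np.log(v), deg=1)
a_fit = np.exp(log_a)
print(a_fit, b_fit)          # should recover ~0.01 and ~1.4
```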

Also, for fitting, I exclude the highlight part of the curve for the green channel, and exclude even more of the right part of the curve for the red and blue channels.

I hope I did not forget something, otherwise don’t hesitate to ask! :slight_smile:


thanks for moving away from the +0.0005 :slight_smile: this seems to work okay in this one example but also overshoots in the other direction here.

and yes, introducing more parameters can make the fitting unstable, especially because we have so many broken values in the data (due to sparse histograms, edges in the image, overexposure…).

another thing to consider is that this non-linear mapping does in general not commute with denoising. which is why there is a different function to undo the curve instead of just inverting the function (can you still invert your complicated model?). this “unbiased inverse” may be hard to derive for different noise models.

in our plots with the mismatch at 0 i often have this feeling that it may be an issue of black point detection, and that our results might be better if we were matching the model to rawest possible data instead (but have no data to back this up, just a thought).

Hi Johannes! :slight_smile:

The 0.0005 was mainly for preliminary testings :slight_smile:

I agree for the fact that it will be hard to find an unbiased inverse transform. I think the easiest may be to derive it empirically from profiling data.
To derive the unbiased inverse transform in closed form, if I remember correctly, the authors computed all possible values of E[f(X)] and fitted a model to that to invert it back to E[X].
We could try to do the same from our empirical data (we cannot have theoretical data here: as we move away from the gaussian-poissonian model, I have no analytical expression for the probabilities): we can compute E[f(X)] and E[X], so it should be possible.
At least, I will try :slight_smile:

I really think the mismatch at zero is only due to the model not being close enough to reality: as we clamp negative data to 0, I think variance should be lower than expected, not higher.
In addition, for low ISOs values, we can see the model is not stabilizing variance at all.
See the figure on the left below: the slope is far from being 1.
That’s why I think the gaussian-poissonian model does not always fit reality. It seems to be a good fit only for medium ISO.

It was this bad stabilization, clearly visible at low ISO, that guided me to consider more generic models.
With the last model (on the right) we can see that variance is better stabilized.

Edit: as a note, for some denoising algorithms we don’t really need an inverse transform. For instance with nlmeans or bilateral filter, we can use the transformed image as a guide to perform a mean on the original image.

I have been trying something different to account for the different levels of noise across the image for my GMIC noise reduction filters.

  1. Do small radius median filter
  2. Find the absolute difference between that and the original
  3. Use the median filtered image as a guide for a bilateral filter on the difference image (this gives you an image representing the local noise level)
  4. normalise the original image based on the local noise level
  5. do denoising
  6. reverse the normalisation.

At step 3, how do you choose the radius of the bilateral filter?
Could you also explain how your normalization works?
Thanks :slight_smile:

I’ll have to go back and look at what I did exactly tomorrow, but from memory the bilateral radius was quite big, something like 20 pixels. If it is too small then you are not getting enough samples to get an accurate noise level.

Normalisation is multiplying the original image by 1/noise_level
Although I think I used something like 10/noise_level because it resulted in a more convenient range of values.

Edit: this is all found out through trial and error so it might not be the best method


i think this approach sounds really good. similar to the haar-fisz transform in a way. you get the average brightness from the coarse, and normalise variance by scaling the detail coefficients. i’m not so worried about also getting the noise estimate in the process, we can get pretty accurate values up front/by profiling. in the wavelet context you would of course directly do the denoising on the wavelet.


does that mean we have to redo all profiles? :slight_smile:

… i was more hoping we could derive the noise parameters on the fly in the future. maybe with optional profiling or cross-image validation or so. but yes, denoising on raw data before black point subtraction would invalidate our current profiles.

I’ve had a look at what I did. I didn’t remember it exactly right.

  1. Run a Laplacian filter and get the absolute value
  2. Median filter the original image to create a denoised guide for the bilateral step.
  3. Use the median filtered image as a guide for a bilateral filter on the Laplacian image (this gives you an image representing the local noise level)
  4. normalise the original image based on the local noise level
  5. do denoising
  6. reverse the normalisation.
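Here is a rough Python sketch of these steps on synthetic data (the guided bilateral of step 3 is replaced by a plain box blur, and the median guide of step 2 and the denoiser of step 5 are left out, so it only illustrates the noise estimation and normalisation idea):

```python
import numpy as np

def box_blur(img, size):
    # Plain separable box blur: a crude stand-in for the guided bilateral.
    k = np.ones(size) / size
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)

def abs_laplacian(img):
    # Step 1: 4-neighbour Laplacian, absolute value.
    p = np.pad(img, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * img
    return np.abs(lap)

rng = np.random.default_rng(3)
truth = np.tile(np.linspace(0.1, 0.9, 256), (256, 1))
sigma = 0.01 + 0.05 * truth                      # noise grows with brightness
img = truth + rng.normal(0.0, 1.0, truth.shape) * sigma

detail = abs_laplacian(img)                      # step 1
noise = np.maximum(box_blur(detail, 45), 1e-6)   # step 3 (stand-in blur)
normalised = img / noise                         # step 4: even out the noise
# step 5 would denoise `normalised` with any constant-noise denoiser, then:
restored = normalised * noise                    # step 6: reverse normalisation
```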

Here is the original image

here is the noise estimate (normalised to fit 0 to 255). Bilateral filter radius 22.

Here is the original image adjusted to even out the noise levels


Thanks for your explanations, I understand now :slight_smile:

wait you ran that on the gamma corrected images? white means a lot of noise and dark means not much noise, right? it seems it’s doing exactly the opposite of what the gaussian/poissonian model would predict… i can only explain that by gamma encoding which squeezes together the bright values.

I have seen some interesting approaches for that using the correlation between RGB channels (texture is correlated, noise is not). But I don’t know how practical it is in terms of speed and accuracy. There is also the approach of blind denoising used by DxO, which I have not looked at yet.
Meanwhile, I think good noise profiles (not necessarily before black point subtraction) would already ease denoising a lot. Current profiles’ model is not generic enough I think, and even for ISO values where the model is good, the profiles from the current noise profiling tool do not stabilize variance to 1. So well, the good news is that even with these profiles, denoising was already quite nice, and there is plenty of room for improvement to get even better denoising :slight_smile:

Yes. White is lots of noise and black is not much. I wanted to make something for general noise reduction with images from unknown sources.

Here’s the noise level for a linear image.

thanks iain, this one looks way more familiar!

using the correlation between RGB sounds interesting!

you know that raw images often have a “black stripe”? parts of the sensor are not exposed to light but are just there to estimate the black point and the gaussian noise level. considering that our model so far is a simple linear model for variances, all that is left to do for us is to estimate a single other noise level for any brightness > 0. i suppose extracting a handful and rejecting those that seem unreliable might be better… but considering this i think the problem of noise estimation should be really easy to tackle. variance stabilisation would be the next step then.
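sketched on made-up numbers, the estimation really is that small for the linear model V(X)=a*E[X]+b: the black stripe (mean 0) gives b directly, and a single extra brightness level then pins down a:

```python
import numpy as np

rng = np.random.default_rng(4)
a_true, b_true = 0.02, 1e-4        # made-up "ground truth" noise levels

# unexposed black stripe: E[X] = 0, so its variance is just b
black_stripe = rng.normal(0.0, np.sqrt(b_true), 50_000)
# one uniformly lit patch at brightness 0.5
patch = 0.5 + rng.normal(0.0, np.sqrt(a_true * 0.5 + b_true), 50_000)

b_est = black_stripe.var()
a_est = (patch.var() - b_est) / patch.mean()
print(a_est, b_est)                # close to 0.02 and 1e-4
```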
