Survey on the linear workflow

While your last paragraph is absolutely correct, there is still a right and a wrong here. Some operations can be, and are, performed on gamma-corrected data, but that does not change the fact that, in theory, they should not be. There are multiple sources that clearly explain and show that things like blurring and resizing should be done on linear data. Please see here and here for example.

Edit: just to add, as @aurelienpierre would probably reiterate too, there is physics and mathematics (and electronics) underlying digital photography. There is little sense in abandoning or ignoring that, at least for me.

Edit 2: I see I got some double negatives mixed up. Tried to rephrase.

At least from a quick read of your two links, section D1 of the second one:

“Whether you get “prettier” results when using a gamma=1.0 or a gamma=2.2 RGB color space is an entirely subjective call, and in my opinion, the artist is always right.”

That doesn’t look like a “clear explanation” that blurring and resizing should always be done on linear data to me.

Edit: Should the “technically incorrect” approach be the default approach used? No. But should it be completely forbidden for anyone to ever use that approach? Also, I firmly believe, no.


For clarity then please also quote the first part of section D1 from @Elle’s article:

“In all comparisons above, the colors in the images on the left, edited in the linear gamma version of the sRGB color space, are technically correct. The colors in the images on the right, edited in the regular sRGB color space are technically wrong (…).”

There’s always a distinction between technically correct and artistically pleasing. :slight_smile:

On the note of processing in linear vs gamma compressed, I really do not have much to add to this discussion, as I am not a developer, except for a specific preference for linear when generating sharpening masks, such as RawTherapee’s new-ish contrast threshold feature.

Regardless of whether the actual sharpening is done in linear or gamma-compressed space (I don’t have an educated opinion on that debate), I think that any sharpening mask should be created in linear space, since, when observed as a gamma-encoded image, the signal-to-noise ratio tends to be poorer in the shadows and much more fine detail sits above the noise floor in the highlights. This observation is supported by the discussion of photon shot noise linked here: http://www.photonstophotos.net/Emil%20Martinec/noise.html

Given this correlation, the S/N ratio should vary less across the tonal range in a linear color space, so a minimum threshold for sharpening could be set just above the noise level (the optimum setting) over a greater tonal range of the image. Compare that to what I have to do now: I set the threshold tuned for the midtones, faint detail in the highlights doesn’t get sharpened, and stronger noise in the shadows does get sharpened.

Edit INB4 someone responds “Just export two versions with different threshold values and blend with a luminance mask”: that would be a waste of time. Raw processing would be more intuitive and quick if I could just tune the S/N threshold on one patch with noise and detail in the midtones, and trust that I won’t make the highlights look waxy or amplify the noise in the shadows.


Isn’t this really the right answer though? (That is, the raw converter should be able to scale the sharpening threshold with luminosity. I agree doing it by hand is annoying.) The shadows will always be noisier than the highlights just by virtue of how the physics works out. In a linear space, if a pixel collects N photons, it’s going to have an SNR proportional to N^{1/2}. Highlights receive many stops (say a factor of 2^6, at least) more input than the shadows, so the shadows are guaranteed a dramatically lower SNR.
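A minimal numpy sketch of that scaling, assuming pure Poisson shot noise and no read noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pure shot noise: photon counts are Poisson-distributed, so the standard
# deviation equals sqrt(mean) and the SNR = mean/std = sqrt(N).
for mean_photons in (4, 64, 1024, 16384):
    samples = rng.poisson(mean_photons, size=100_000)
    snr = samples.mean() / samples.std()
    print(f"{mean_photons:6d} photons: measured SNR {snr:7.1f}, "
          f"sqrt(N) {np.sqrt(mean_photons):7.1f}")
```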

Further, I’m not sure what doing it in a linear space gets you compared to a gamma-corrected one. The gamma correction is a monotonic map of N and hence of the noise levels, so for any threshold in linear space you should be able to find another one in gamma-corrected space that gives similar results.
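A small sketch of that equivalence, assuming a simple power-law gamma of 2.2: any linear-space threshold selects exactly the same pixels as its gamma-encoded counterpart, because the encoding is monotonic.

```python
GAMMA = 2.2

def encode(v, gamma=GAMMA):
    """Simple power-law gamma encoding of a normalized linear value."""
    return v ** (1.0 / gamma)

# A threshold t applied to linear values selects exactly the same pixels as
# the threshold encode(t) applied to the gamma-encoded values, because
# x**(1/gamma) is strictly increasing on [0, 1].
linear_threshold = 0.01
gamma_threshold = encode(linear_threshold)

pixels_linear = [0.001, 0.005, 0.02, 0.3]
mask_linear = [p > linear_threshold for p in pixels_linear]
mask_gamma = [encode(p) > gamma_threshold for p in pixels_linear]
assert mask_linear == mask_gamma
```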

This is all heuristic, so maybe I’m missing something? I’d be happy to write out something mathematically more rigorous if you want.

Actually, now that I think of it, you have a point. With a sufficiently wide-ranging gamma correction applied to the threshold weighting, it becomes irrelevant whether the mask is created in linear or gamma-corrected space. My initial gut sense was that linear space would have a more even amount of noise across the tonal spectrum, but after some mental thought experiments with “SNR proportional to N^{1/2}” (echoed in the Martinec article), I realized that, viewed in linear terms, the spread of values from a small number of photons is smaller than that from a large number of photons, even though it is the other way around in proportional terms. Creating an edge mask in linear space would therefore target the shadows too much. This is confirmed by my experience of a gamma setting of 1.0 for RT noise reduction targeting the shadows too much. Then there are variables such as read noise, and it becomes pretty apparent that there is no one-size-fits-all choice, and that a user-defined weighting curve is probably always going to be necessary.

I think a good general point is surfacing here: preserving the energy relationship has more value in some operations than in others. Operations concerned with color and tonality definitely care about the energy relationship; operations concerned with edges, not so much. Indeed, certain tone mapping may even facilitate edge detection in things like convolution kernels, where it’s the difference between adjacent values that drives the transform.

Having read this thread with high interest, I have a need to post my thoughts as well.
 

I’m not a programmer myself, but an avid user of raw converters. I came here as an unhappy user of Lightroom (quality-wise).
 

To begin with, I would like to restate two concepts that have already been mentioned but, to my knowledge, are sometimes not used properly:

  • radiometrically correct: in a picture this means that the pixel values in the demosaiced image are the same as, or at least proportional to, those present in the photosites of the camera sensor. E.g., if a photosite has received 4 times the signal (photons) of its neighbours, the demosaiced image values must respect that proportion, that is, 100 vs 400 or 0.211 vs 0.844.
  • linear data values: mostly referred to as linear gamma. If the sensor receives a given amount of signal that is stored as value 20, it has to receive double the signal to be stored as value 40. To me that is linearity. That is how sensors work. We are not talking here (yet) about black point compensation, white balance, clipping or anything else. Just: value x has received half the light of value 2x. Likewise, in a demosaiced linear image, a value that doubles another one must double its intensity (hue, lightness, …).

The problem is that our eyes don’t work that way, so we have to play tricks to show the sensor data in a way that our eyes understand or are pleased to watch. In this sense, a gamma-encoded image doesn’t respect the sensor linearity (obviously, since it was designed for exactly that), and a value that doubles another one doesn’t represent double the intensity.
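To make that concrete, here is a minimal sketch using the standard sRGB transfer function (the 0.2 input value is arbitrary): doubling a linear value means doubling the light, while doubling the sRGB-encoded value does not.

```python
def srgb_encode(v):
    """Standard sRGB transfer function for a normalized linear value."""
    return 12.92 * v if v <= 0.0031308 else 1.055 * v ** (1 / 2.4) - 0.055

def srgb_decode(v):
    """Inverse sRGB transfer function, back to linear."""
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

linear = 0.2
print(srgb_encode(2 * linear) / srgb_encode(linear))  # ~1.37: double the light is not double the encoded value
print(srgb_decode(2 * srgb_encode(linear)) / linear)  # ~4.7: double the encoded value is far more than double the light
```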
 

Now to the problem of a linear workflow: the data within the pipeline should always remain linear, the same way it was captured by the sensor.
 

It’s easy to find webpages explaining that the sensor captures light in a linear way:

Now we can go to Elle Stone’s website and read about linearity:

And just a note about whether we even need gamma correction at all, in an era of 32-bit/channel, floating-point precision raw engines:

But now let’s look at something that could lead to misconceptions: if you linearly modify the values of every pixel in the demosaiced image, you end up with a modified linear-gamma image. You’re not gamma encoding the image, just changing its pixel values. You started in linear gamma and ended in linear gamma (with modified values).
 

Again, this is not how our eyes work, so now the developers have to think carefully about whether a tool works better in a linear fashion (doubling intensities when doubling values) or on a gamma version of the image (more perceptually uniform). Or even whether the tool should run earlier or later than its current position in the pipeline.

I won’t dare to say that such a decision is easy, or that coding the tools is a piece of cake. It’s just that developers are the only ones with the power to decide how an image gets tweaked (I’m not talking about setting the sliders, but about the algorithm itself), so they have to weigh carefully whether it should be done linearly or non-linearly, to prevent artifacts.
 

In fact, I think that working with tools that need gamma-converted values is pretty simple: you take the linear data from the pipeline, gamma encode it, modify that data with the tool, and send the resulting values back after decoding them to linear gamma. All in all it’s just a question of raising a value to the power x, and then raising the tool-modified value to the power 1/x (in the simplest scenario).
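As a minimal sketch of that round trip, assuming a simple power-law gamma of 2.2 and using a made-up 3-tap box blur as a stand-in for any tool that prefers perceptually encoded data:

```python
import numpy as np

GAMMA = 2.2

def to_gamma(linear):
    """Encode normalized linear data with a simple power-law gamma."""
    return np.clip(linear, 0.0, None) ** (1.0 / GAMMA)

def to_linear(encoded):
    """Decode back to linear so the next tool in the pipeline receives linear data."""
    return encoded ** GAMMA

def blur(img):
    """Hypothetical tool that works on gamma-encoded data (1-D 3-tap box blur)."""
    padded = np.pad(img, 1, mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

def run_tool_on_gamma_data(linear_img):
    # pipeline hands us linear data -> encode -> run the tool -> decode -> hand back linear
    return to_linear(blur(to_gamma(linear_img)))

linear_img = np.array([0.01, 0.02, 0.5, 0.9, 0.02])
print(run_tool_on_gamma_data(linear_img))
```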

Obviously the returned modified-but-linear values won’t be radiometrically correct anymore, but to my knowledge that’s not the purpose of a raw processing app. I seek a beautiful image that is the essence of what I felt when I took it, not an image that exactly, clinically depicts what was present the moment I clicked. I only need a radiometrically correct rendition of the image at the beginning of the process (maybe just after demosaicing).
 

If we don’t send the modified values back to the pipeline in linear gamma (even though they are no longer radiometrically correct), the next tool in the pipeline may be one that works better with linear data but instead receives gamma-corrected data, thus producing artifacts.

If we go on mixing tools that need gamma-converted data with tools that work better with linear data, with each tool sending its results to the next, what we end up with is more and more artifacts (undesirable hues, halos, strange luminances, …).
 

Hope all of this makes sense, because if FOSS apps get to work like that, they will be waaaaay ahead of commercial products.


Yup. As an FYI, the exposure fusion approach used by Google as part of their HDR+ implementation behaves this way (or at least Tim Brooks’ implementation does, along with my rework of darktable’s exposure fusion): data is returned to linear for the benefit of whatever module might come later in the pipeline. While I agree that fusion should be near the end, I’ve found that it’s often visually pleasing to follow it with a camera emulation tonecurve. Darktable calls this “basecurve”, and the intent was to start from camera-JPEG-like data; I fully understand the criticisms of that approach and mostly agree with them. In all of my own workflows, if “basecurve” is present, it’s moved to the end of the pipeline and serves, as the very last operation, to give the picture a “look” similar to how the camera behaves. Thus it is no longer a “base”; perhaps a better name would be “camera look emulation”.

In many cases, such a “look” does involve chromaticity shifts, which happens to be the origin of the infamous “Sony has horrible skin tones” debate, because their tone curve and non-Caucasian skin tones don’t seem to mix well. But in many other situations those chromaticity shifts look more pleasing. Sunsets are a perfect example: to my eye at least, they look MUCH nicer than a sunset that has had its chromaticity preserved.

Oh yes, unless it is explicitly a colorspace transform step (darktable’s colorin/colorout for example), if a module needs a different internal colorspace for its operation than the work profile, it needs to ensure that data is returned to the work colorspace before it’s done.

Some have asserted that if a module needs such a colorspace conversion internally, it is fundamentally broken. I disagree with that.

Yup, which is why I stated the above: if you work internally in some colorspace that isn’t the one the data came in, you had better return your data to that colorspace for consistency. Darktable has a concept called a “work profile” to handle this. Most modules (there are some exceptions, like white balance and demosaic) should automatically convert from the work profile to their internal needs, and convert back when they’re done. Converting input without converting back when you’re finished would be a horrible thing to do.
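As a rough sketch of that discipline (the function names here are hypothetical stand-ins, not darktable’s actual API):

```python
def process_module(work_rgb, operate, to_internal, to_work):
    """Run a module that needs its own internal colorspace.

    to_internal / to_work are hypothetical conversion functions (e.g. work
    profile -> Lab and back); the essential point is that data arriving in
    the work profile also leaves in the work profile.
    """
    internal = to_internal(work_rgb)   # convert from the work profile
    internal = operate(internal)       # do the module's actual work
    return to_work(internal)           # always convert back before returning
```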

There is also the argument that certain operations should, by default, be earlier or later in the pipeline with others. I’m fully on board with this too - most of the work I’ve been doing is on stuff that has always been at the end of my pipelines and I fully support that you should only place it somewhere earlier if you really have a specific conscious reason to do so. (There are potentially artistic reasons for doing things the “wrong” way, but one should provide the user with sane defaults but give them plenty of rope - they may hang themselves, or they may create an amazing climbing net or artistic installation with that rope. Don’t preclude the artistic installation because the user might instead hang themselves.)


Very informative thread, I have some questions.

When a value is subtracted for the black point correction, this changes the white point too. While maybe not radiometrically correct, what happens if we use something like the levels tool in GIMP so the white point remains the same?
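For illustration only, here is a minimal sketch of the two mappings being contrasted, with values normalized to [0, 1] and an arbitrary black point: plain subtraction moves the white point, while a levels-style rescale pins it at 1.0.

```python
def subtract_black(v, black):
    """Plain black subtraction: the old white point 1.0 maps to 1.0 - black."""
    return max(v - black, 0.0)

def levels(v, black, white=1.0):
    """GIMP-levels-style mapping: black -> 0 while the white point stays at 1.0."""
    return max(v - black, 0.0) / (white - black)

black = 0.05
for v in (0.05, 0.5, 1.0):
    print(v, subtract_black(v, black), round(levels(v, black), 3))
```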

Where in the pipeline is the black level subtraction generally performed? Is it done in the camera color space, or after the input color matrix?

In my rawproc workflow, I put it as the very first operation that transforms the input raw data. Since all rawproc tools are ordered by the user, I can put it most anywhere, and your inquiry has prompted me to consider moving it around to see what happens…

Generally, I use dcraw as my sequencing reference. Yes, the specifics of operations can be hard to decipher, but David Coffin has been good to put them in functions whose calling sequence can be readily determined.


Currently, I don’t touch black levels because it affects colour balance as @age said above. I would also like a full explanation on it.


PS Thought I might drop these posts into this thread as they are relevant to our discussion.



PPS I forgot to thank the above contributors for reminding me that I have been using the term linear loosely. Yes, where possible, I would like the conversation to tend toward radiometric integrity. Thanks!

How does subtracting a value on all channels change the ratios between captured photons?

EDIT: Okay, so maybe I was a little stupid here. If you capture 5:10:15 photons (RGB) and subtract 4, you end up with a net signal of 1:6:11. That’s clearly not the same as the original 1:2:3 balance. For a larger number of captured photons, the distortion of the ratios gets somewhat smaller, e.g. 1000:2000:3000, subtract 200, gives 800:1800:2800, which is roughly 1:2.25:3.5.
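A quick sketch that just checks those numbers:

```python
def ratios(rgb):
    """Express the channels relative to the smallest one."""
    lo = min(rgb)
    return tuple(round(c / lo, 2) for c in rgb)

print(ratios((5, 10, 15)))                                  # (1.0, 2.0, 3.0)
print(ratios(tuple(c - 4 for c in (5, 10, 15))))            # (1.0, 6.0, 11.0)
print(ratios(tuple(c - 200 for c in (1000, 2000, 3000))))   # (1.0, 2.25, 3.5)
```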


The theory behind black-frame subtraction (which might simplify to subtracting a constant value from all pixels) is that the recorded pixel value is proportional to the received intensity, plus some garbage noise. So we subtract the garbage noise, and the result is proportional to the received intensity.

One point to beware of: if the camera has calculated and recorded a white balance, these are multipliers for the channels, and those numbers won’t be accurate for the de-noised signal. The difference may be too small to worry about.

Yes, the logical answer is in the camera space, before de-mosaicing or conversion to XYZ or sRGB or anything else. The reason is that the camera values are assumed wrong, so should be corrected before the wrongness is propagated to other pixels.

EDIT: Above, I have over-simplified. Some noise may be “shot” noise (see https://en.wikipedia.org/wiki/Shot_noise ) which is due to random variation in the intensity of light. The camera correctly records this variation. If we remove shot noise, we no longer have values proportional to the actual received intensity. But the result is proportional to what an ideal camera would have captured from ideal light arriving at the sensor.

I guess a way to think of it the other way, as to WHY you need to do black level subtraction:

The black level is an aspect of the camera’s photon capture implementation. As I mentioned earlier, it’s effectively an offset that the camera adds to its recorded signal. (One reason why is so that internally generated read noise is evenly distributed and recorded around the black level, as opposed to recording the absolute value of the noise).

A common black level value is 512. For simplicity’s sake, let’s assume that 1 ADC count = 1 photon

So the camera’s output is 512 + nphotons

So if you have 0 photons, you record 512
If you have 512 photons, you record 1024
If you have 2 photons, you record 514

Let’s imagine a pixel that only saw 2 red photons
So your counts are 514/512/512 - if you don’t do black level subtraction, you are assuming the pixel is an almost completely unsaturated grey. If you do black level subtraction, you get reality, which is a maximum-saturation very dark red.
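A minimal sketch of that example (black level 512, 1 ADC count = 1 photon):

```python
BLACK_LEVEL = 512

raw = (514, 512, 512)   # a pixel that saw only 2 red photons

without = raw                                           # (514, 512, 512): reads as near-neutral grey
with_sub = tuple(max(c - BLACK_LEVEL, 0) for c in raw)  # (2, 0, 0): a fully saturated, very dark red

print(without, with_sub)
```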

The actual ADC output from a camera will look something like:

val = ADCoffset + C_{leakage} * leakage_current + noise + C_{photon} * n_photons

  • ADCoffset: a fixed aspect of the camera’s design, commonly called the black level.
  • Leakage current (and its associated constant): this is what contributes to hot pixels on long exposures. For any given pixel it is typically constant for a given temperature and exposure time; that property is why dark frame subtraction works.
  • Noise: not much you can do about read noise with a single exposure. If you have multiple exposures and you average them before black level subtraction (or allow negative values after black level subtraction), the read noise will average out.

As @snibgo mentioned, for low numbers of n_photons there’s photon shot noise in here too. Nowadays that typically dominates well over read noise at high ISOs for most modern cameras.

Note that for every camera I’m aware of, the white balance multipliers are designed to be applied AFTER black level subtraction. As for hot pixel correction via dark frame subtraction: whether it has any effect on white balance depends on how much, if at all, that hot pixel threw off the original white balance calculation.
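A small sketch of why that order matters, with made-up raw counts and white balance multipliers:

```python
BLACK = 512
WB = (2.0, 1.0, 1.5)    # illustrative red/green/blue multipliers

raw = (612, 712, 645)   # illustrative raw counts, black offset still included

correct = tuple(m * (c - BLACK) for m, c in zip(WB, raw))  # (200.0, 200.0, 199.5): ~neutral
wrong   = tuple(m * c - BLACK for m, c in zip(WB, raw))    # (712.0, 200.0, 455.5): strong color cast

print(correct, wrong)
```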


Ah, okay, thanks for the correction.

@Entropy512
The formula you give above is not correct. Read noise does not add any signal. It only smears out the value of the ADCoffset.

Yes, the offset is added in CCD cameras to avoid negative voltages at the ADC; in other words, it ensures that the noise is properly sampled. Since we are dealing with unsigned integers, an offset of 0 would not allow the noise to be represented correctly.

Why do you introduce a new term “leakage” and not call it dark current?

It is not only the hot pixels. Every pixel shows a dark current; only in hot pixels is it much larger than average.

Another issue is linearity. Without subtracting the ADCoffset (what we call “bias” in astronomy) from the data, the signal is not linear! Other operations, like flatfielding, assume a linear signal!
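A minimal sketch of that calibration order as used in astronomy, with made-up frames and the dark-current term left out for brevity: flat fielding is applied only after the bias has been subtracted.

```python
import numpy as np

def calibrate(raw, bias, flat):
    """Sketch of standard CCD calibration: flat fielding assumes a linear,
    bias-subtracted signal, so the bias (ADC offset) must come off first.
    Dark frame subtraction is omitted here for brevity."""
    signal = raw.astype(float) - bias      # now proportional to received light
    return signal / (flat / flat.mean())   # normalized flat field correction

# Illustrative frames only.
raw = np.array([[600.0, 700.0], [650.0, 800.0]])
bias = np.full_like(raw, 512.0)
flat = np.array([[0.9, 1.0], [1.1, 1.0]])
print(calibrate(raw, bias, flat))
```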

Hermann-Josef

So I was thinking about that yesterday, and wondered how the following would be accommodated: a variety of cameras allow one to manually measure the white balance at the scene. Both my D7000 and Z6 have this. To use it, you select one of the WB presets instead of auto or whatever; on the Z6 I have the measure tool connected to a Fn button, so I aim the camera at a neutral place, press and hold the Fn button, and after a second the camera blats a “Measure” prompt in the viewfinder, at which point I press the shutter button. Instead of taking a picture, the camera collects the patch around where I aimed, uses the patch average to compute WB multipliers, and stores them as the preset I’ve already selected.

@Entropy512, if what you describe is true, then the camera should be subtracting the black level from the patch before it conjures the WB multipliers… ?? I might have to test this, re-order blacksubtract and WB, and see what I get…

Thanks for the clarification - I edited and simplified it to “noise” as thermal noise (which may turn out to be insignificant) is Gaussian-distributed.

Either way - the goal is that certain noise sources are properly sampled instead of being clipped or mirrored around 0.

Also, many CMOS sensors (not just CCD) also have such an ADC offset. Like every Sony I’ve worked with in the past 5-6 years.

Yes, it’s almost surely subtracting the black offset internally before it calculates the multipliers.

Otherwise you’d get some really funky shifts if the “white” reference were more like a dim grey.

@Entropy512

This is an electronic/mathematical issue not connected to the type of detector.

Still, it does not add any signal to the data value as in your formula. Noise only spreads out a signal but does not add a signal by itself. Dark current, however, does add to the signal, as your formula specifies.

Noise sources are: read noise, photon (shot) noise, noise due to dark current (again shot noise, except for hot pixels), incomplete flat field correction (i.e. remaining fixed pattern noise) and digitization noise (which should be negligible). I hope I did not forget one :slight_smile: .

Please note my additional remark about linearity above.

Hermann-Josef