Survey on the linear workflow

XavAL · August 17, 2019, 3:13pm

Having read this thread with great interest, I feel the need to post my thoughts as well.

I’m not a programmer myself, but I am an avid user of raw converters. I came here as a user unhappy with Lightroom’s image quality.

To begin with, I would like to restate two concepts that have already been mentioned but that, to my knowledge, are sometimes used incorrectly:

radiometrically correct: in a picture, this means that the pixel values in the demosaiced image are the same as, or at least proportional to, those recorded by the photosites of the camera sensor. E.g., if a photosite has received 4 times as much signal (photons) as its neighbours, the demosaiced image values must respect that proportion, that is, 100 vs 400 or 0.211 vs 0.844.
linear data values: mostly referred to as linear gamma. If the sensor receives a given amount of signal that is stored as value 20, it has to receive double the signal for it to be stored as value 40. To me that is linearity, and that is how sensors work. We are not talking here (yet) about black point compensation, white balance, clipping or anything else; just that value x has received half as much light as value 2x. Likewise, in a demosaiced linear image, a value twice as large as another must represent twice the light intensity.

The problem is that our eyes don’t work that way, so we have to play tricks to show the sensor data in a way that our eyes understand or find pleasing. In this sense, a gamma-encoded image doesn’t respect the sensor’s linearity (by design), and a value twice as large as another doesn’t represent twice the intensity.
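To make the two definitions above concrete, here is a minimal Python sketch. It assumes a pure power-law encoding with exponent 2.2 (an illustrative choice; real encodings such as sRGB add a linear segment near black) and reuses the 0.211 vs 0.844 example: linear storage keeps the 4:1 ratio, while the encoded values no longer do.

```python
GAMMA = 2.2  # assumed pure power law; real encodings such as sRGB add a linear toe


def encode(linear: float) -> float:
    """Power-law encode a linear value in [0, 1]."""
    return linear ** (1.0 / GAMMA)


def decode(encoded: float) -> float:
    """Invert encode(), returning to linear light."""
    return encoded ** GAMMA


dark, bright = 0.211, 0.844  # linear values: "bright" received 4x the light of "dark"

linear_ratio = bright / dark                   # stays 4: the proportion is respected
encoded_ratio = encode(bright) / encode(dark)  # 4 ** (1/2.2), roughly 1.88

print(f"linear ratio:  {linear_ratio:.2f}")
print(f"encoded ratio: {encoded_ratio:.2f}")

# Doubling an *encoded* value corresponds to 2 ** 2.2 (about 4.6) times the light, not 2.
print(f"light factor when an encoded value doubles: {decode(0.4) / decode(0.2):.2f}")
```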

Now to the problem of a linear workflow: the data within the pipeline should always be linear, just as it was captured by the sensor.

It’s easy to find web pages explaining that sensors capture light linearly:

Digital Camera Linearity (University of Toledo, OH)
DPreview, about dynamic range, but showing sensor linearity
Cambridge in Colour, about Gamma Correction. One observation: where the text says «When a digital image is saved, it’s therefore “gamma encoded”», it is right that the image is gamma encoded, but not in the sense explained in the sentences that follow. The image is saved gamma encoded, with either a linear or a non-linear gamma.

Now we can go to Elle Stone’s website and read about linearity:

Blending in non-linear-gamma-encoded vs linear data. Real images show the problems that may arise when working with non-linear gamma encoded data. (Link already posted by @Thanatomanic).
Is your image editor using an internal linear gamma color space? Discusses whether the tools must work with linear or non-linear gamma encoded data.

And just a note on whether we even need gamma correction at all, in an era of 32-bit/channel, floating-point precision raw engines:

Published hardcover book on Google Books, about how floating point is better than gamma correction for HDR images

But now let’s look at something that could lead to misconceptions: if you linearly modify the values of every pixel in the demosaiced image (for example, by multiplying them all by the same factor), you end up with a modified linear gamma image. You are not gamma encoding the image, just changing its pixel values. You started in linear gamma and ended in linear gamma (with modified values).
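A small illustration of that distinction, using plain Python lists as stand-in linear pixel data (the function names are made up for this sketch): scaling every value by the same factor, as an exposure change does, keeps the ratios between pixels, while a power curve standing in for any non-linear tone curve changes them.

```python
def apply_exposure(pixels, ev):
    """Linear edit: scale every value by 2**ev (one EV doubles the light)."""
    gain = 2.0 ** ev
    return [p * gain for p in pixels]


def apply_power_curve(pixels, exponent):
    """Non-linear edit: raise every value to a power (a crude tone curve)."""
    return [p ** exponent for p in pixels]


def ratios(pixels):
    """Each value divided by the first one."""
    return [p / pixels[0] for p in pixels]


linear = [0.05, 0.10, 0.20]              # 1 : 2 : 4 in linear light
brighter = apply_exposure(linear, 1.0)   # new values, same 1 : 2 : 4 ratios
curved = apply_power_curve(linear, 0.5)  # ratios become 1 : 1.41 : 2

print(ratios(brighter))
print(ratios(curved))
```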

Again, this is not how our eyes work, so developers have to think carefully about whether a tool works better in a linear fashion (doubling intensities when doubling values) or on a gamma-encoded version of the image (more perceptually uniform), or even whether the tool should be applied earlier or later than its current position in the pipeline.

I wouldn’t dare say that such a decision is easy, or that coding the tools is a piece of cake. It’s just that developers are the only ones with the power to decide how to tweak an image (I’m not talking about setting the sliders, but about the algorithm itself), so they have to carefully weigh whether it should be done linearly or non-linearly, to prevent artifacts.

In fact, I think that working with tools that need gamma-converted values is pretty simple: you get the linear data from the pipeline, gamma encode it, modify that data with the tool, and send the resulting values back after decoding them to linear gamma. All in all, it is just a matter of raising a value to the power x, and then raising the tool-modified value to the power 1/x (in the simplest scenario).
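The encode, modify, decode round trip described above could be sketched in Python as a wrapper around any tool. This is an assumption-laden sketch, not any program’s actual API: it uses the simplest scenario of a pure power function with x = 1/2.2, and `lift_shadows` is a hypothetical tool.

```python
from typing import Callable, List

Tool = Callable[[List[float]], List[float]]


def in_gamma_space(tool: Tool, x: float = 1 / 2.2) -> Tool:
    """Wrap a tool that prefers gamma-encoded data so it still takes and returns linear data.

    Encode with v ** x, let the tool work, then decode with v ** (1 / x).
    Negative values are clamped to 0, because a fractional power of a negative
    float gives a complex number in Python.
    """
    def wrapped(linear: List[float]) -> List[float]:
        encoded = [max(v, 0.0) ** x for v in linear]           # linear -> encoded
        modified = tool(encoded)                                # tool works on encoded values
        return [max(v, 0.0) ** (1.0 / x) for v in modified]    # encoded -> linear
    return wrapped


def lift_shadows(encoded: List[float]) -> List[float]:
    """Hypothetical tool: a small, perceptually even lift applied to encoded values."""
    return [min(v + 0.05, 1.0) for v in encoded]


pipeline_step = in_gamma_space(lift_shadows)
print(pipeline_step([0.01, 0.18, 0.9]))  # still linear values, ready for the next tool
```

The design point is that the wrapper, not the tool, owns the conversion, so the next tool in the pipeline receives linear data no matter what happened inside.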

Obviously the returned modified-but-linear value won’t ever be radiometrically correct, but to my knowledge that’s not the purpose of a raw processing app. I seek a beautiful image that conveys the essence of what I felt when I was taking it, not an image that exactly, clinically depicts what was present the moment I pressed the shutter. I only need a radiometrically correct translation of the image at the beginning of the process (maybe just after demosaicing).

If we don’t return the modified values to the pipeline with a linear gamma (even though they are no longer radiometrically correct), the next tool in the pipeline may be one that works better with linear data but actually receives gamma-corrected data, thus producing artifacts.

If we keep mixing tools that need gamma-converted data with tools that work better with linear data, and each tool sends its results to the next, we end up with more and more artifacts (undesirable hues, halos, strange luminances, …).

Hope all of this makes sense, because if FOSS apps get to work like that, they will be waaaaay ahead of commercial products.

Entropy512 · August 19, 2019, 6:17pm

XavAL:

In fact, I think that working with tools that need gamma-converted values is pretty simple: you get the linear data from the pipeline, gamma encode it, modify that data with the tool, and send the resulting values back after decoding them to linear gamma. All in all, it is just a matter of raising a value to the power x, and then raising the tool-modified value to the power 1/x (in the simplest scenario).

Obviously the returned modified-but-linear value won’t ever be radiometrically correct, but to my knowledge that’s not the purpose of a raw processing app. I seek a beautiful image that conveys the essence of what I felt when I was taking it, not an image that exactly, clinically depicts what was present the moment I pressed the shutter. I only need a radiometrically correct translation of the image at the beginning of the process (maybe just after demosaicing).

Yup. As an FYI, the exposure fusion approach used by Google as part of their HDR+ implementation behaves this way (or at least Tim Brooks’ implementation does, as does my rework of darktable’s exposure fusion): data is returned to linear so that any module later in the pipeline still receives linear data. (While I agree that fusion should be near the end, I’ve found that it’s often visually pleasing to follow it with a camera emulation tone curve. Darktable calls this “basecurve”, and the intent was to start with camera-JPEG-like data; I fully understand the criticisms of this approach and mostly agree with them. In all of my own workflows, if “basecurve” is present, it’s moved to the end of the pipeline and serves, as the very last operation, to give the picture a “look” similar to how the camera behaves. Thus it is no longer a “base”; perhaps a better name would be “camera look emulation”.)

In many cases, such a “look” does involve chromaticity shifts, which happen to be the origin of the infamous “Sony has horrible skin tones” debate, because their tone curve and non-Caucasian skin tones don’t seem to mix well… But in many other situations, those chromaticity shifts look more pleasing. Sunsets are a perfect example: to my eye at least, they look MUCH nicer than a version in which chromaticity has been preserved.

Oh yes: unless a module is explicitly a colorspace transform step (darktable’s colorin/colorout, for example), if it needs an internal colorspace other than the work profile for its operation, it must ensure that data is returned to the work colorspace before it’s done.

Some have asserted that if a module needs such a colorspace conversion internally, it is fundamentally broken. I disagree with that.

Yup, which is why I stated the above - if you work internally in some colorspace that isn’t what you had come in, you had better return your data to that colorspace for consistency. Darktable has a concept called a “work profile” to handle this. Most modules (there are some exceptions, like whitebalance and demosaic) should automatically convert from the work profile to their internal needs, and convert back when they’re done. Converting input without converting back when you’re finished would be a horrible thing to do.
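As a generic sketch of that convert-in, convert-back discipline (not darktable’s actual code), assume the work profile is linear sRGB/Rec. 709 and that a hypothetical module works internally in CIE XYZ. The matrix below is the standard sRGB-to-XYZ (D65) matrix rounded to four decimals; the module converts in, operates, and converts back with the inverse matrix.

```python
# Linear sRGB (D65) -> CIE XYZ, the standard matrix rounded to 4 decimals.
WORK_TO_INTERNAL = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
]


def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


INTERNAL_TO_WORK = inverse3(WORK_TO_INTERNAL)


def run_module(rgb, internal_op):
    """Convert work-profile RGB to the module's internal space, process, and convert back."""
    xyz = matvec(WORK_TO_INTERNAL, rgb)
    xyz = internal_op(xyz)
    return matvec(INTERNAL_TO_WORK, xyz)


def scale_xyz(xyz, gain=1.2):
    """Hypothetical internal operation: scale X, Y and Z together (an exposure-like change)."""
    return [gain * c for c in xyz]


pixel = [0.2, 0.4, 0.1]
print(run_module(pixel, lambda xyz: xyz))  # round trip: back to ~[0.2, 0.4, 0.1]
print(run_module(pixel, scale_xyz))        # ~[0.24, 0.48, 0.12], still in the work profile
```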

There is also the argument that certain operations should, by default, come earlier or later in the pipeline relative to others. I’m fully on board with this too: most of the work I’ve been doing is on stuff that has always been at the end of my pipelines, and I fully support placing it earlier only if you have a specific, conscious reason to do so. (There are potentially artistic reasons for doing things the “wrong” way, so one should provide the user with sane defaults but still give them plenty of rope. They may hang themselves, or they may create an amazing climbing net or artistic installation with that rope. Don’t preclude the artistic installation because the user might instead hang themselves.)

age · August 23, 2019, 3:08pm

Very informative thread. I have some questions:

When a value is subtracted for black point correction, this changes the white point too. While maybe not radiometrically correct, what happens if we use something like the Levels tool in GIMP so that the white point remains the same?

Where in the pipeline is black level subtraction generally performed? In the camera color space, or after the input color matrix?

ggbutcher · August 23, 2019, 3:23pm

In my rawproc workflow, I put it as the very first operation that transforms the input raw data. Since all rawproc tools are ordered by the user, I can put it most anywhere, and your inquiry has prompted me to consider moving it around to see what happens…

Generally, I use dcraw as my sequencing reference. Yes, the specifics of its operations can be hard to decipher, but David Coffin has been good about putting them in functions whose calling sequence is easy to determine.

afre · August 23, 2019, 11:21pm

Currently, I don’t touch black levels because, as @age said above, they affect colour balance. I would also like a full explanation of it.

PS Thought I might drop these posts into this thread as they are relevant to our discussion.

PPS I forgot to thank the above contributors for reminding me that I have been using the term linear loosely. Yes, where possible, I would like the conversation to tend toward radiometric integrity. Thanks!

Thanatomanic · August 24, 2019, 7:43am

How does subtracting a value from all channels change the ratios between captured photons?

EDIT: Okay, so maybe I was a little stupid here. If you capture 5:10:15 photons (RGB) and subtract 4, you end up with a net signal of 1:6:11. That’s clearly not the same as the original 1:2:3 balance. For larger numbers of captured photons, the relative differences get smaller, e.g. 1000:2000:3000 minus 200 gives 800:1800:2800, which is 1:2.25:3.5.
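The arithmetic above can be checked in a few lines of Python. The second half shows the mirror image, which is the case black level subtraction actually addresses: the camera adds a constant to a true 1:2:3 signal, and removing it restores the ratios. The 512 offset and photon counts are illustrative.

```python
def ratios(counts):
    """Each channel divided by the first (R) channel."""
    return [c / counts[0] for c in counts]


# Subtracting a constant from a true photon signal distorts the balance...
print(ratios([5 - 4, 10 - 4, 15 - 4]))               # 1 : 6 : 11 instead of 1 : 2 : 3
print(ratios([1000 - 200, 2000 - 200, 3000 - 200]))  # 1 : 2.25 : 3.5

# ...which is also why the camera's own offset must be removed: the sensor adds a
# constant (here an illustrative 512 counts) on top of a true 1 : 2 : 3 signal.
BLACK = 512
photons = [100, 200, 300]
recorded = [BLACK + p for p in photons]
print(ratios(recorded))                       # roughly 1 : 1.16 : 1.33, balance lost
print(ratios([r - BLACK for r in recorded]))  # 1 : 2 : 3 restored
```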

snibgo · August 24, 2019, 1:06pm

The theory behind black-frame subtraction (which might simplify to subtracting a constant value from all pixels) is that the recorded pixel value is proportional to the received intensity, plus some garbage noise. So we subtract the garbage noise, and the result is proportional to the received intensity.

One point to beware of: if the camera has calculated and recorded a white balance, these are multipliers for the channels, and those numbers won’t be accurate for the de-noised signal. The difference may be too small to worry about.

Yes, the logical answer is in the camera space, before demosaicing or conversion to XYZ or sRGB or anything else. The reason is that the camera values are assumed to be wrong, so they should be corrected before the error propagates to other pixels.

EDIT: Above, I have over-simplified. Some noise may be “shot” noise (see Shot noise - Wikipedia ) which is due to random variation in the intensity of light. The camera correctly records this variation. If we remove shot noise, we no longer have values proportional to the actual received intensity. But the result is proportional to what an ideal camera would have captured from ideal light arriving at the sensor.

Entropy512 · August 24, 2019, 3:08pm

I guess another way to think about it is to ask WHY you need to do black level subtraction:

The black level is an aspect of the camera’s photon capture implementation. As I mentioned earlier, it’s effectively an offset that the camera adds to its recorded signal. (One reason is so that internally generated read noise is evenly distributed and recorded around the black level, as opposed to recording only the absolute value of the noise.)

A common black level value is 512. For simplicity’s sake, let’s assume that 1 ADC count = 1 photon

So the camera’s output is 512 + nphotons

So if you have 0 photons, you record 512
If you have 512 photons, you record 1024
If you have 2 photons, you record 514

Let’s imagine a pixel that only saw 2 red photons.
Your counts are then 514/512/512. If you don’t do black level subtraction, you are assuming the pixel is an almost completely unsaturated grey. If you do black level subtraction, you get reality, which is a maximum-saturation, very dark red.
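The same example in Python, as a minimal sketch: the black level of 512 and the 1 count = 1 photon scale are the simplifying assumptions above, and the saturation measure is a crude HSV-style one chosen only for illustration.

```python
BLACK_LEVEL = 512  # illustrative black level; 1 ADC count = 1 photon, as assumed above


def subtract_black(raw_rgb, black=BLACK_LEVEL):
    """Remove the ADC offset, clipping negative results at 0."""
    return [max(v - black, 0) for v in raw_rgb]


def saturation(rgb):
    """Crude HSV-style saturation: (max - min) / max, defined as 0 for pure black."""
    hi, lo = max(rgb), min(rgb)
    return 0.0 if hi == 0 else (hi - lo) / hi


raw = [514, 512, 512]  # 2 red photons on top of the offset
print(saturation(raw))  # ~0.004: looks like an almost neutral grey
print(subtract_black(raw), saturation(subtract_black(raw)))  # [2, 0, 0] 1.0: saturated, very dark red
```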

The actual ADC output from a camera will look something like:

val = ADCOffset + C_{leakage}*leakagecurrent + noise + C_{photon}*nphotons

ADCoffset = a fixed aspect of the camera’s design, which is commonly called the black level
Leakage current (and its associated constant): this is what contributes to hot pixels on long exposures. For any given pixel, it is typically constant for a given temperature and exposure time. That property is why dark frame subtraction works.
There is not much you can do about read noise with a single exposure. If you have multiple exposures and average them before black level subtraction (or allow negative values after black level subtraction), the read noise will average out.
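A small simulation can illustrate the averaging point, under a deliberately simplified model: only an offset, a fixed signal and Gaussian read noise (no shot noise, dark current or integer rounding), with made-up values for the offset, noise level and signal. Averaging before subtraction, or keeping negatives, recovers the true signal; clipping each frame at zero biases the dim pixel upward.

```python
import random

random.seed(1)          # fixed seed so the sketch is repeatable
BLACK = 512             # illustrative ADC offset (black level)
READ_NOISE_SIGMA = 3.0  # illustrative Gaussian read noise, in ADC counts
TRUE_SIGNAL = 1.0       # a very dim pixel: 1 count of real signal per frame
N_FRAMES = 10_000

# Simplified model: offset + signal + read noise (no shot noise, no dark current, no ADC rounding).
frames = [BLACK + TRUE_SIGNAL + random.gauss(0.0, READ_NOISE_SIGMA) for _ in range(N_FRAMES)]

# 1. Average the frames first, then subtract the black level.
avg_then_subtract = sum(frames) / N_FRAMES - BLACK

# 2. Subtract per frame but keep negative values.
keep_negatives = sum(f - BLACK for f in frames) / N_FRAMES

# 3. Subtract per frame and clip at zero, as an unsigned-integer pipeline effectively would.
clipped = sum(max(f - BLACK, 0.0) for f in frames) / N_FRAMES

print(f"average, then subtract:   {avg_then_subtract:.2f}")
print(f"subtract, keep negatives: {keep_negatives:.2f}")
print(f"subtract, clip at zero:   {clipped:.2f}  (biased upward)")
```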

As @snibgo mentioned, for low numbers of nphotons there’s photon shot noise in here too. Nowadays it typically dominates read noise by a wide margin at high ISOs on most modern cameras.

Note that for every camera I’m aware of, the white balance multipliers are designed to be applied AFTER black level subtraction. As for hot pixel correction via dark frame subtraction, whether it has any effect on white balance depends on how much, if at all, that hot pixel threw off the original white balance calculation.

snibgo · August 24, 2019, 3:34pm

Ah, okay, thanks for the correction.

Jossie · August 24, 2019, 5:10pm

@Entropy512
The formula you give above is not correct. Read noise does not add any signal. It only smears out the value of the ADCoffset.

Yes, the offset is added in CCD cameras to avoid negative voltage at the ADC; in other words, it ensures that the noise is properly sampled. Since we are dealing with unsigned integers, an offset of 0 would not allow the noise to be correctly represented.

Why do you introduce a new term “leakage” and not call it dark current?

It is not only the hot pixels. Every pixel shows a dark current. Only in hot pixels this is much larger than average.

Another issue is linearity. Without subtracting the ADCoffset (what we call “bias” in astronomy) from the data, the signal is not linear! Other operations, like flatfielding, assume a linear signal!

Hermann-Josef

ggbutcher · August 24, 2019, 5:11pm

So I was thinking about that yesterday, and wondered how the following would be accommodated. A variety of cameras allow one to manually measure the white balance at the scene. Both my D7000 and Z6 have this: to use it, you select one of the WB presets instead of auto or whatever. On the Z6 I have the measure tool assigned to an Fn button, so I aim the camera at a neutral spot and press and hold the Fn button; after a second the camera displays a “Measure” prompt in the viewfinder, and then I press the shutter button. Instead of taking a picture, the camera collects the patch around where I aimed, uses the patch average to compute WB multipliers, and stores them as the preset I’ve already selected.

@Entropy512, if what you describe is true, then the camera should be subtracting the black level from the patch before it conjures the WB multipliers… ?? I might have to test this: re-order blacksubtract and WB and see what I get…

Entropy512 · August 24, 2019, 5:18pm

Thanks for the clarification - I edited and simplified it to “noise” as thermal noise (which may turn out to be insignificant) is Gaussian-distributed.

Either way - the goal is that certain noise sources are properly sampled instead of being clipped or mirrored around 0.

Also, many CMOS sensors (not just CCDs) have such an ADC offset, like every Sony I’ve worked with in the past 5-6 years.

Yes, it’s almost surely subtracting the black offset internally before it calculates the multipliers.

Otherwise you’d get some really funky shifts if the “white” reference were more like a dim grey.
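To see why the black offset has to come off before the multipliers are computed, here is a minimal Python sketch with made-up numbers for the black level and for a grey-card patch. It does not claim to reproduce any camera’s firmware; it only shows the cast that appears when multipliers computed with the offset left in are applied to black-subtracted data.

```python
BLACK = 512  # illustrative black level


def wb_multipliers(patch_rgb, black):
    """Green-normalized multipliers that make a measured neutral patch neutral."""
    r, g, b = (c - black for c in patch_rgb)
    return (g / r, 1.0, g / b)


def apply_wb(rgb, multipliers):
    """Multiply each channel by its white balance multiplier."""
    return tuple(c * m for c, m in zip(rgb, multipliers))


patch = (BLACK + 1000, BLACK + 2000, BLACK + 1500)  # hypothetical averaged grey-card patch (raw R, G, B)
good = wb_multipliers(patch, black=BLACK)  # black subtracted first: (2.0, 1.0, ~1.33)
naive = wb_multipliers(patch, black=0)     # offset left in: (~1.66, 1.0, ~1.25)

# A dimmer neutral surface under the same light, after black subtraction:
grey = (500, 1000, 750)
print(apply_wb(grey, good))   # ~(1000, 1000, 1000): neutral
print(apply_wb(grey, naive))  # ~(831, 1000, 936): a visible colour cast
```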

Jossie · August 24, 2019, 5:25pm

@Entropy512

This is an electronic/mathematical issue not connected to the type of detector.

Still it does not add any signal to the data value like in your formula. Noise only spreads out a signal but does not add a signal by itself. Dark current, however, does add to the signal as your formula specifies.

Noise sources are: read noise, photon (shot) noise, noise due to dark current (again shot noise, except for hot pixels), incomplete flat field correction (i.e. remaining fixed pattern noise) and digitization noise (which should be negligible). I hope I did not forget one.

Please note my additional remark about linearity above.

Hermann-Josef

anon41087856 · October 22, 2019, 12:47pm

Linear gamma is a faulty name and should be discouraged.

The gamma is the particular power function linking the optical response to the electrical input of a CRT screen. There is no such behaviour in LCD or LED screens; it’s all linear.

Everyone got confused when “gamma” was used to describe any kind of integer encoding using power functions, as in ICC profiles and such. This kind of encoding is reversible and not linked to a particular device; it is there only to avoid posterization due to quantization artifacts when saving floating-point pixelpipe output to integer files. This so-called gamma only shares its mathematical expression with the original gamma, but has a different meaning.

For these cases, the recommended names are Opto-Electronic Transfer Function (OETF, from scene-linear to whatever) or Electro-Optical Transfer Function (EOTF, from whatever to scene-linear). These OETFs/EOTFs are no longer limited to power functions, but can be logarithms, Michaelis-Menten functions, or anything that gives more room to your last EV of dynamic range before black.
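To illustrate the quantization argument, the sketch below uses the sRGB encoding curve as one well-known example of such a reversible encoding and counts how many 8-bit code values land in the deep shadows with and without it. The 6 EV cut-off is arbitrary.

```python
def srgb_encode(linear):
    """sRGB encoding curve (IEC 61966-2-1): linear [0, 1] -> encoded [0, 1]."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1.0 / 2.4) - 0.055


def srgb_decode(encoded):
    """Inverse of srgb_encode: encoded [0, 1] -> linear [0, 1]."""
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4


def codes_below(threshold, decode):
    """How many of the 256 8-bit codes decode to a linear value below `threshold`."""
    return sum(1 for code in range(256) if decode(code / 255) < threshold)


DEEP_SHADOWS = 1.0 / 64  # linear values more than 6 EV below white (arbitrary cut-off)

linear_codes = codes_below(DEEP_SHADOWS, lambda c: c)  # plain linear 8-bit storage
srgb_codes = codes_below(DEEP_SHADOWS, srgb_decode)    # sRGB-encoded 8-bit storage
print(linear_codes, srgb_codes)  # prints: 4 34

# The encoding is reversible: decoding undoes it (up to floating-point error).
print(abs(srgb_decode(srgb_encode(0.18)) - 0.18) < 1e-9)
```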

XavAL · October 22, 2019, 6:28pm

I have re-read this answer about ten times, but honestly it is a bit beyond my technical knowledge.

Let me put it in other words to see if I have understood it right. Feel free to correct me if I’m wrong (I’m kindly asking for it, btw).

Linear gamma is not the right word, because it leads to confusion. Ok. Let’s hope the underlying idea in this thread remains right.

So we don’t need gamma encoding anymore, because our displays work linearly. I think some people would argue power function encodings are not needed in certain scenarios, but probably we would all agree that those calculations are reversible in the end. So there wouldn’t be problems.

Let’s not talk about quantization errors or posterization, because what I argued was that a tool should do its calculations in whatever way best preserves image quality (which doesn’t mean preserving its original look), and then, by means of an inverse gamma, OETF/EOTF, or whatever maths is needed, send the resulting values back to the pipeline in a linear fashion (that is, not encoded with a power function).

Do we agree on the idea, even though the words were used incorrectly?

anon41087856 · October 23, 2019, 1:34pm

Exactly.

Assuming software properly decodes those EOTFs when opening the integer file, there is no problem. In reality, all the software you can name (Photoshop, GIMP, Krita, etc.) keeps the EOTF and applies corrections on top of it, unless you switch on the “work properly” option hidden somewhere only colour freaks will know how to find.

Sorry, what?

Assuming this pipeline uses floating point, because integer pipelines can’t do without the OETF (unless maybe they use 16-bit encoding). I was shocked to discover that Photoshop CC 2018 has some tools (healing brush, spot removal, not exotic stuff) that still can’t work in 32-bit float mode and can only be used in 8/16-bit integer mode.

afre · October 23, 2019, 1:45pm

There are several separate issues being discussed here:

1. Encoding of integer values to mitigate quantization.
2. Linear and/or radiometric processing.
3. Display or viewing, and surround considerations.
4. Processing with round-trip transformations.

XavAL · October 23, 2019, 3:35pm

Well, first I want to say again that I’m not a programmer, so my explanations are most probably not very precise or technically correct, and you may have to fill in the gaps.

I start from the idea that some tools do a better job with power functions, because that way the results are closer to what we expect, or they introduce fewer artifacts into the image.

On the other hand, there are other tools that work much better with linear data (without power functions).

What I tried to explain is that if a tool gets the image data from the pipeline, performs its calculations/transformations with power functions, and returns the modified image to the pipeline as is (without applying the inverse power function), then the next tools won’t get linear data to start with, which may lead to an increasing amount of artifacts or undesired results.

If a tool gets linear data and returns linear data, everything is fine and it doesn’t matter how it transforms the image. It’s the programmer who decides which method is better.

But if data is not always linear, how is a tool supposed to know whether the image is linear or not? What happens if we have 3 successive tools that work with power functions, and they don’t care whether the incoming data is already non-linear?

To say it the wrong way: what happens if those 3 tools work better with a gamma encoding of 2.2, and they don’t know whether the incoming data is linear or not? Do they gamma encode the image 3 times?

To me the answer to that problem is simple:

input: linear data
working: gamma encoded or linear (as the tool needs it)
output: linear data
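The “3 times” worry can be made concrete with a Python sketch. Both tools are hypothetical and use an illustrative 2.2 power law: one encodes without ever decoding, the other follows the linear-in, linear-out rule above.

```python
GAMMA = 2.2  # illustrative power law standing in for "a gamma encoding of 2.2"


def encode(v):
    return v ** (1.0 / GAMMA)


def decode(v):
    return v ** GAMMA


def leaky_tool(data):
    """Hypothetical tool that encodes its input but never decodes its output."""
    return [encode(v) for v in data]


def contract_tool(data):
    """Hypothetical tool following: input linear -> work encoded -> output linear."""
    worked = [encode(v) for v in data]  # the tool's real processing would act on `worked` here
    return [decode(v) for v in worked]


def run_chain(tool, data, n=3):
    """Apply the same tool n times in a row, as three successive pipeline steps."""
    for _ in range(n):
        data = tool(data)
    return data


mid_grey = [0.18]                          # linear mid grey
print(run_chain(leaky_tool, mid_grey))     # ~[0.85]: encoded three times over
print(run_chain(contract_tool, mid_grey))  # ~[0.18]: still linear mid grey
```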

Well, in a workflow one has to take into account everything that involves processing an image.

And about radiometrically correct images: to me, a raw processing program shouldn’t have to work that way, because that would oblige me to use a second program to make adjustments that make the image different from what I shot, that is, to adjust the image to my idea of the picture. E.g., to increase the saturation of greens and yellows in a landscape picture.

I always see it from the outside in:

what do we need? Exact, precise depictions of what was shot, or a modified image that explains what we felt when shooting the image?
do we need the quality of the image data to be respected, or do we accept a certain amount of artifacts/odd results?
does each tool know what other tools do, or does it work standalone? I mean, does each tool know what kind of data each other tool outputs, and whether that data is linear or not?
how does each tool work internally? And any other technical, mathematical or purely scientific consideration that has to be taken into account.

If the two first points are not completely defined, we end up with something close to Photoshop…

ggbutcher · October 23, 2019, 3:58pm

Need to be careful here, as I believe at least some LCD/LED monitors emulate the CRT power function, so they don’t look different if interchanged in a non-color-managed system. I’m staring at such a LCD display right now…

We do depart-from-linear tone curves in raw workflows for a variety of reasons, not just display compensation. It’s essential to tease them apart so we understand their purpose and effect.

Right now, the first such departure in my workflow is a filmic curve that lifts the shadows in a highlight weighted exposure. This happens after delivery of the scaled, whitebalanced RGB image, and is usually the only operator before I resize and sharpen for output. And output has its own tone curve, based on the needs of the medium: display output is curved based on the calibrated monitor profile, and sRGB/JPEG is scaled based on a “standard” ICC profile for sRGB. In that way, I keep the so-called “gamma transform” targeted to the specific renditions’ needs, and not make it a confusion in my raw workflow.

anon41087856 · October 23, 2019, 4:16pm

That’s why you get an ordered pipeline with a user checking the order of operations, assuming he is educated enough to know what’s supposed to happen and when. But so far, in Lightroom, darktable and RawTherapee alike, pipelines are exposed only to devs, because devs don’t want to let users shoot themselves in the foot. Which, in turn, keeps users uneducated about these issues. The sooner we break that loop, the sooner we will stop reading crap on the Internet on these topics.

I think all that falls back to a simple question:

Does the image operation you want to achieve have a meaning in the physical world?

If it’s something we can achieve in an old-school darkroom, with glass filters and masks, then go for the scene-linear way, with physically-accurate algorithms that basically emulate real-life filters digitally.

Or if it’s something that tries to correct an effect produced by a physical apparatus, like blur, chromatic aberrations, vignetting or lens distortion, then, again, go for the scene-linear way, because you need physically-accurate algorithms for that.

But if you want to do things like gamut mapping, arbitrary colour matching, or local contrast enhancements, then maybe go for a perceptually-defined space. But then, you need to be careful with 2 things:

alpha-blending (occlusion) fails in non-linear spaces, so forget about feathered masking and such, or expect issues where your masks meet,
OETF-encoded RGB is not a perceptual space. Lab, IPT and such are, but they are not perfect and mostly intended for colour-matching, not for pixel-pushing.
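The first caveat can be checked with a minimal Python sketch, assuming an illustrative 2.2 power-law OETF: blending white over black at the midpoint of a feathered mask gives half the light in scene-linear RGB, but a much darker value when the blend is done on encoded values, which shows up as dark fringes where masks meet.

```python
GAMMA = 2.2  # illustrative power-law OETF


def oetf(linear):
    return linear ** (1.0 / GAMMA)


def eotf(encoded):
    return encoded ** GAMMA


def over(fg, bg, alpha):
    """Alpha blending (occlusion): alpha of the foreground over the background."""
    return alpha * fg + (1.0 - alpha) * bg


fg, bg, alpha = 1.0, 0.0, 0.5  # white over black, halfway through a feathered mask edge

blend_in_linear = over(fg, bg, alpha)                      # 0.5: half the light
blend_in_encoded = eotf(over(oetf(fg), oetf(bg), alpha))   # 0.5 ** 2.2, about 0.22

print(f"blended in scene-linear:                    {blend_in_linear:.3f}")
print(f"blended in OETF-encoded, decoded to linear: {blend_in_encoded:.3f}")
```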

So, all in all, what you are left with are scene-linear RGB spaces.

You are wrong about that. Painting (on real canvas with real paint) is a radiometrically correct operation, and that doesn’t prevent you from doing anything. It just sticks to the rules of physics, so it’s perfectly predictable.

What you shot is not what you saw. What you see is what your brain makes you see, not your eyes. A “single” human shot is actually an HDR blending of dozens of “pictures” taken by micro-moving the retina, and blended with automatic brain edge-enhancement. Therefore, you need a transformation (a pair of goggles) that lets you view the image in a human way, while still keeping the physical signal as long as possible to play with. But you need to separate the view and the model (actual data). Let your goggles lie to your eyes while still getting your hands on the real matter.

It’s not a matter of accepting anything; it’s a matter of being able to predict the result of a setting before even touching it. If you mix blue and yellow paints, you know beforehand you will get green. With some practice, you learn exactly what amount of each you need to get the exact shade you want. Trust my experience here: with a display-referred workflow, the result of one setting depends on an awful lot of variables, beginning with the source dynamic range of your picture, the age of the captain and the Jupiter-Saturn alignment.

There is a human holding these tools. Art begins with craftsmanship. The hammer doesn’t know about the nail; the handyman does. #UserResponsibility

Photography was not born yesterday. Most of the digital magic we now do by pushing sliders was actually done in hardware in darkrooms, so it has a physical meaning and a mathematical representation of that physical meaning. We need to find that meaning again and go back to the basics.

Yes, if you feed a digital signal to your LCD screen rather than an analog one, so that you transmit 8-10 bit integers over your HDMI cable, then of course you need your OETF in place. But after transmission, the DAC decodes the OETF and makes things right again, and finally the light emission of your LEDs is proportional to their input voltage + 0.7 V or something.