Survey on the linear workflow

Purpose
The linear workflow is an important part of processing that has been discussed and debated throughout the forum. I would like this thread to collect the current understanding of devs and researchers on how linearity should be maintained and used in the processing pipeline.

Post Guidelines
By survey, I mean the formal sense, not opinion or endless debate.

I am sure there will be disagreement. If that is the case, you may state your case in 1 post; the person to whom you are replying gets to reply once. Of course, some will warrant further exchanges, but I am putting it this way to emphasize restraint and respect among peers.

State the approach or topic (1/post), its merits and demerits, and show examples of its use. Please include links, files and references.

PS Also address edge cases: negative, high, inf, NaN, imaginary values, high gamut, etc.

Here are some preliminary thoughts; I may edit this if I have more well-formed opinions later.

When the underlying mathematical model of some algorithm is based on a linear space (e.g. uses a Poisson noise model, such as Lucy–Richardson deconvolution), then I believe strongly it is appropriate to perform it in a linear space.
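
To make that concrete, here is a minimal Richardson–Lucy iteration in numpy/scipy; it is a sketch, not any particular program's implementation, and it expects linear data with a known PSF precisely because the multiplicative update comes from the Poisson likelihood.

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(observed, psf, iterations=30):
    """Plain Richardson-Lucy deconvolution; expects *linear* data,
    since the multiplicative update assumes Poisson (photon) noise."""
    estimate = np.full(observed.shape, observed.mean(), dtype=float)
    psf_mirror = psf[::-1, ::-1]
    for _ in range(iterations):
        blurred = fftconvolve(estimate, psf, mode="same")
        ratio = observed / np.maximum(blurred, 1e-12)
        estimate *= fftconvolve(ratio, psf_mirror, mode="same")
    return estimate
```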

On the topic of performing other operations in a linear space, I do not know.

Here is a discussion of compression. I realize RT does not implement its own compression algorithm, but the JPEG compression discussed at the link is wavelet-based. So this suggests the wavelet algorithms should be applied after the gamma correction/tone curve.

Resizing and sharpening may be better done after gamma correction; see that link for sample pictures.

Dan Margulis writes: "While blurring might be better at an ultra-low gamma, sharpening and addition of noise are worse, so having filter operations take place at a low gamma generally is a bad idea." I guess by filter he means things like the guided filtering used in the RT highlights/shadows module.

edit: To mimic the effect of a color filter on a camera shooting black and white film, any "channel mixer" for black and white conversion should operate on the linear data.
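
A minimal sketch of what that could look like, assuming linear RGB input; the weights below are purely illustrative (something vaguely yellow-filter-like), not a standard.

```python
import numpy as np

def bw_mix(linear_rgb, weights=(0.4, 0.5, 0.1)):
    """Channel-mixer B&W conversion applied to *linear* RGB.
    The weights are illustrative placeholders for a colour-filter response."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    return linear_rgb @ w   # HxWx3 -> HxW single-channel image
```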

Firstly, I want to lay down my comprehension of the definition of "linear" with respect to digital imaging. A lot of what I think I understand is based on this comprehension, so clearing up any misconceptions would help me personally, as well as beneficially shape the conversation. So, my understanding of "linearity" is prepended with the word "radiometric", which identifies the characteristic I think is important to consider in maintaining the "linear" relationship of light measurements. At each location on the sensor array a value is collected representing the intensity of the light focused on that location, and it is each measurement's relationship to the others that I think is important to maintain as long as possible.

My proof workflow does the following:

  1. convert each 16-bit integer number from the raw file to an equivalent value in the 0.0-1.0 floating point space. This act would foment a lot of ancillary discussion, but suffice to say that there is no net effect on linearity; each number, in multiplicative terms, still has the same relationship to its siblings.
  2. assign a camera space to the image - no change to the values, but it will be useful for display/output conversion.
  3. blackpoint subtraction. No change of the multiplicative relationships, all data moves down by the same camera-specific value. Effectively establishes the place where the color black is anchored in the data.
  4. whitebalance. This one I struggle with a bit: while the essential operation is a multiplication, different multipliers are applied to each channel, radically changing the channels' radiometric relationship. I'm doing it this early because dcraw does it about here, and I haven't done any comparative analysis to assess its placement. Well, I did look at baking the white balance correction into the profile transform some time ago, but I never had the time to figure out how to work per-session camera profiles into my workflow.
  5. demosaic. For proofing, I use the half algorithm, which just builds each pixel from each quad mosaic, no change to the measurements. I just end up with a half-size RGB image with the same measured intensities. I don't understand the "real" demosaic algorithms enough to comment on their radiometric intrusion. (A rough sketch of steps 1, 3, 4 and 5 follows this list.)
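
Here is a rough numpy sketch of those steps for an RGGB mosaic, just to make the arithmetic concrete; the black level and white balance multipliers are made-up placeholders, and this is not rawproc's actual code.

```python
import numpy as np

def proof_demosaic(raw_u16, black_level=600, wb=(2.0, 1.0, 1.5)):
    """Steps 1, 3, 4 and 5 above for an RGGB Bayer mosaic (toy values)."""
    # 1. integer ADU -> 0.0-1.0 float: a pure rescale, ratios unchanged
    img = raw_u16.astype(np.float64) / 65535.0
    # 3. blackpoint subtraction: the same camera-specific offset everywhere
    img = np.maximum(img - black_level / 65535.0, 0.0)
    # 4. white balance: a different multiplier per channel
    r  = img[0::2, 0::2] * wb[0]
    g1 = img[0::2, 1::2] * wb[1]
    g2 = img[1::2, 0::2] * wb[1]
    b  = img[1::2, 1::2] * wb[2]
    # 5. "half" demosaic: one RGB pixel per 2x2 quad, greens averaged
    return np.dstack([r, (g1 + g2) / 2.0, b])
```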

Okay, this gets me to an RGB image whose white point is still where the camera comprehended it as the saturation limit; for my cameras that's 16383 (14-bit raw) in 16-bit integer, or about 0.24998 in 0.0-1.0 floating point. So, the next tool I apply is a black/whitepoint scaling; in G'MIC that'd be the -normalize operator (or -cut, I can't recall). I think some folk use exposure compensation to do the same thing, but the transform is equivalent: a multiplication of each measurement to put the very highest measurements at 1.0 floating point. 1.0 is eventually mapped to the display notion of white. For my images with small to medium dynamic range, the proof image at this point is usually sufficient for viewing, no basecurve/filmic/loggamma pet tricks required. Someone tell me why this is… ??
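
My reading of that black/whitepoint operation, as a minimal numpy sketch (not rawproc's exact code): a single offset and multiply, roughly what something like G'MIC's -normalize 0,1 would do.

```python
import numpy as np

def blackwhitepoint(img, black=None, white=None):
    """Map [black, white] to [0.0, 1.0] with one offset plus one multiply.
    With black=0 this is just the 'exposure' multiplication described above."""
    black = img.min() if black is None else black
    white = img.max() if white is None else white
    return np.clip((img - black) / (white - black), 0.0, 1.0)
```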

For proofing, at this point I just resize to a decent size (~800x600), minimum USM sharpen, and save to JPEG. WB aside, bear-of-little-brain here thinks the only departure from linear in the above workflow was the conversion to sRGB gamut and tone for the output, to comply with the display norms @troy_s attempted to explain some time ago.

Now, the dynamic range of the scene is what I've come to realize vexes this simplistic workflow. Indeed, to preserve highlights at capture, ETTR would compel one to underexpose a high-DR scene, which will put most of the image tones way down toward black. This, in my experience, is where departure from linear is required, in order to move those lower tones into the mid-range while keeping from pushing the highlights into oblivion.

Here's a recent example:


Note the workflow described above is captured in a group tool; this is new for the upcoming 0.9 release. The blackwhitepoint is set to scale the data from its min and max values to 0.0-1.0. The tone tool is next; I pulled the parameters pane out of the dock to show the complete tone tool settings - the filmic tone curve is selected, and its plot is at the bottom of the parameters pane. This curve is a very non-linear transform, and until this tool the data, to my maybe-warped thinking, was "radiometrically linear". The display is generated from the sharpen tool, last in the chain, and is approximately sRGB in color and tone.

If I were to select the blackwhitepoint tool for display, the image would be quite dark overall. I used the Z6 highlight-weighted matrix metering mode to preserve the highlights; notably, the in-scene light sources are blown, but the sky above the building is not. The filmic curve pulls the dark majority to comfortable visibility.

I've recently had to rework my blackpoint handling, e.g., the subtract tool is new, to accommodate my new camera. Also, some operations push data negative; I'm still working out how to handle this informatively, but for now the data min/max of the blackwhitepoint tool just clips at 0.0.

With respect to high values, I tend to let my processing push data past 1.0, then I look at that outcome in display, and sometimes I'll modify a curve to pull things back. Using floating point internal representation makes this trivial to handle.

With respect to gamut, until recently my practice was to, immediately after demosaic, convert the image from camera space to a Rec2020 working space. I'm not doing that right now, because I've seen artifacts I don't like, and they don't occur if I save the colorspace conversion for final display/output. Not understanding that, to date.

@afre, a good thread, I've been musing over such recently with the new camera, and trying to make pragmatic sense of all that "unbounded" discourse from a while back.

Edit: So, @afre pointed out something I'd not captured in my original post, namely that I need to elucidate what I've learned. Okay, what I've learned is that departure from scene-linear is really only needed for two reasons:

  1. Accommodation of non-linear displays. @troy_s's lesson. (See the sRGB sketch below.)
  2. Compression of a scene's dynamic range that challenges our camera and/or display capabilities.
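
For reason 1, the canonical example is the sRGB encoding applied just before display/output; a minimal version of the published sRGB transfer function (the standard formula, not any particular program's code):

```python
import numpy as np

def linear_to_srgb(x):
    """Standard sRGB encoding: the display-oriented departure from linear."""
    x = np.clip(x, 0.0, 1.0)
    return np.where(x <= 0.0031308,
                    12.92 * x,
                    1.055 * np.power(x, 1.0 / 2.4) - 0.055)
```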

The purpose of this thread (and forum) is to share with one another what we have learnt. Thank you for participating and pointing out your gaps of knowledge. Hopefully, after a dozen more posts, we will have a better appreciation of the challenges involved.

Thanks for that, @ggbutcher. You might include denoising before the demosaic.

I don't quite follow. If you subtract a constant from two different values, the ratio of the new values will be different from the ratio of the old values. For example, 0.2/0.4 = 0.5, but after subtracting 0.1 from each, 0.1/0.3 ≈ 0.33. So there is a change of multiplicative relationship, isn't there? Hence the values are no longer "radiometrically linear".

Ah, thanks for straightening out a math-averse goofball…

Yes, I'm a bit out of my field with the mathematics. Indeed, one of my ongoing objectives is to precisely understand "radiometrically linear" in these terms. My general feeling about all this is that the original measurements of light energy are important to the colorimetric and tonal interpretation of the scene in an image, and each manipulation should be carefully considered, and limited to the minimum required to achieve the desired rendition.

For instance, I've been messing with the original filmic function, trying to better understand its effect. In my current tool chain, it is the first significant departure from linear, things like blackpoint subtraction and whitebalance notwithstanding. There are discussions out there that implore a subsequent application of an S-shaped curve after filmic "lifts", and that to me seems to be undue whipsawing of the tonality. So, I've been playing with the A, B, C, and D coefficients, and I find that I can bake most of what I'd do in an S-curve into them. I don't fully understand their interaction, but for instance, the B coefficient is key to shaping the curve toe and the tonality of the lowest shadows. In the recent 'infinite flowers' playraw I posted a filmic + S-curve rendition, but I could probably have baked it all into the filmic tool…
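
For reference, the widely circulated Hable "filmic" rational curve uses lettered coefficients like these; I'm assuming the filmic tool follows this general form, which may not be exactly right:

```python
def hable(x, A=0.15, B=0.50, C=0.10, D=0.20, E=0.02, F=0.30):
    """The often-quoted Hable 'filmic' rational curve (coefficients A-F)."""
    return (x * (A * x + C * B) + D * E) / (x * (A * x + B) + D * F) - E / F

def filmic(x, white=11.2, **coeffs):
    # normalize so the chosen scene white maps to display 1.0
    return hable(x, **coeffs) / hable(white, **coeffs)
```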

@afre, I know you want to capture our thinking in single posts, but it's hard not to discuss it… :slight_smile:

@ggbutcher Which means, permission granted. :stuck_out_tongue: Hint: It is okay to list things in your first post but try to be focused in subsequent posts, separating topics into manageable chunks. :wink:

Although I have things to say about the replies so far, I will wait until more people get a chance to contribute. That said, a question: is normalizing data to [0.0, 1.0] radiometrically acceptable?

A very good question, one that I've pondered incessantly of late. I'd surmise yes, as the numbers used to represent energy measurements are arbitrary, but the "distance" between them expresses their energy relationship. Proportionately, I think that's maintained in a normalization, as I believe it is with an exposure multiplication. That, from a person with three university degrees containing four math courses, total… :smile: - FWIW!

@snibgo's comment stimulated this question. I should be more precise: I mean Feature scaling - Wikipedia.
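
To make the distinction concrete, a tiny comparison: a pure rescale (divide by a constant) preserves the ratios, while min-max feature scaling with a non-zero minimum changes them, which is the same effect @snibgo describes for black subtraction.

```python
import numpy as np

a = np.array([0.1, 0.2, 0.4])

rescaled = a / a.max()                        # pure multiplication
minmax = (a - a.min()) / (a.max() - a.min())  # min-max feature scaling

print(rescaled[2] / rescaled[1])  # 2.0 - same ratio as 0.4 / 0.2
print(minmax[2] / minmax[1])      # 3.0 - the multiplicative relationship changed
```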

I suspect this has been brought up before; just don't remember where.

– @Elle has spoken about arithmetic with respect to blending, and gamut; uncertain if she addressed this. Unfortunately, she isn't here anymore…

– I recall @David_Tschumperle telling me to divide by a factor to get to the [0.0, 1.0] range and then multiply to restore the original range. Out of habit, I still normalize more than I divide and multiply.

In rawproc I handle normalization a bit oddly, it seems. When I read raw ADU values, unsigned 16-bit integers from libraw, I scale them to equivalent numeric values in 0.0-1.0 floating point. That is, say, 16383, which is the maximum 14-bit value, gets converted to 0.249984741 in 0.0-1.0 floating point. That's where raw images start in rawproc; I consider that representation to be "linear" with respect to the original energy.

At some point in the chain, usually right after demosaic, I'll stick in a blackwhitepoint tool. That's where the data is scaled such that the original saturation point is brought to 1.0. "Black" usually doesn't mean anything here, as the data has already been anchored to 0.0 with blackpoint subtraction, if needed. Out of habit, I use a slope-based "linear curve"; I used to use the curve tool but that's computationally expensive. If just moving the saturation point, it could just be a multiplication, I think…

After that operation is where I'll put non-linear tone curves, which I have coded to optionally normalize to the 0.0-1.0 range. That is, the curve tops off at 1.0, so data that's already < 1.0 will not be pushed past 1.0. I've assembled a little "zoo" of tone operators: gamma, reinhard, loggamma, and filmic, and I'm currently messing with filmic to see if it has utility working with the highlight-weighted metering of my new camera. So far, emphatically yes…
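
Two of those operators in their textbook forms (not necessarily rawproc's exact implementations), both arranged so that an input of 1.0 maps to an output of 1.0:

```python
import numpy as np

def tone_gamma(x, g=2.2):
    """Simple power-law 'gamma' curve; 1.0 stays at 1.0."""
    return np.power(np.clip(x, 0.0, None), 1.0 / g)

def tone_reinhard(x):
    """Simple Reinhard x/(1+x), rescaled so an input of 1.0 maps to 1.0."""
    return 2.0 * x / (1.0 + x)
```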

Thing is, my proof workflow uses an option of the blackwhitepoint tool that pushes the black and white points "into the histogram", that is, arbitrarily clips the extreme values to a preset proportion. Often, that is sufficient to render a usable image, particularly if the scene dynamic range was small. So, a lot of my images go out the door with only the sRGB TRC to "make nice" for the wild web. Linear, from start to almost finish…

That is the essence of my linear workflow…

It seems to me there are two related but different concepts:

(1) Linearity, which I take to mean pixel values are proportional to the energy received by the camera, or hypothetical camera.

(2) Scene-referred values, which are linear but also have no maximum because whatever illumination the original scene has, it can always be increased. By contrast, output-referred values do have a maximum because screens and paper have a maximum brightness.

Scene-referred also implies that a collection of such images (including videos) could be edited together in some well-defined manner to work together seamlessly, regardless of different cameras, software, and so on. That's a large topic.

I deal in photos and videos, where achieving (1) is fairly easy. Modern cameras seem to be fairly linear. If they aren't, a correction isn't difficult. And this may be where @ggbutcher's black-level subtraction comes in: for such cameras, a value is subtracted in order to make values linear. Or, at least, values are then assumed to be linear.

Multiplying or dividing linear values by a constant will retain their linearity. But capping values at some limit will destroy linearity. Many operations to improve aesthetics, "the look", will also destroy linearity.
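
A two-line illustration of that point (the values here are arbitrary):

```python
import numpy as np

x = np.array([0.4, 0.8, 1.6])
print(x * 0.5)                 # [0.2 0.4 0.8] - scaling preserves proportionality
print(np.clip(x, 0.0, 1.0))    # [0.4 0.8 1.0] - the top value is no longer 2x the middle
```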

But (2) is more difficult. There is no maximum value, there is no whitest white. (Or there is, but it is at infinity.) Many operations we are familiar with don't exist or need to be revised. For example, a scene-referred image can't be negated, interchanging black with white.

A low-level non-interactive tool such as ImageMagick handles linear data with ease. Almost nothing in IM cares whether the data is linear. The arithmetic doesn't change. Scene-referred is messier because IM originated in the days of integer-only processing, and we need to switch off mechanisms that clamp values beyond "black" or "white".

A GUI is more difficult because we need to see results on the screen, which is non-linear. So a GUI might work on a linear version of the image, and convert on the fly to whatever the screen needs. I don't know if Gimp works this way. I hope it does.

A GUI for scene-referred images has the extra complication of reducing the infinite range of scene luminances to the finite range of a screen. It compresses (by an infinite amount!) or just shows a small range of the image. Perhaps Natron etc. does this. Gimp doesn't.

In the OP, @afre mentioned "high gamut", but I think that is a different topic to linearity or scene-referred. I don't do much cross-gamut work. IM doesn't expose all the functionality of LCMS, and I suspect LCMS doesn't have all the tools I would like, for example fine control over squishing wide gamuts into smaller ones.

@afre, apologies, but you got me going with this thread…

Thing is, at the scene, the camera has a hard maximum in its saturation point. Everything in front of it has to be mapped somewhere below that. So, the recording you bring home is pegged there, for any post processing. Now, CGI and other synthetic imaging do not have that restriction, but I'm not making movies here…

Yeah, I'm not sure where it comes from, but with the Z6, if you don't do it, blacks are just not right. Just pull the value from the metadata, apply it, and go on. My D7000 didn't have one, so it seems to be sensor-dependent. Anyone more informed on that dynamic should feel free to school me…

BTW, as far as black level goes - I'm a bit rusty, but the camera ADCs are set up with an offset such that if there is zero input signal (e.g. no photons), you get electrical noise centered around the black level.

I BELIEVE the rationale here is that internal noise is evenly distributed around the black point, as opposed to getting "mirrored" up into the positive signal region. I'm a bit rusty here though. I sort of recall an analysis (I'm fairly certain it was by JimK, whose blog nik cited; it may have been Horschack, who does a lot of technical analysis on DPReview, including deducing the underlying mechanism of Sony's "star eater" and some apparent RAW cooking - Sony A7xxx Posterization and Colored Banding (Part 3): Sony Alpha Full Frame E-mount Talk Forum: Digital Photography Review, for example) that showed that cameras which didn't have a black level offset tended to have more read noise in those lower few bits due to negative noise values getting mirrored… Again, this is based on partial memory and I could be misremembering it. It was either on the same blog nik cited or in the DPR forums, neither of which is particularly search-friendly. :frowning:

The idea is that the sensor data is a linear function of the input PLUS a fixed constant offset, which is the black level. So when you subtract the black level, the result is now a linear function of the number of input photons. At least from most discussions I've seen, modern digital sensors are very linear with very little deviation right up to the point where they clip.
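
A toy model of that description, just to pin down the arithmetic; the gain, black level, and read-noise numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

photons = np.array([0.0, 100.0, 200.0, 400.0])   # light hitting four sensels
gain, black_level = 4.0, 600.0                   # illustrative values only

# raw ADU = gain * (photon count + shot noise) + black level + read noise
raw = gain * rng.poisson(photons) + black_level + rng.normal(0.0, 3.0, photons.size)

linear = raw - black_level    # now (noisily) proportional to the photon count
```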

Having the internal noise sources accurately represented as opposed to being mirrored around 0 also helps if you want to average multiple exposures.

As to linear vs. nonlinear workflows - Obviously some algorithms work better on linear data and some work better on nonlinear data. Interestingly, the analysis JimK did involving the effect of gamma-compressed vs. linear data for resizing and sharpening algorithms in the post cited by nik is in direct conflict with claims I've seen made here recently that such operations should NEVER be performed on anything other than linear data. Personally - I almost universally trust Jim's judgement over that of most others, for a variety of reasons.

Also, one must keep in mind that in most cases, an evaluation of the final result of a processing workflow is a subjective one - does it look good to the viewer? What may look good or unique may fundamentally NOT be what was present in the original scene.

In my case - some of what I've been working on lately is exactly this, specifically to handle display limitations. Right now, almost any camera on the market has dynamic range recording capabilities that are WAY beyond a typical display. (Current HDR displays are a whole other story - I've found that the exposure fusion approach which is enormously beneficial for someone viewing on an SDR display is completely unnecessary if you take the same image and feed it to an HDR display without any dynamic range compression. Sadly, we're years away from having such displays be more common than SDR displays AND having proper content delivery infrastructure for it - in my case, the only way I can get content to my HDR display is to take an image and encode it to a 10-bit HEVC video, which is enormously wasteful.)

It's no secret that I have a strong preference for the Mertens exposure fusion algorithm. Interestingly, fusing synthesized exposures generated from a high dynamic range RAW image is not exactly how it was initially designed, but even if it's an indirect approach to local tonemapping, it still generally seems to outperform most other local tonemapping approaches, rarely ever exhibiting the flaws (such as haloing) that can arise from local tonemapping. Nearly everything about the algorithm is nonlinear, but in the end - the perceptual results are such that products which use the algorithm as a fundamental building block of their default mode of operation (such as the Pixel 3) routinely get praised in reviews of their camera functionality.
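
For anyone wanting to experiment, a minimal sketch of that synthesized-exposure approach using OpenCV's Mertens implementation; the ±2 EV brackets and the rough sRGB-style encoding before fusion are my own assumptions, not a description of any particular product:

```python
import cv2
import numpy as np

def fuse_single_raw(linear_rgb, evs=(-2.0, 0.0, 2.0)):
    """Synthesize exposure brackets from one linear float image and fuse them."""
    exposures = []
    for ev in evs:
        e = np.clip(linear_rgb * (2.0 ** ev), 0.0, 1.0)
        exposures.append((e ** (1.0 / 2.2) * 255.0).astype(np.uint8))
    fused = cv2.createMergeMertens().process(exposures)
    return np.clip(fused, 0.0, 1.0)
```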

While your last paragraph is absolutely and irrefutably correct, there is room for a right and a wrong here. Some operations can be, and are being, performed on gamma-corrected data, but that does not take away the fact that, theoretically, they should not be. There are multiple sources that clearly explain and show that things like blurring and resizing should be done on linear data. Please see here and here for example.

Edit: just to add, as @anon41087856 would probably reiterate too, there is physics and mathematics (and electronics) underlying digital photography. There is little sense in abandoning or ignoring that, at least for me.

Edit 2: I see I got some double negatives mixed up. Tried to rephrase.

At least from a quick read of your two links, section D1 of the second one:

"Whether you get 'prettier' results when using a gamma=1.0 or a gamma=2.2 RGB color space is an entirely subjective call, and in my opinion, the artist is always right."

That doesn't look to me like a "clear explanation" that blurring and resizing should always be done on linear data.

Edit: Should the "technically incorrect" approach be the default approach used? No. But should it be completely forbidden for anyone to ever use that approach? Also, I firmly believe, no.

For clarity, then, please also quote the first part of section D1 from @Elle's article:

"In all comparisons above, the colors in the images on the left, edited in the linear gamma version of the sRGB color space, are technically correct. The colors in the images on the right, edited in the regular sRGB color space, are technically wrong (…)."

There's always a distinction between technically correct and artistically pleasing. :slight_smile:

On the note of processing in linear vs gamma-compressed space, I really do not have much to add to this discussion, as I am not a developer, except for a specific preference for linear when generating sharpening masks, such as for RawTherapee's new-ish contrast threshold feature.

Regardless of whether the actual sharpening is done in linear or gamma-compressed space (I don't have an educated opinion on that debate), I think that any sharpening mask should be created in linear space, since the signal-to-noise ratio tends to be poorer in the shadows, and lots more fine detail is above the noise floor in the highlights, when observed as a gamma-encoded image. This observation is supported by the discussion of photon shot noise linked here: http://www.photonstophotos.net/Emil%20Martinec/noise.html

Given this correlation, the S/N ratio should vary less across the tonal range in a linear color space, so a minimum threshold for sharpening could be set just above the noise level (the optimum setting) over a greater tonal range of the image. Compare that with what I have to do now: I set the threshold tuned for the midtones, so faint detail in the highlights doesn't get sharpened, and the higher noise in the shadows does.
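
As an illustration of the kind of mask being discussed, here is my own rough sketch of a contrast-threshold mask, not RawTherapee's actual algorithm:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sharpen_mask(luminance, threshold=0.002, sigma=1.0):
    """1 where local contrast exceeds the threshold, 0 elsewhere.
    Feeding it linear vs gamma-encoded luminance changes which
    tonal regions clear the threshold."""
    local_contrast = np.abs(luminance - gaussian_filter(luminance, sigma))
    return (local_contrast > threshold).astype(float)
```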

Edit: INB4 someone responds "Just export two versions with different threshold values and blend with a luminance mask". That would be a waste of time; raw processing would be more intuitive and quicker if I could just tune the S/N threshold on one patch with noise and detail in the midtones, and trust that I won't make the highlights look waxy or amplify the shadow noise.

Isn't this really the right answer, though? (That is, the raw converter should be able to tune the sharpening threshold to scale with luminosity. I agree doing it by hand is annoying.) The shadows will always be noisier than the highlights just by virtue of how the physics works out. In a linear space, if a pixel collects N photons, it's going to have an SNR proportional to N^{1/2}. Highlights have many stops more input than the shadows (say a factor of 2^6, at least), so the shadows are guaranteed a dramatically lower SNR.

Further, I'm not sure what doing it in a linear space gets you compared to a gamma-corrected one. The gamma correction is a monotonic map of N and hence of the noise levels, so for any threshold in linear space you should be able to find another one in gamma-corrected space that gives similar results.
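
A quick simulation of both points (shot noise only; the photon counts and the 2.2 gamma are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# SNR grows as sqrt(N): ~10 at 100 photons, ~80 at 6400 photons (6 stops more)
for n in (100, 6400):
    samples = rng.poisson(n, 100_000)
    print(n, samples.mean() / samples.std())

# A gamma curve is monotonic, so a linear-space threshold always has a
# gamma-space counterpart that selects exactly the same pixels.
threshold_linear = 0.02
threshold_gamma = threshold_linear ** (1.0 / 2.2)
```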

This is all heuristic, so maybe I'm missing something? I'd be happy to write out something mathematically more rigorous if you want.

Actually, now that I think of it, you do have a point. With a sufficiently wide-ranged gamma correction for the threshold weighting, whether the mask is created in linear or gamma-corrected space becomes irrelevant. My initial gut sense was that linear space would have a more even amount of noise across the tonal spectrum, but after some mental thought experiments on "SNR proportional to N^{1/2}", echoed in the Martinec article, I realized that the absolute spread of values from a small number of photons is smaller than that from a large number of photons, even though it is the other way around in proportional terms. Therefore, creating an edge mask in a linear space would target the shadows too much. This is confirmed by my experience of gamma settings of 1.0 for RT noise reduction targeting the shadows too much. Then there are variables such as read noise, and it becomes pretty apparent that there is no one-size-fits-all choice, and that a user-defined weighting curve is always likely to be necessary.

I think a good general point is surfacing here, in that preserving the energy relationship has more value in some operations than in others. Endeavors concerned with color and tonality definitely have a concern with the energy relationship; endeavors concerned with edges, not so much. Indeed, certain tone mapping may facilitate edge detection in things like convolution kernels, where it's the difference in adjacent values that facilitates the transform.