Module position with respect to display vs scene ref'd

Thanks for the explanation of the new model. I was just playing a bit and can see that it can help with images where I have so far used filmic v5 with no preservation to avoid the salmon color of sunsets. However, since it is recommended - for good reasons - that color adjustments be done later in the pipeline, we are moving towards a mix of scene- and display-referred workflow, aren’t we? Shouldn’t there be another workflow setting with a different default module order? Of course this could be done with presets. Also, are there any advantages to using color balance RGB over color zones? Just some questions that came to my mind while playing…

color zones is in the display-referred part - so there’s less room for changes without side effects…

Yes, but if one moves CB after the tone mapping step, we are no longer in a linear part of the workflow. So is that argument still true? Anyway, in the end it’s the result that counts …

CZ works in Lab… CB works in either UCS or JzAzBz… There can be differences even between the two modes in CB… not many people even check them…


CB might also have some gamut control in the background… I would have to try to check the code…

I think it’s precisely why some people advocate against using terminology such as “Scene referred” and “Display referred”.

Making photos should be “Photo referred”. There are some processes that need to happen before the picture is produced/made/etc., and others after.

What is photo referred? A paper print with a cheap color laser printer, or a fine art print, or …? Introducing new terms doesn’t help - you need to get the meaning of scene referred and display referred if you want to get the best results from darktable.
If you don’t care, you can of course mess around with both if the results fit your demands… but darktable doesn’t prevent you from shooting yourself in the foot…


No day without learning something new :blush:

I think introducing new terms is exactly what is needed to rectify a problem with old terms. But I think this is not the topic to do so.

All I’m saying is that not everything can be done “scene-referred” (think frame, watermark, tint, scratch); some argue that contrast, sharpening and denoising should be done “display-referred”, or after the photo is built.

Same with saturation. There are cases where you want to boost the saturation before the ‘tonemapping’, and cases where you want to boost it after. Sometimes even both.

Especially if your goal is to keep the final result intact.

Concentrating on the picture is the best thing you can do to a photo.


Is not the light coming out of the display linear?


In case you’re not joking :slight_smile: Understanding Gamma Correction


The light that is emitted by the LEDs in the monitor is linear, just as the light from a light bulb is linear.

Five light bulbs emit 5× the light. You just don’t perceive it as such.
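To put rough numbers on that, here’s a small Python sketch using CIE L* as one standard model of perceived lightness; the luminance values are made up purely for illustration:

```python
# Rough numbers: luminance adds linearly, perceived lightness does not.
# CIE 1976 L* is one standard model of lightness (0..100); the luminance
# values below are made up purely for illustration.

def cie_lightness(Y, Y_white=1.0):
    """Relative luminance Y -> CIE L* lightness."""
    t = Y / Y_white
    if t > (6 / 29) ** 3:
        return 116 * t ** (1 / 3) - 16
    return (29 / 3) ** 3 * t  # linear segment near black

one_bulb = 0.2            # arbitrary relative luminance of one bulb
five_bulbs = 5 * one_bulb

print(cie_lightness(one_bulb))    # ~51.8
print(cie_lightness(five_bulbs))  # 100.0 -- about 2x, not 5x
```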

But I think this is quite a leap from a pipeline discussion.


Yes, for sure… it’s about the gymnastics involved in all the steps between scene/capture/conversion/processing/output to account for this…

I’m not 100% sure you meant this, but yes - the light emitted by the display is linear (for the most part *) with respect to the picture that is output by sigmoid or filmic in darktable and is still in a linear RGB encoding.

So the picture that exists after sigmoid/filmic exists in its own right and is encoded in linear RGB. It is not linear with respect to the raw data / photons captured by the camera, but that does not matter as long as one wants to do manipulations on the picture. At least this is very relevant to the pipeline discussion.

More interesting stuff coming later…

We would do well to talk about the Electro-Optical Transfer Function and its inverse. When the picture coming out of darktable is saved to an sRGB-encoded JPG file, the inverse EOTF is applied to the linear RGB to produce the code values to be saved. When the picture is displayed on a physical device, it is encoded with the display’s own inverse EOTF before sending the code values to the display.
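For the common sRGB case, that encoding step is the standard piecewise sRGB function. A minimal Python sketch of the textbook formula (not darktable’s internal code):

```python
def srgb_encode(linear):
    """Encode a linear-light value (0..1) to an sRGB code value (0..1).

    This is the standard piecewise sRGB encoding, often loosely called
    the inverse EOTF: a linear toe near black, then a 1/2.4 power segment.
    """
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055

# Mid-grey in linear light lands far above 18% of 255 as a code value:
print(round(srgb_encode(0.18) * 255))  # ~118
```

(Other encodings, such as the PQ curve defined in the BT.2100 reference below, use quite different functions.)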

There is not necessarily any “gamma” in the sense of a power function involved. Some interesting resources:

https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.2100-1-201706-S!!PDF-E.pdf (discussion of OETF and EOTF starting on p. 11)

(*) There may be some discrepancies in the low end of the encoded values


Thanks… it was a quick grab to try and share why people would be talking about this at a very basic level with respect to image processing. Thanks for the link… I’ll take a read. I find the ACES video terminology makes good sense, with display transforms, color space transforms, etc. It can seem easier to follow what is happening to the data.

I think this is where I had a different idea of a linear (scene-referred) workflow in mind. For me, linear meant only applying linear operators to the RGB array, such that the intensity stays linear with respect to the light of the scene. But in color science terminology it can mean something different, and that is not my field of expertise. (Thanks for the link, I will take a look later.)
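A toy Python sketch of what I mean by “linear with respect to the light of the scene” - the values, the exposure factor and the curve exponent are all made up for illustration:

```python
# Toy example of "linear with respect to the light of the scene".
# Pixel b received 4x the light of pixel a (made-up scene-linear values).
a, b = 0.05, 0.20

def exposure(x):
    return 2.0 * x           # a linear operator: pure scaling

def tone_curve(x):
    return x ** (1 / 2.2)    # a non-linear tone curve

print(exposure(b) / exposure(a))      # 4.0   -- the scene ratio is preserved
print(tone_curve(b) / tone_curve(a))  # ~1.88 -- the scene ratio is gone
```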

The key question to me is: how can one help users choose the correct order of modules to get the most out of DT?

They’re already in mostly the correct order.

Paperdigits is mostly correct in saying that the order is acceptable as it is.

If you wish to have advanced control, I think it all boils down to really understanding the pipeline and having a point in it where your picture “is born”: where you would be happy printing it and sharing it, but you still want to make adjustments to it (and not modify the data behind it).

An easy-to-understand example is tinting. Like covering a printed photo in a thin layer of paint, you want to tint the picture that is already ‘built’, and not apply it to the data before.

Or think about a white frame. You wouldn’t want to apply a white border to the data early in the pipeline, because the frame would get affected by everything in the pipeline. You want to add the frame at the very end of the pipeline, almost as if it were “on top” of the picture.
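A toy Python sketch of that frame example, with a made-up stand-in for the tone mapper (not darktable’s actual filmic or sigmoid math, and the border value of 1.0 is just an assumption for illustration):

```python
# Toy "tone mapper": scene-linear in, display range (0..1) out.
# Not darktable's filmic or sigmoid, just a stand-in that maps scene white
# (placed at +4 EV above mid-grey 0.18) to display 1.0.
def toy_tonemap(x, scene_white=0.18 * 2 ** 4):
    return x / (x + 1) * (scene_white + 1) / scene_white

frame_value = 1.0  # the "white" border pixels

early = toy_tonemap(frame_value)  # border added before tone mapping
late = frame_value                # border composited onto the finished picture

print(round(early, 3))  # ~0.674 -- no longer a white frame
print(late)             # 1.0    -- stays white
```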

Hence my original comment on the “Display referred bad”/“Scene referred good”.

In “Picture” terms, there’s stuff you do before it’s produced/built/made, and stuff you do after it’s produced/built/made. So the reference point is the Picture.

(“The Picture” is something that comes out after applying a “tone mapper” (sigmoid or filmic rgb); everything before that point is modified data captured by the camera. I find it helpful to separate those two states. You wouldn’t print something that doesn’t have sigmoid or filmic rgb applied, would you?)


What you nicely explain, I think, is that separation between necessary technical post-processing and fixes on the one hand, and the “look” or style on the other… it’s harder to preserve your birthed image if you go for a tweak or look using modules that come before that point in the workflow…

I wanted to wait for @flannelhead to finish their post, but I was aching to write something. I have not edited or reviewed this piece, but hope it turns out to be what I wanted to say.


As with most terminology, people have different interpretations and applications. From my perspective, linear and scene-referred post-processing have different objectives, but share in their goal of creating a clean outcome without weird and unexpected results.

Linear processing is about preserving spatial, noise, colour ratio, etc., profiles of a captured image during post-processing to prevent artifacts and distortions. It is the technical pursuit under the hood to ensure that the user gets the most kilometrage (yes, I am Canadian) out of the processing. A non-linear, and might I add, busy, approach will almost always be less effective. I will not elaborate too much on what that looks like because I just want to give a simple explanation.

It is like making Japanese food, particularly sashimi, and perhaps other cuisines that require attention to detail. When one wants to serve “sushi-grade” raw fish, one has to make sure the fish is sourced from the market at 5 a.m. or earlier in the morning. I believe they bid for them in the markets, where only the freshest and cleanest would do. Then the storage before preparation must be as short as possible. For food safety, or if service is not instantaneous, one must flash-freeze it to ensure that parasites are dead, yet the fish still has its freshness and texture. Then the knife-work. That is where the linear analogy shines: it has to be minimalist and done in such a way that it does not affect the original textures and nature of the fish. One false move and the fish would not taste, feel or smell the same as a bear biting into it. When it comes to service, the fish is still quite linear because, well, it is to be served as sashimi. The chef may do some finishing touches, torch, salt or sesame the fish slightly, so as to “encode” it, if you like to run with the metaphor.

Scene-referred, on the other hand, is different. Basically, reusing the analogy, it is preparing the sashimi and then reconstructing it to become the fish again, so that it is like new. That may be a funny exaggeration, but let me explain. The goal of scene-referred is to refer to the scene to ensure the resulting photo is a 1:1 match with the original scene, a facsimile or replica.

So while linear processing is about the technicalities, scene-referred is an approach to image processing where the end goal is to replicate the ground truth of the scene. Well, that is not 100% possible because scenes are ephemeral. Time, space and probability change with every passing moment. No two eyes or cameras are the same, etc. And most likely, one does not want to make it the exact same thing, though people like me hope it could be.

Another analogy for scene-referred would be the outdoor landscape painter, where the painter is actually at the scene and can compare their output with the scene in real time. The scene is most definitely changing in real time and not static, but the painter can compare how light falls, illuminates, reflects and transmits through the scene with the actual painting. If they like, they can create something very true to life. The process itself (the actual application of paint on canvas) may not necessarily be linear, but the result can be a close match if the artist wants.

One last note about linear processing is that in the most ideal case, a fully linear workflow would be reversible. Now, this is really hard to achieve, if not impossible. Work has been done to create reversible algorithms, but I think it is outside of our discussion’s scope. Something related that has been bothering me slightly is restoring non-linearity after each step of processing. Anyway, it is future-work territory. Having goals is good, though. AI may be a huge step in the scene-referred space. In some ways it will not be, since AI generally creates new information out of training on datasets and inference.

I think it would be a bit heavy to specify alternative workflow settings at this point.

Rather, it would be more useful to learn to think of the purpose of each adjustment. I am going through such a learning process myself, so let me share what I have found out.

As @AD4K has already mentioned, the “display referred” / “scene referred” distinction might not always be a useful one to make. It kind of implies that the photons caught by the sensor represent a “scene” that we want to present on the display as a simulacrum.

There is evidence that if we did this (reproduced the camera colorimetry exactly on a display device), the result would look quite “off” to most people. I highly recommend reading this article about pictures by Troy Sobotka, which introduces these ideas along with plenty of evidence that pictures exist in their own right.

So let’s consider the output of sigmoid or filmic rgb as a picture. The input data from the camera has undergone a transformation where it departs from the original data and turns into a picture. This involves some intentional distorting of the incoming data. The output luminance is no longer proportional to the luminance of the tristimulus data, and the highest values are decreased in purity such that they roll off to white. This may also involve some skewing of the “hue”.
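A toy per-channel compression in Python (just x / (x + 1) on each channel, not darktable’s actual sigmoid or filmic, with made-up pixel values) shows the shape of that effect:

```python
# Toy per-channel compression, just x / (x + 1) on each channel --
# a stand-in for the real modules, only to show the shape of the effect.
def toy_formation(rgb):
    return [c / (c + 1) for c in rgb]

bright_red = [50.0, 5.0, 5.0]  # scene-linear, R is 10x G and B
print([round(c, 3) for c in toy_formation(bright_red)])
# [0.98, 0.833, 0.833] -- the channels clump together near the top:
# the bright red loses purity and rolls toward white
```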

The result is a picture which can be displayed or printed and already looks reasonably good. So let’s say you would like to do some grading of the colors via a tool like color balance rgb or color zones. Since sigmoid and filmic can distort the data quite a lot during image formation, wouldn’t it be more effective to act on the readily formed picture instead? One wouldn’t then have to predict how their color grading changes would look when rolled through said modules.

Another sign that, in my opinion, hints that color balance rgb is better used on the “picture” state of the pipeline: the “white fulcrum” control in the masks tab. The intent according to the manual is that it should be set to the value of “white” in the unbounded data (before sigmoid or filmic). But then comes the question: what is white? In the data before the picture is formed, there is no notion of white.

We can try to figure out what value of the data would be mapped to white in the picture, but we run into a problem. This screenshot is from filmic’s visualization.
(screenshot: filmic’s tone mapping visualization)
So what’s the value that corresponds to white on display? Is it +5, +6 or +7 EV? They are all clumped together in the high end of the display range. Yet changing the white fulcrum in color balance rgb between these values makes a significant difference in the end result. But should it, since a white pixel in the picture could have originated from any of those source values? I don’t think so.
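A toy Python illustration of that clumping, using the same simple x / (x + 1) compression as a stand-in; filmic’s real curve squeezes the top end even harder, but the shape of the problem is the same:

```python
# Toy illustration only -- not filmic's actual curve. Push +5, +6 and +7 EV
# above mid-grey through the same x / (x + 1) compression and see where
# they land on the display range.
mid_grey = 0.18

for ev in (5, 6, 7):
    x = mid_grey * 2 ** ev
    print(ev, round(x / (x + 1), 3))
# 5 0.852
# 6 0.92
# 7 0.958  -- all crowded into the top of the display range
```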

Since the notion of white exists only from the moment we form the picture, any adjustment that refers to some “white” should be done on the picture, not on the data prior to picture formation.

What processing should be done on the pre-picture-formation data, then? Probably many things. I haven’t thought about this very far myself, but at least some examples come to mind:

  • Emulating a graduated ND filter by a masked exposure instance. Such a filter is something that you would stick between the “scene” and the sensor, so it makes sense to do such a manipulation close to the original data (see the sketch after this list).
  • Emulating a fill flash by a masked exposure instance. Again, makes sense to work close to the data captured by the sensor, prior to image formation.
  • Probably many other things
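As a sketch of the first bullet above, here is a minimal Python example of a graduated ND emulated as a masked exposure change on scene-linear data; the image contents, mask shape and strength are arbitrary choices, not anything darktable-specific:

```python
import numpy as np

# Minimal sketch: a graduated ND filter emulated as a masked exposure change
# on scene-linear data. Image contents, mask shape and strength are made up.
h, w = 400, 600
img = np.full((h, w, 3), 0.18)                   # stand-in scene-linear image

nd_stops = 2.0                                   # darken the sky by up to 2 EV
mask = np.linspace(1.0, 0.0, h)[:, None, None]   # 1.0 at the top row, 0.0 at the bottom
img_nd = img * 2.0 ** (-nd_stops * mask)         # pure multiplication: stays scene-linear
```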

I hope these ideas make sense to some of you. Remember that these are mostly just ramblings of my own as I try to grasp the idea of pictures and how they are formed. This post already grew longer than I expected, and I need to sleep now :slight_smile:
