Scene-referred editing with PhotoFlow

No, you are right. I haven’t had my coffee yet :slight_smile:

Exposure compensation doesn't change the ratios, which in fact means it doesn't change the dynamic range - that's the whole point of my question to @anon11264400 .

What exposure compensation does do is lower (or raise) the absolute luminance of the pixels proportionately.

My question to @anon11264400 still stands:

How is @Carmelo_DrRaw 's “negative exposure compensated” file not scene-referred?

That's a truly valid question, and I totally second you on trying to clarify this…

I’m “absolutely” sure there are terminology problems with my phrase “absolute luminance of the pixels”.

This page shows how to calculate the EV of the scene, and also provides a convenient chart for typical scenes:

http://www.fredparker.com/ultexp1.htm

This page shows how to convert from the EV of the scene to candelas per meter squared ("nits"):

FWIW, I provided these links previously at the bottom of this article:

What is not clear to me is exactly what the 18% middle-gray-based "scaling", which seems to be the first critical part of setting up an OCIO scene-referred workflow, is actually supposed to accomplish:

  • Sometimes the point seems to be filming the scene without blowing out the important highlights, which requires knowing the DR of the camera compared to the DR of the scene, along with consideration of "where to put black", etc. And then scaling all the frames (some of which might have been shot at a higher or lower camera exposure value, i.e. combination of f-stop and shutter speed) to make it "as if" the frames had all been shot at the same camera exposure value.

  • Sometimes the point seems to be to scale the pixel values to match the “nits” of the actual scene.

@Carmelo_DrRaw @Elle I might be wrong but I don't think he meant to contradict himself on dynamic range. Reading it another way, he might be referring to the files themselves. I wonder whether part of the confusion has to do with the fact that the scene-referred data remains the same in an OCIO-managed file. It is the metadata that changes. Just a guess, I don't know if OCIO handles files like that.

I see two reasons for setting an absolute scale with an 18% gray patch:

  • when multiple footages have to be combined in a coherent way
  • when taking an archival picture of some piece of art (ex. a painting) or a manufactured object, in which case one needs to compare the actual brightness(?) of the colors with a reference patch

Artistically speaking, I do not see big advantages…


The files are 32-bit floating point TIFFs, and one is tagged with a linear Rec.2020 profile. I have no idea what could be wrong.

Again, it is just a guess :slight_smile:. We sometimes mean something that isn't immediately obvious or make typos like "multiple foot ages", at which I figuratively scratched my head for a few seconds :stuck_out_tongue:.

The beauty of smart phones with “smart” correctors… :wink:

I don’t have the source file to fully test.

What would be easier is to remove the misdirection of blending and process a simple dcraw -T -4 as a comparison, as the file seemed rather mangled by a transfer function of some sort, or inappropriately merged. The ratios simply don't seem to be there.

The problem is that they appear to rise up as a solid unit, which again suggests incorrect merging. I may be entirely wrong here, but the ratios between the sunlit grass and the deepest shadows aren’t what one would expect. Even tested against a six stop exposure shift.

Did you apply correct exposure compensation to the bracketed shots before merging?

A linear file, without any processing, would be far easier to compare against.

View transforms are fixed; they require specific input code values to be mapped to specific output code values.

This is 100% correct, within the epsilon, maximum, and encoding issues specific to float encoding.

Let’s say your sensor has 12 stops of dynamic range and all that information is captured linearly as integers in the range of 0 to 1. You need to know how much to scale it to effectively produce a scene referred floating point image.
Shooting a reference of 18% reflectance is useful to align your exposure.
Why 18%? Because it's the value perceived by us humans as middle grey in diffuse materials (an ideal 100% reflectance material being perceived as white).
Having your exposure pegged to the perceptual middle gray ensures that every diffuse material you shoot will be perfectly exposed while you still have headroom for highlights.
So getting your on-exposure value to middle gray is a good way to use your camera's dynamic range effectively.
Based on that, scaling your capture to meet the on-exposure value ensures that your floating point file is aligned with the scene.
And as @anon11264400 pointed out, having that reference is important for view transforms.
If you shot a well exposed gray card, you expect it to end up as middle gray in your screen/display-referred JPEG.
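
A minimal numeric sketch of that anchoring step, assuming a hypothetical 12-bit linear capture (the code values, black level and metered gray-card value are all invented for illustration):

```python
import numpy as np

# Hypothetical 12-bit linear raw data, already demosaiced and white balanced.
raw = np.array([[512, 1024, 4095]], dtype=np.uint16)

black_level = 0         # assumed zero for simplicity
gray_card_code = 512    # code value of the metered 18% gray card in this capture

# Normalize to 0..1 floats, then scale so the metered gray card lands at 0.18.
linear = (raw.astype(np.float32) - black_level) / (4095 - black_level)
scene_referred = linear * (0.18 / (gray_card_code / 4095))

print(scene_referred)   # the gray card pixel -> 0.18; everything else keeps its ratio to it
```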


So in theory with only two shots, 0EV and -2EV,
Where middle gray in the 0EV photo is 0.18
And middle gray in the -2EV photo is probably 0.045 (0.18 /4)

Then the correct approach is:

  1. apply a +2 exposure compensation to the second photo; now the 0EV and -2EV both have middle gray at 0.18
  2. merge with luminosity mask

Is it right?

EDIT: Incorrect.

Correct math, but incorrect approach as per your outline. Easier math is 2^STOPS * VALUE, which as per your example would be 2^-2 * 0.18.

EDIT: It appears I misread you. This is not correct as you typed, but I think you inferred you understood the idea. You would want to keep the -2 photo at -2 values, which means scaling it as 2^-2 * VALUES, and then merging that data into the proper range of the other photo. Your point 1 suggests a +2, which may be a typo. The +/-0 EV photo would rest +2 relative to the second. You do not want to make your middle greys identical, but rather use the middle grey to anchor the two files at their respective scene referred values.

To be clear:

  • Within each respective shot, we have an “at exposure” of middle grey.
  • In the +/- EV0 shot, we would anchor this middle grey value at 0.18.
  • In the second -2 EV bracketed shot, we would want to scale the middle grey down two stops, so we would exposure adjust the photo as 2^-2 * VALUES. Now the middle grey exposure lives at 0.045 and we would merge[1] accordingly.
  • If we were to add another photo that was +3 EV from the first, we would take that photo and scale it as 2^3 * VALUES, i.e. multiply the values by 8.0, then merge.

We would not want to align all the middle greys, which I believe is essentially what may have occurred in the first demonstration by @Carmelo_DrRaw , although other funky mangling may be at play, possibly including nonlinear data. Taking linearized data "as is" and merging is not appropriate for a scene referred image entry point, as the image no longer represents the relative ratios of the scene. Such an approach makes the data nonlinear, and subject to all of the problems that everyone is familiar with in a nonlinear reference space.

[1] Assuming of course that the luminance key is correct and operating on scene referred data etc., not nonlinear data.
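
If it helps, here is a toy transcription of that outlined scaling into code (invented pixel values; this shows only the per-shot scaling step, not the masking or merging itself):

```python
import numpy as np

def scale_by_stops(values, stops):
    """Linearly scale anchored values by 2^stops (the 2^STOPS * VALUE rule above)."""
    return values * (2.0 ** stops)

# Each shot individually anchored so its own "at exposure" middle grey sits at 0.18.
ev0   = np.array([0.18, 0.90])   # the +/-0 EV shot
ev_m2 = np.array([0.18, 0.90])   # the shot treated as -2 EV in the outline above
ev_p3 = np.array([0.18, 0.90])   # a hypothetical +3 EV shot

scene_ev0   = scale_by_stops(ev0, 0)     # middle grey stays at 0.18
scene_ev_m2 = scale_by_stops(ev_m2, -2)  # middle grey now lives at 0.045
scene_ev_p3 = scale_by_stops(ev_p3, 3)   # middle grey now lives at 1.44

# The frames would then be merged (e.g. with a luminosity mask operating on this
# scene referred data, per footnote [1]) rather than with their middle greys aligned.
```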

So, I completely get the need to align content from multiple sources to the same 0.18 gray reference for video: you want the footage from multiple cameras in a take to be tonally indistinguishable (yes, I took a cinematography college course, in 1976 :laughing:). But in HDR stacking, it's that very difference that's needed to provide the data to express the entire dynamic range of the scene that the single camera couldn't capture.

I think, in the proposed workflow, the 0.18 reference is also critical to the proper performance of the downstream transforms. In OCIO, for color, there appears to be no anchoring to CIE XYZ, and for tone, LUTs appear to need a known starting point. N'est-ce pas? BOLB (bear of little brain) here…

Digital sensors are pretty horrible at dynamic range. At the highest quality end for a still image, 14.5 stops is more or less a sensor ceiling currently[1]. More than likely, we are talking about 12.0-13.0 stops from typical DSLRs.

Remember that film was far superior at dynamic range, and the more modern Kodak Vision3 retained approximately 16.0 - 17.0 stops of density data if one were to mine it out of the negative properly. Under this thinking, it isn’t unreasonable to need to extend a single DSLR photo to meet a roughly film-like dynamic range.

When we are talking about properly merging photographs under a scene referred model, the above proper scaling applies. The creative output, such as the exposure techniques of Ansel Adams etc., would be applied to the scene referred data using scene referred ratios.

If we flub up that ratio component, manipulations like overs, blurs, etc. will all begin to fall apart. Even though the output might be strictly creative, we need to work in a way that holds the roughly radiometric energy ratios in the reference model, as creative as they might be.

Bingo. So does every “raw” viewer, across all software. The 0.18 is simply a terrific anchor point convention, from view transforms all the way down to properly aligning photograph energy ratios.

[1] Sony’s Venice notwithstanding, which claims to have 15+ stops.


Uhmmm, I thought that in the -2 EV shot middle gray is already at 0.045, and that this could be called camera referred, because in scene-referred middle gray is "always" 0.18.
Hence the need to apply a digital exposure compensation (+2) to the -2 EV shot…:thinking:

If you properly exposed the bracketed shots individually, each would have similar anchor points relative to the code values in their files. That is, the "properly exposed code value" would represent different radiometric levels despite being encoded to the same code value.

Imagine a photograph with two lamps on two different grey cards on the left and right of the frame, A on the left and B on the right. Lamp B is 6 stops hotter on its respective grey card.

  • Spot metering Lamp A’s grey card, and exposing per the grey card, we could call the resulting encoded value in a linear file CV.A.
  • Spot metering Lamp B’s grey card, and exposing per the grey card, we could call the resulting encoded value in a linear file CV.B.

The code values CV.A and CV.B would be equivalent with respect to each encoded file, yet relative to the scene energy they would represent different values.

If Lamp B's grey card data in Lamp A's photograph exceeded the dynamic range of Lamp A's capture, and we wanted to properly merge Lamp B's grey card data into Lamp A's capture range, we would:

  • Take the scene referred values from Lamp A’s linearly captured file CV.A and linearly scale them such that CV.A lands at 0.18 in our scene referred reference space.
  • In another instance, take the scene referred values from Lamp B’s linearly captured file CV.B and linearly scale them such that CV.B lands at 0.18 in our scene referred reference space.
  • Further linearly scale the resulting values in Lamp B’s capture by 2^6 * CV.B
  • Replace the blown out grey card on the right in Lamp A’s data with Lamp B’s adjusted values[1].

The total dynamic range between the two grey cards considered in isolation would be 0.18 to 11.52 (2^6 * 0.18).
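
Spelling that last figure out as arithmetic (nothing beyond the numbers already in the bullets above):

```python
import math

# Each capture is anchored so its own metered grey card lands at 0.18.
cv_a = 0.18              # Lamp A's grey card in the reference space
cv_b = 0.18 * 2 ** 6     # Lamp B's grey card, rescaled 6 stops hotter

print(cv_b)                               # 11.52
print(math.log2(cv_b) - math.log2(cv_a))  # 6.0 stops between the two cards
```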

Apologies for dragging this thread into the nuanced dumpster of merging, but alas.

[1] I suspect that log-like encodes will become ever more standard practice for storing encoded values in integer format. Linear encoded integers in TIFFs, for example, can only encode 16 stops of data total, with progressively worse encoding at the lower end. Log-like camera encodings allow for more uniform distribution of stops across bits, as well as code value meaning, which would make this mental experiment far easier. We would simply load the file, invert the log-like encode, and we would have scene referred values complete with known code values. But alas…
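
A back-of-the-envelope check of that 16-stop figure for linear 16-bit integers (my own arithmetic, not from the post above):

```python
import math

# 16-bit linear integers: stops spanned from the smallest nonzero code to full scale.
print(math.log2(65535 / 1))   # ~16 stops in total

# Distribution of codes per stop: the top stop gets roughly half of all codes,
# while the bottom stop has a single step -- hence the poor shadow precision.
print(65535 - 32768)          # codes available in the brightest stop
print(2 - 1)                  # codes available in the darkest stop
```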

There are only two integers in the range of 0 to 1. Did you mean floating point numbers in the range of 0 to 1? Or something else?

Yes, you’re right of course. My bad. :blush:
I meant a capture of integers mapped to the 0,1 range in floating point precision, with 1 equivalent to the maximum possible value captured from the scene by the device, in the example at 12 stops of distance from the minimum value captured from the scene.
In that form, even though it is linear and conserves the light ratios, the data is referred to the device limits and not to the scene, until you scale it.

I disagree with @anon11264400 here.

Let's rephrase the example: you have two shots, one taken at "0EV" by spot-metering on an 18% gray patch, the second one over-exposed by +1EV to record more shadow details.

How do you merge them? One has to make sure that in the overlap region both images have values very close to each other, otherwise one would obviously see a discrepancy.

Note that the pixels in the 18% grey patch of the +1EV shot have values that are twice as big as in the first shot (in linear encoding, and assuming that the sensor is perfectly linear and only linear adjustments have been applied, like WB).

The easiest thing to do in order to match the two images is to scale down the second one by -1EV, i.e. divide all the values by 2.
That's exactly what I did to create my blended image, but with 5 images instead of two.

I have uploaded all files, including the PhotoFlow configuration that does the actual blending (blend.pfi) here: https://filebin.net/4e5im82770zwwipz.

The merged file in linear Rec.2020 is called blend-rec2020.tif

The following two screenshots show the result of over-exposing either the darkest shot or the blended image by +5EV. As one can see, apart from the reduced noise in the second case, the overall tonal values are equivalent:

Notice that in such a scene I would not really know where to place a reference 18% gray patch, whether in the sunlit portion in the background or in the foreground shadows…


You have restated what I outlined.

With that said, the ratios of data in your file do not reflect the ratios as they would appear in a scene referred file.

Did you apply your stop adjustments on the scene referred original linear data, or somewhere else in your stack?

To solve this problem, please post exactly two of the widest brackets, in linear with REC.709 primaries. That is, off of your raw data, dcraw -T -4.

This conflates a creative question with a technical code value need. Within any given encoding, some code value will correspond with a middle grey value. This isn’t about creative choices, this is about anchoring energy relative to a single encoded capture.

EDIT: I see you have put your raw files in the download. Will try to take a look when I get a chance.

EDIT#2: I did a very, very, very hasty silly merge based off of the exif data in the two most extreme images, so you’ll see a seam! It seems they were both shot at the same aperture, with a spread of +4 EV between 7869 and 7872 via shutters of 1/8th and 1/120th respectively. I took the spread of values and merged them between the shadowy foreground and the hot background.

Here is a hasty render with a generic looking view, with no creative tweaking other than the basic view itself:

Here is a closeup of one of the higher values from the houses in direct sun, as well as a lower value from one of the darker emissions in the shadow of the window:

(screenshots: hot_value, cold_value)

To sum up, even with this hasty job, the low value ends up around 0.00354 with the high 16.0000. Aka nowhere near the ratios in the output you linked to. This is easy to calculate if we compare the values in the reference space. In the case of the merge I just made, the total stops of coverage would be log2(16.0) - log2(0.00354) == 12.1420349244. Comparing against the linear sRGB TIFF you provided we get log2(2.68075) - log2(0.08118) == 5.04536853322. Hence what I was getting at when I suggested the values were too compressed, as it is clear that there are greater than five stops between the hot faces of the houses and the deepest shadows of the door.
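
The same comparison, written out as a tiny helper (the values are simply copied from the paragraph above):

```python
import math

def stops_of_coverage(low, high):
    """Dynamic range, in stops, between two scene referred values."""
    return math.log2(high) - math.log2(low)

print(stops_of_coverage(0.00354, 16.0))     # ~12.14 stops in the hasty merge
print(stops_of_coverage(0.08118, 2.68075))  # ~5.05 stops in the linear TIFF provided
```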

Direct link to merged linear REC.709 based EXR: new_7869_7872.exr - Google Drive
