Exposure Fusion and Intel Neo drivers

Before any attempt to match them to XYZ, camera RGB vectors just represent how many photons went through each colour filter of the array. Conceptually, it’s not even colour, it’s a spectrum subset.

Yes, you can change the relative strength of shadows/highlights by applying selective exposure compensation and masking the picture.

Yes, but the base curve module discussed here can apply any kind of transfer function, so from an implementation point of view you can’t expect its output to be linear in general, or even a pure power.

No. Mid-grey is 18% assuming 100% is the diffuse white luminance and the picture has 4.94 EV of dynamic range (\log_2(1 / 0.18) = 2.47 EV on each side of mid-grey). That’s why no algorithm should make a fixed assumption about the mid-grey value: modern cameras range between 10 and 14 EV at 100 ISO. Besides, we don’t put diffuse white at 100% anymore. Yet this convention is still applied by people, even though they don’t recall where it comes from. Luminance has no magic numbers. Sensors have a noise floor and a saturation threshold; in between, you decide what means what.
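To spell out that arithmetic (assuming diffuse white is normalised to 1.0 and “mid” means the geometric middle of the range):

$$
\log_2\!\left(\frac{1}{0.18}\right) \approx 2.47\ \text{EV}
\quad\Rightarrow\quad
\mathrm{DR} = 2 \times 2.47 \approx 4.94\ \text{EV},
\qquad
\text{and in general}\quad \text{mid-grey} = 2^{-\mathrm{DR}/2}.
$$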

This is also what I show in my red/cyan example above. But AFAIK that is correct: if you blend black and white at 50%, you do not get mid-grey; you get a 50% reflectance, which is brighter than mid-grey.
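A quick numeric check of that, using the standard CIE 1976 lightness formula (the function name is just illustrative):

```python
def cie_lightness(y):
    # CIE 1976 L* from relative luminance Y, with white at Y = 1.0
    return 116.0 * y ** (1.0 / 3.0) - 16.0 if y > 0.008856 else 903.3 * y

print(cie_lightness(0.18))  # ~49.5: 18 % reflectance reads as perceptual mid-grey
print(cie_lightness(0.50))  # ~76.1: a 50/50 linear blend of black and white reads much brighter
```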

Could this be due simply to an erroneous double application of a power function somewhere? I am quite surprised by this brightness mismatch…

I guess it depends on what your criterion of “correctness” is - is your goal physical correctness, or perceptual correctness? Since we’re trying to create the illusion of a very high dynamic range scene that looks good on a standard dynamic range display, I think perceptual correctness happens to be the appropriate approach here. Obviously that’s not always the case, but here it seems to be.

Fundamentally, by trying to cram the 12-14EV of dynamic range of a modern camera into the much lower dynamic range of a typical SDR display, you are no longer preserving physical reality. In an ideal world we could preserve physical reality and everyone would own a display exceeding the capabilities of something like my Vizio P65-F1, but we’re many years from such displays being the standard as opposed to a rare exception. Also, as I’ve mentioned, we’re even farther off from having the content delivery pipeline to reliably get still images to such displays.

I’ll try a few more experiments later today, but I’m 90% certain that I merely moved the apply_pow() call with a power of 2.4 from the end of the fusion pipeline to immediately after the weight calculation.
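For readers following along, here is a minimal sketch of the two orderings being compared. This is not the darktable code (which blends Laplacian pyramids, not raw pixels), the names apply_pow/toy_fuse are placeholders, and whether the 2.4 figure enters as a power or its inverse is exactly what these experiments are meant to confirm:

```python
import numpy as np

def apply_pow(x, p):
    # stand-in for the module's power transform; whether p should be 2.4
    # or 1/2.4 here is part of what the experiments need to verify
    return np.power(np.clip(x, 0.0, None), p)

def toy_fuse(exposures, weights, p=2.4, pow_after_weights=True):
    # Toy single-scale weighted blend of N exposure-shifted images;
    # the real code blends Laplacian pyramids, not raw pixels.
    w = np.stack(weights)
    w = w / (w.sum(axis=0) + 1e-12)          # per-pixel normalisation
    imgs = np.stack(exposures)
    if pow_after_weights:
        # power applied right after the weight calculation: blend transformed values
        return (w * apply_pow(imgs, p)).sum(axis=0)
    # power applied once, at the end of the fusion pipeline
    return apply_pow((w * imgs).sum(axis=0), p)
```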

That one poster effectively did the additional power transform, which made luminance look better, but it’s highly likely that chromaticity shifted in that case.

Hence filmic, placed correctly in the pipe, and mapping whatever exposure to the [0; 100]% display range.

At this point of the pixelpipe, you need physical correctness because some modules coming after need it. You don’t cross the non-linearity wall before modules that apply physically-defined operations (lowpass, highpass, blur, sharpening, colour manipulation, etc.). You are treating pixel operations as if they were isolated filters, but it’s a full pipeline. The only place where you can do perceptually-defined ops is where you are sure no later pixel-pushing needs physics. Which, again, is not at the basecurve level.

I want to mention a few things, not necessarily in the order they were presented.

1. Camera RGB is not exactly scene-referred or scene-linear, even after internal pre-processing and external profiling and calibration. That would be the ideal case; no model is perfect. In that sense, it represents somewhat abstracted “colours”. That, and there is already an internal colour profile applied in anticipation of the next transform.

2. Since sRGB has a linear toe, it is not a pure power function and so may be troublesome to work with mathematically. Adding to the confusion, as has been discussed in this forum, sRGB comes in a dizzying number of variants. No two sRGBs are exactly the same! Certain sRGBs are better behaved than others.
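For reference, this is the shape being discussed: the IEC 61966-2-1 curve splices a short linear toe onto a 2.4-power segment, which is why it is neither a pure 2.2 nor a pure 2.4 power:

```python
def srgb_encode(y):
    # piecewise sRGB encoding (IEC 61966-2-1): linear toe below ~0.0031308
    return 12.92 * y if y <= 0.0031308 else 1.055 * y ** (1.0 / 2.4) - 0.055

def pure_power_encode(y, g=2.2):
    # the "gamma 2.2" approximation often substituted for it
    return y ** (1.0 / g)

for y in (0.001, 0.18, 0.5, 1.0):
    print(y, round(srgb_encode(y), 4), round(pure_power_encode(y), 4))
```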

3. The nature of the display should not matter. It is up to the software to colour manage, although there may be limitations due to implementation and to how well the standards are understood.

4. Diffuse white seems arbitrary to me, yet HDR standards and algorithms often assume it. I think diffuse white is something the photographer or videographer needs to determine while shooting or filming, keeping the whole post-processing pipeline, collaboration and final product in mind.

What I am saying is that reality is much more complex than most of us are comfortable with. I find that much disagreement comes from focusing too much on your own corner of expertise. Of course, that doesn’t mean there aren’t things that are definitely wrong or right in the given context. Carry on! I love these conversations.

Yup, and nitpicking exactly where this is gets to be highly unproductive.

Interestingly, I’ve seen the justification for that linear toe described as some sort of mathematical/processing challenge. The exact nature of that “challenge” isn’t really specified in the references I’ve seen.

At least in my current implementation, I was lazy and didn’t implement the linear toe. It’s a lot easier to just apply the power function. I was considering moving to a “standardized” sRGB colorspace that permitted leveraging internal darktable transform functions, but general consensus seems to be to forget the toe. I’m fine with that - less work for me!

CMS software is good for ensuring that what you’re looking at is consistent with what your users will see, and it at least provides a consistent, founded-in-reality starting point. Past that, when your scene and camera simply can’t be realistically represented on a common display and you don’t have the luxury of controlling your scene the way the cinema industry usually does, you have to resort to alternative tricks.

Yeah. Certain people here want to speak in absolutes - and continue to do so even when the literature they cite most definitely does not speak in absolutes. For example (bold emphasis mine):

“In the color community, the processes of pleasingly reproducing high-dynamic range pixels on a low dynamic range displays is known as tone mapping, and is an active area of research. Surprisingly, many pleasing tonal renditions of high-dynamic range data use similarly shaped transforms. On first glance this may be surprising, but when one sits down and designs a tone rendering transform there are convergent processes at work corresponding to what yields a pleasing image appearance”

Now the author of that paper is definitely a fan of the chemical film S-curve for these purposes, but they outright state that as personal preference/opinion and not hard, indisputable fact.

It’s obvious that certain people hate the enfuse algorithm for whatever reason - but the reality is that its popularity derives from its robustness and tendency to deliver “natural looking” results (obviously that is subjective) under a wide variety of conditions. After all, a variation on the Mertens algorithm (preceded by a really slick tile-based alignment and merging approach) is the default operating mode of the camera subsystem of a product with annual sales in the millions of units - http://static.googleusercontent.com/media/hdrplusdata.org/en//hdrplus.pdf

A model is not reality, and everyone knows models are imperfect. However, some are more accurate than others, and camera RGB can be corrected to be scene-linear.

It depends on what part of the pipe you are talking about. CMSes are far from perfect; their first flaw is that they aim at matching output to output (a screen with other screens and prints), but don’t care about retaining the input (matching the original scene colours with the output), and loosely map out-of-gamut colours to the closest colour in gamut.

Diffuse white luminance is the luminance of a 20% reflective white patch under diffuse lighting (no glare). It’s usually assumed to be 100 cd/m² on a display.

It’s a processing challenge for display: it has to do with the noise level under the given surrounding lighting.

Wrong. See above.

The very need for tricks means your model is broken in the first place. You don’t need to patch things if they work in the proper space. In 1911, everyone tried to patch their physics models to get them to match experiments, under the hypothesis that time was constant, until one guy came along and questioned the very assumption that made everything break: maybe time was relative. When you end up piling fixes on top of patches, maybe that’s a sign that the fundamentals are broken.

And since it’s an active area of research, you chose to fix a 12-year-old algo instead of looking at the state of the art in 2019 (which I happen to follow closely).

Yeah, as is all of Hollywood… Maybe for a reason. About that author:

A lot of broken algos deliver fair results as long as you are lucky and don’t push them too far. It’s what happens when they break that needs attention. Maybe try to understand the reasons for that hate before jumping to conclusions. I have put 14,500 lines of code into darktable’s core over the past year; I have already dealt with all the problems you just seem to be discovering, and solved some of them.

Yeah, and Photoshop’s default image-processing mode is gamma-encoded (you can’t even use the healing brush in 32-bit float linear mode). That doesn’t make it right. Legacy is a stupid burden to carry: after a while, nobody remembers why it was done that way in the first place, but everyone supposes those guys knew what they were doing, so maybe there was some sense in it, and they keep going. (And there was some sense in it, back when you worked with 5 EV film scans.)

This is where I struggle. In the ‘olden days’ (2012?), if I wanted to make an image rendition that looked even just okay on whomever’s display sans color management, I’d just gamut-transform and apply a TRC to sRGB, which seemed to be the ‘least common denominator’. So now, with HDR displays, what works? I just can’t believe that all who buy these things are using color-managed software and calibrated display profiles…

And, that’s the main reason I haven’t spent quality time with the OCIO library - it doesn’t help me embed color and tone information in the image metadata for those who do color-manage their display. The only standard I can find for that link is ICC.

It seems like TV manufacturers are doing a much better job of calibrating their HDR units… Obviously nothing is perfect, but overall the situation is much better.

As to “color managed software” - that gets to my complaints earlier about the state of delivering content to HDR displays. If you feed an HDR TV H.265 video with the appropriate metadata, it’ll do a pretty good job (assuming the user didn’t muck with their settings). At least I’ve seen far less apparent variation between HDR TV units in the field than I have between SDR displays.

That said, there might be units that degrade more than others when you give them a tough scenario. The HDR10 standard includes metadata for the maximum light level encoded in a stream, and the maximum frame-average light level - this lets the TV adjust backlight settings to stay within power consumption and thermal envelopes. (Many HDR TVs can deliver very high peak brightness, but only for a small percentage of the display at any given time.) HDR10+ primarily improves on this by allowing dynamic adjustment of MaxCLL and MaxFALL midway through a stream. I have always had some concerns that HLG lacks this metadata, and thus there could be cases on lower-end displays where it degrades less gracefully. Either way, you have FAR fewer constraints than with an SDR display.

If you want to deliver any sort of content to an HDR display other than H.265 video - good luck with that. Windows support for HDR displays is highly immature and buggy, Linux support for HDR displays is effectively nonexistent. Intel was pushing some improvements to the kernel HDMI code to start moving towards an HDR-capable display subsystem (they had a good presentation on compositor approaches a few years ago - sorry, don’t have time to dig up a link now), but we’re a long way away. :frowning:

I’ve seen implications that HEIC/HEIF has support for HDR content delivery of stills, but the infrastructure and software support to get that image data to the display is basically not there.

So, a side thought on a possible compromise approach that might handle some of the perceptual weighting issues I mentioned. Instead of using equation 6.1 from the enfuse manual (as you’ve restated above), or equation 6.3 (which I’ve restated in other posts and am currently using), something like:

(1 - W^{\gamma}) \cdot x_1 + W^{\gamma} \cdot x_2

might be worth evaluating… I’ll try and poke at this approach later this week.
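In case anyone else wants to experiment, a minimal sketch of that variant for the two-image case; the function name is made up, W is assumed to be a weight already normalised to [0, 1], and the real fusion of course runs over N exposures and pyramid levels:

```python
import numpy as np

def blend_pow_weight(x1, x2, w, gamma):
    # (1 - W^gamma) * x1 + W^gamma * x2: raising the weight to a power
    # before blending changes how quickly x2 takes over as W grows.
    wg = np.clip(w, 0.0, 1.0) ** gamma
    return (1.0 - wg) * x1 + wg * x2
```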

Oh God :man_facepalming:

Outside of scene-referred lives pixel garbage. Amen.

It’s not as if we already had all these discussions one year ago…

Not once have I made any claim that the result of exposure fusion is in any way scene-referred…

It looks like, yes, there was a discussion on enfuse a year ago, and the general consensus was “enfuse works” - in fact the final post in that thread is the updated enfuse script (which does work - but is slow since you need to manually generate exposure-shifted versions of your input image before feeding them to the lua script)

(Unless you’re referring to the fact that enfuse didn’t barf on linear input - see previous discussion on enfuse documentation.)

Yes, but… this part of the pipe expects scene-referred RGB. So it’s a mistake to do the scene → display conversion here. It’s just too early in the pipe.

If you have some spare time, maybe work on making it faster in the UI. The base curve module will be deprecated in the future: it’s a leftover from darktable 0.4 or so, which used libraw, and it has been conceptually flawed from the beginning. Piling fixes on top of patches in this particular module is wasted time. Start clean, with proper colour models, at the right place in the pipe.

Got it - your primary objection at this point is that it’s at the wrong place in the pipeline?

I’m fine with that in the long term.

At this point, getting decent results out of the existing exposure fusion module is low-hanging fruit - it’s basically done, with the exception of the rather nasty issue of legacy compatibility. The current method of handling/combining weights in the basecurve module is very questionable - it doesn’t match the Mertens paper even remotely, and it doesn’t match the enfuse implementation either. (Side note: something in the enfuse implementation looks wrong to me as well, as it also seems to diverge from the original Mertens approach in a way that would cause any nonzero weight to be effectively treated as 1.0; I need to run some tests this weekend to confirm my suspicions…)
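For comparison, the weight combination in the Mertens paper is just a per-pixel normalisation across the N exposures, so a nonzero weight only becomes 1.0 when it is the only nonzero one at that pixel. A sketch of that step, assuming the weight maps have already been computed:

```python
import numpy as np

def normalise_weights(weight_maps, eps=1e-12):
    # Mertens et al. 2007: W_hat_k = W_k / sum_j W_j, evaluated per pixel,
    # so the N maps sum to 1 while keeping their relative proportions.
    w = np.stack(weight_maps, axis=0)
    return w / (w.sum(axis=0, keepdims=True) + eps)
```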

Pulling it out of basecurve (again, I didn’t make that decision; if you’ve got a problem with that aspect of this, take it up with whoever made it…) is a good future roadmap effort, and something I’ll try to take a look at after a few other things that address pain points in some of my current projects (making invert more usable is going to involve some serious pipeline ordering fun… which would be a good learning experience in pipeline order management before a later “move fusion elsewhere” task).

Yep. That, and assuming grey == whatever fixed value. Just unroll the maths… For 18% to be middle grey (in linear encoding), you need \log_2(0.18) to be in the middle of the dynamic range, thus you need a 5 EV dynamic range. For modern cameras, that’s something you get around 3200 ISO. At 100 ISO, you get at least 10 EV, so expect your middle grey to be around 3%. How do you know, at compile time, what the middle grey value is? You can’t.
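Spelled out numerically (assuming white clips at 1.0 and “middle” means the geometric middle of the usable range):

```python
for dr_ev in (5, 10, 12, 14):
    grey = 2 ** (-dr_ev / 2)   # geometric middle of a dr_ev-stop range topping out at 1.0
    print(f"{dr_ev:>2} EV -> middle grey ~ {grey:.3f}")
# 5 EV  -> 0.177 (the classic ~18 %)
# 10 EV -> 0.031 (~3 %)
# 14 EV -> 0.009
```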

Plus, I think blending in anything other than linear spaces makes things more complicated than they should be, and asks for non-trivial side effects. Linear spaces make things clean, simple and robust.

How can you predict that? It really depends on how the photographer exposed the picture. It could be 18% in the case of a low-contrast, well-exposed scene, less if the picture was intentionally under-exposed to keep details in the highlights, or even more than 18% if the artist was aiming for a high-key look.

Let’s consider the specific case where the photographer decides to make full use of the camera’s dynamic range, and therefore takes an under-exposed picture that captures the full dynamic range of a high-contrast scene. IMHO in this case the best way to proceed with the image processing is the following:

  1. process the RAW file, retaining the full information
  2. apply a positive exposure adjustment that brings middle grey back to the right place. This will push highlights above 1.0, but that’s fine in a scene-referred context.
  3. compress the dynamic range with whatever method you prefer (filmic tone mapping, enfuse, etc…), to prepare the image for display on an SDR device

Therefore, any tool that implements step 3 can assume that middle grey is at 18%. Usually one aims at preserving this pivot point and adjusts shadows/highlights to taste.
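A quick sketch of step 2 above, just to make the convention concrete (the function and variable names here are illustrative, not darktable API):

```python
def anchor_middle_grey(img_linear, measured_grey, target_grey=0.18):
    # Global gain that moves the measured scene mid-grey to 18 %.
    # Highlights may land well above 1.0, which is fine scene-referred;
    # step 3 (filmic, enfuse, ...) is what brings them back for display.
    return img_linear * (target_grey / measured_grey)
```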

I find that the average image doesn’t have the dynamic range advertised by the camera companies because, every time you deviate from the optimal settings and framing, the range goes down accordingly. It is just that, for modern cameras, the drop in quality and range isn’t as great.

Conceptually, it’s not middle grey anymore if it’s not in the middle of your dynamic range. So, for a 10 EV range, 2^{-5} = 3\%. But, indeed, you can’t predict how the photographer managed his exposure. That’s exactly my point: don’t make assumptions in the code.

It seems to me that this might go back to the root cause of “this is assumed to be operating in a scene-referred fashion due to its position in the pipeline”… That “exposure optimum” value is more of an output concept than one based on the starting image. After all, the whole stated idea of the original DT exposure fusion implementation is to have a workflow something like:

  1. Intentionally underexpose to preserve highlights (which, as you say, puts “middle grey” waaaaay down)
  2. Transfer to PC, start up darktable
  3. Turn on exposure fusion in DT
  4. Exposure fusion takes that exposed-to-preserve-highlights image, creates multiple copies with +EV exposure compensation, then selectively blends them into the output, with their weights determined by distance from the “exposure optimum” variable, as sketched below (and yeah, that variable may not be properly named; blame Mertens for that one…)
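That “weights determined by distance from the exposure optimum” step is essentially the Gaussian well-exposedness measure from the Mertens paper; a sketch, with the optimum and width written as parameters rather than hardcoded:

```python
import numpy as np

def well_exposedness(v, optimum=0.5, sigma=0.2):
    # Gaussian weight on pixel value v: highest when v equals `optimum`,
    # falling off with width `sigma` (0.5 and 0.2 are the Mertens defaults).
    return np.exp(-((v - optimum) ** 2) / (2.0 * sigma ** 2))
```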

Obviously you can wind up in a very bad state if you’re outputting to a format where you don’t want highlights way above +1.0 getting pulled “down” into that [0.0, 1.0] range, such as outputting something intended to go to an HDR TV - but you likely don’t need, and shouldn’t be using, exposure fusion at all for such a scenario. Targeting HDR displays is a whole other can of worms, though, with lots of components missing throughout the entire content pipeline (especially for Linux users; Windows is definitely much farther ahead when it comes to HDR displays…)

Oh yeah, I FULLY agree that the “exposure optimum” (aka where the peak of the weighting function lies) should NOT be hardcoded. I have NO idea why this was hardcoded along with the width/variance, while the EV per step and the step “bias” were not… Those variables have been exposed as sliders in the code I’m probably going to push tomorrow (since the patch that fixed the lack of “preserve colors” in the fusion workflow has been merged, I can now push up the WIP).

That would be ideal, although every attempt I’ve made so far to do the blending in linear space has wound up giving really poor results - usually severe haloing. That may fall into the category of “work for the next-generation module at the correct point in the pipeline” once I’ve caught up a bit on a pile of images that still need processing… :slight_smile: I’m wondering if there is something strange lurking in DT’s implementation of the blending code that differs significantly from the Mertens paper/enfuse implementation that I haven’t found yet.

The work done by @Carmelo_DrRaw in another thread is a promising alternative, and since I believe choice is good, I’m considering attempting to get it implemented in darktable sometime later this year. That approach does, I am fairly certain, do its blending in linear space. Supposedly the “tonemap” module attempts something similar, but it has basically not been touched (other than iop order changes) since 2011-2012 and also appears to deliver non-optimal results at the moment (vs. exposure fusion, which dates back to 2016-ish, even if basecurve itself is much older; again, throwing it into basecurve was not my decision!).