Exposure Fusion and Intel Neo drivers

I’m out of here. Keep fixing tools that work in broken models as much as you want. When you have to pull a Firefox out of your Netscape, you will understand why it was a bad idea in the first place.

Fix models, not tools.

It’s been a long time since I read it; I had forgotten that they have a fairly long treatise on colorspace management in there.

Of interest - it appears they handle FP images very differently from others (see the log transform), but it isn’t clear exactly what they do beyond that. It may explain your unexpected results. As for gamma encoding, they only mention it briefly, once. However, replacing equation 6.1 with 6.3 is by far the easiest way to describe what my changes to the fusion pipeline do.

As to the comments at the bottom of page 70 (their page numbering, not PDF page numbering), an analysis of the real-world impact of that is a large portion of the goal I had when generating the blending gradients I’ve used as examples. Note that at the beginning of section 6, they state: “Here, we collectively speak of blending and do not distinguish fusing, for the basic operations are the same.” - while the basic operations may be the same, in terms of the behavior of input data, they are not. Blending of different images (such as panoramic stitching) is a more “general” set of use cases where the two input pixels might have different color. In the case of exposure fusion where all exposures are generated from the same input, we’ll always be blending the “same” color. (again, assuming that the user doesn’t break other parts of the basecurve module and/or someone, likely myself, gets Edgardo’s fixes working inside the fusion pipeline)

@aurelienpierre Partly why I don’t use enfuse much anymore. However, when I do, I know what it is and what I want to do with it.

That’s how you show you understand nothing about what you are doing. There are no colours at this place in the pipe. Colours exist only in the human brain. Mapping the RGB tristimulus (which encodes light emissions, aka a spectrum reduced to a flat vector) as recorded by the sensor to human-related colours (XYZ space) is the job of the input profile (which still does it awfully badly, but that’s how it was done 10 years ago…), and it happens in the pipe after the base curve.

I have put resources in darktable’s wiki for developers who code faster than light: Developer's guide · darktable-org/darktable Wiki · GitHub

This one especially https://www.visualeffectssociety.com/sites/default/files/files/cinematic_color_ves.pdf will explain to you why your careless use of the word “gamma” (and of the encoding…) is damaging.

Also: https://medium.com/the-hitchhikers-guide-to-digital-colour/the-hitchhikers-guide-to-digital-colour-question-1-what-the-f-ck-happens-when-we-change-an-rgb-b47e70582e8b

TL;DR: gamma is the electro-optical transfer function of CRT screens. The ICC clowns found it clever to reuse the same word to describe an integer encoding system meant to prevent quantization artifacts (aka banding, aka posterization) in 8- and 16-bit integer files, even though it has nothing to do with the original gamma, and that only helped put a mess into everyone’s head. At least “gamma encoding” is clear on the fact that it’s an exchange format that needs to be decoded before actually working on the pixels. But not all the IEEE geeks in universities seem to be aware of that, and obviously none of them has pushed pixels seriously in their lives. And now “gamma” refers loosely to any kind of exponent transfer function intended to raise mid-tones while not affecting black and white (that’s called a lightness adjustment), or, worse, to any attempt to encode values perceptually (ever heard of EVs? That’s log2, not some arbitrary exponent, and – yes – log(0) is a problem, but black does not exist outside of black holes, so null energy can’t be found on Earth and it’s not a real problem).

Because, again, RGB encodes light emissions, so everything you do in a pipeline should use light-transport models, aka numerical simulations of what would have physically happened if you had played with actual light on the real scene, thus scene-linear encoding, proportional to light energy. As it happens, pushing pixels in perceptually encoded spaces fails miserably, for reasons well understood (by most, at least). That is common knowledge amongst VFX and colour studios, but for some reason hobbyists find it “intuitive” to push pixels in perceptual spaces (because that’s how we see, right? Well, photons don’t care). Theory is the only small thing we can hang on to; violate it and, though you may not notice it immediately, it will break in your face sooner or later.

Not that I hope it will change your mind, but I spent the last year trying to re-educate people here about best practices in colour handling and reasonable workflows, and you are spreading fake knowledge and pushing us 10 years back, to when gamma issues disappeared into the 8 EV dynamic range of the average DSLR, so it was kind of OK.

Final note: of all the gamma-encoded RGB spaces to work with, the dumbest is probably sRGB, because its “gamma” is actually a linear function under a threshold, so doing blending in that non-uniformly spaced thing is the silliest thing I’ve heard this week.
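
For reference, the sRGB encoding from linear light L to the encoded value V is exactly that piecewise thing, a linear segment under a small threshold and an offset power above it:

    V = 12.92*L                         for L <= 0.0031308
    V = 1.055*L^{1/2.4} - 0.055         for L > 0.0031308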

@anon41087856 @Entropy512
If you allow me, I would like to contribute my 2cts to this discussion, even though I am not at all a color scientist…

Let me begin by speculating on this statement:

RGB values are one possible representation of color, even when they directly represent the light emission values recorded by the camera sensor.

RGB values represent some color. Which color exactly is undefined unless you associate a set of RGB primaries and a white point with them. If you have that, then an RGB color in the camera colorspace is as valid as a color in Rec.2020. Colors are present at each stage of the pipeline; they are just encoded differently.

Then the only things you are allowed to do are basically exposure compensations and white balance corrections. Any other non-linear adjustment, including filmic curves, does not have a physical counterpart, because for a given scene illumination you cannot physically change the relative strength of your light diffusion in shadows and highlights, right?

Notice also that exposure compensation and WB work equally well on RGB values that are encoded with a pure power function (as already mentioned by @Entropy512), therefore linear encoding is not strictly needed from a mathematical point of view, at least in this case.
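
The one-line reason (the same identity is quoted further down in the thread):

    (a*x)^y = a^y * x^y

so a global scaling (or a per-channel one, in the case of WB) applied to power-encoded values is still a scaling, just by a^y instead of a, and the operation commutes with the encoding up to a constant.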

I totally agree with you, but a safer rephrasing of the above sentence might be “we’ll always be blending RGB triplets that have the same channel ratios”, because they are all obtained from the same initial RGB triplet by linear scaling (aka exposure compensation). I do not know the internals of DT’s base curve module, but this might only be valid if the base curve is a straight line…

Now let me try to give you my point of view on some of the facts and myths about linear vs. “gamma” encodings…

Color shifts
When you apply a tone curve to the RGB values, color shifts are generally not a consequence of whether the RGB values are linear or not. If a non-linear tone curve is applied to the individual RGB channels, color shifts are always there.
The usual “workaround” to avoid this is to apply the tone curve to some RGB norm (like luminance), compute the out/in luminance ratio, and then scale the RGB values by this ratio.
This is where the need for linear RGB values comes into play (see the sketch after the list), because:

  • luminance must be computed from linear RGB values
  • the RGB scaling by the luminance ratio is equivalent to an exposure scaling only if the RGB values are linear (or encoded with a pure power function, but there is no advantage in doing that in this specific case)
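
A minimal numpy sketch of that ratio-based workaround (Rec.709 luminance weights as an example; the curve itself is just a placeholder, not any particular module’s curve):

    import numpy as np

    def tone_curve(y):
        # stand-in tone curve; any monotonic curve would do here
        return y / (y + 0.18)

    def apply_curve_preserving_ratios(rgb):
        # rgb: (H, W, 3) array of linear RGB values
        lum = rgb @ np.array([0.2126, 0.7152, 0.0722])    # luminance from *linear* RGB
        ratio = tone_curve(lum) / np.maximum(lum, 1e-12)  # out/in luminance ratio
        return rgb * ratio[..., np.newaxis]               # scale all three channels by the same ratio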

When linear is visually better
One domain in which AFAIK linear encoding is truly needed is pixel blending.
Here is a classical example of blending a pure red (on the left) and a pure cyan (on the right) color at 50% opacity (in the middle). First image is obtained in sRGB encoding, second one in linear:

I guess we all agree that the second one looks “more correct”.
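
The numbers behind those two swatches are easy to reproduce; a minimal sketch, using a plain 2.2 power as a stand-in for the real sRGB curve:

    import numpy as np

    red, cyan = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 1.0])   # linear RGB

    enc = lambda x: x ** (1 / 2.2)      # crude stand-in for the sRGB encoding
    dec = lambda x: x ** 2.2

    blend_encoded = dec(0.5 * enc(red) + 0.5 * enc(cyan))   # ~[0.22, 0.22, 0.22], the dark seam
    blend_linear  = 0.5 * red + 0.5 * cyan                  #  [0.50, 0.50, 0.50], the physical mix

The encoded-space blend comes out noticeably darker than the true 50/50 mix of the two lights, which is the dark band you see in the first image.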

The same happens when downscaling images, because downscaling involves blending neighbouring pixels. Downscaling in linear encoding is better.

When linear encoding is NOT good
One clear case in which linear encoding is not “good” from the practical point of view is when building luminosity masks. For example, one would expect that a basic luminosity mask has an opacity of 50% where the image roughly corresponds to mid-grey, right? However, mid-grey is 18% in linear encoding…
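
As a quick check, with a plain 1/2.4 power standing in for the display encoding:

    0.18^{1/2.4} ≈ 0.49

so a mask built from power-encoded values puts mid-grey near 50% opacity, while the same mask built from linear values puts it at 18%.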

The relationship with enfuse
Correct me if I am wrong, but in a simplified form and for two images, the exposure blending boils down to something like

(1-W)*x_{1} + W*x_{2}

where x_{1} and x_{2} are the RGB channels of images 1 and 2, respectively. W is a factor that gives more weight to “well exposed” pixels, right? And W is derived starting from some RGB norm, like luminance, right?

To me, this looks really like blending through luminance masks, which can be split in two parts:

  • the determination of the weights W, for which perceptual encoding seems to be more appropriate, since you want your weight function to be peaked at 50% and perceptually symmetric around mid-gray, right?
  • the blending of RGB values weighted by W, for which linear encoding is more appropriate.

Therefore, I wonder if the following approach would yield correct results (a rough sketch in code follows the list):

  • take linear RGB values as input
  • compute the RGB luminance, and encode it with a power function with exponent \gamma = 1/2.45, so that mid-grey is roughly mapped to 0.5
  • use the power-encoded luminance to compute the weights
  • blend the linear RGB values with the obtained weights
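
In numpy terms, a per-pixel sketch of that proposal might look like this (no multi-resolution blending, and the exposure weight is reduced to a simple Gaussian around 0.5 for illustration; enfuse’s real weighting also folds in contrast and saturation):

    import numpy as np

    def fuse(exposures, gamma=2.45, sigma=0.2):
        # exposures: list of (H, W, 3) arrays of *linear* RGB, all derived from the same raw
        weights = []
        for rgb in exposures:
            lum = rgb @ np.array([0.2126, 0.7152, 0.0722])    # linear luminance (Rec.709 weights)
            lum_enc = lum ** (1.0 / gamma)                    # power-encode so mid-grey lands near 0.5
            weights.append(np.exp(-0.5 * ((lum_enc - 0.5) / sigma) ** 2))  # favour well-exposed pixels
        weights = np.stack(weights)
        weights /= np.maximum(weights.sum(axis=0), 1e-12)     # normalise weights per pixel
        # blend the *linear* RGB values with the power-derived weights
        return sum(w[..., np.newaxis] * rgb for w, rgb in zip(weights, exposures))

Calling fuse([dark, mid, bright]) on three linear renditions of the same raw keeps the blend itself entirely in linear RGB, while the weighting only ever sees the power-encoded luminance.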

Would this make sense?

P.S.: this answer has nothing to do with the initial issue with Intel Neo drivers, so do not hesitate to split it into a separate thread if you think that would be better!

Carmelo - VERY well written, thanks!

Looks good to me.

Yup. It sounds like Pierre works in the cinema industry, and one thing to keep in mind is that in the majority of cinematic productions, a major part of the production is controlling the actual light present in the scene with modifiers (scrims, reflectors, etc.) and additional lighting. This means some of the extreme dynamic range management tricks such as exposure fusion and one of your approaches at PhotoFlow - new dynamic range compressor tool (experimental) - #36 by afre aren’t nearly as necessary, if at all. (Nice approach, by the way; I think my next task is to take a look at DT’s implementation of that approach and figure out why it underperformed. FYI, fixing the poor performance of enfuse in highlights that you show as an example in that thread is exactly what I’ve been trying to do. The current implementation performs horribly in highlights.)

Ideally you do this even in photography - but sometimes you’re hiking on a trail up multiple flights of stairs and merely putting your camera and tripod in the backpack has already increased your burden significantly. There is no way I’m bringing a monobloc and scrims/reflectors on the trail!

So we’re left with the problem of a scene with a very high dynamic range, a camera that can capture it if you expose to preserve highlights, and the lowest common denominator of a typical SDR display to show it on. So the challenge is to not make things look like crap on such displays, even if what is displayed is now, at best, a rough approximation of the real scene.

Of note, a lot of these problems go away with a wide-gamut HDR display. As an experiment, I’ve exported a few of the images I find need exposure fusion to linear Rec. 2020 RGB, and then fed this to ffmpeg to generate a video with the Hybrid Log Gamma (HLG) transfer curve and appropriate metadata. The result looks AMAZING without any need for exposure fusion at all in most cases. Sadly, for still images, we have no good way to deliver content to such displays, even if those displays are getting more common. However, if you ever expect your content to be viewed on a phone or tablet, 99% of them are SDR displays and it’s going to be that way for many years to come. :frowning: Which happens to be why you’ll have to trust my word that exporting to HLG looks gorgeous on a decent HLG display - there’s simply no way to convey that visually through this forum software to the displays that 99% of readers here have.

Yup, and the math behind this is the identity (a*x)^y = a^y * x^y, as mentioned before.

Yeah, that’s a better way of wording it.

This is, as far as I can tell, the fundamental reason Pierre hates basecurve so much. However, this issue with basecurve was fixed:

It happens that it was not fixed for the fusion flow (a result of two code paths getting branched, apparently to eliminate a single multiply operation on the “fast” branch ages ago)

I’ll be submitting a pull request later today that reorganizes some of these code paths such that Edgardo’s changes can be used in combination with fusion.

Side note: Getting to the science vs. art discussion I mentioned previously, in some cases such chromaticity shifts actually look really nice. For one particular sunset example, applying the “sony alpha like” transfer curve in the old per-channel way gives a much more “fiery” look to the clouds. Is it in any way a correct representation of the physical realities of the scene? Not at all. Does it look impressive? Yup. Obviously, this is the sort of thing that should be used with caution and should not be the default behavior (which is indeed the case going forward in DT)

Yup, no issues with this. All of the gradient examples I posted were of blends between two pixels that were linear scalings of each other (e.g. channel ratios are constant). If channel ratios aren’t constant, things get funky.

Yup, and this is a major part of why DT’s current exposure fusion workflow tends to pull everything into the highlights and then crush the highlights. It also pulls quite a bit up past the point at which it’ll clip later in the pipeline.

Exactly! The equation you give is in the enfuse manual as equation 6.1. Alternatively, the manual gives a second equation (6.3), which is ((1-w)*x_{1}^(1/y) + w*x_{2}^(1/y))^y. (Effectively, what I did in darktable’s fusion implementation is to replace equation 6.1 with 6.3, where y = 2.4.)
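
Spelled out as code (a minimal sketch, y = 2.4 as above):

    def blend_61(x1, x2, w):
        # enfuse eq. 6.1: plain weighted average of the working values
        return (1 - w) * x1 + w * x2

    def blend_63(x1, x2, w, y=2.4):
        # enfuse eq. 6.3: encode with 1/y, average, decode back with y
        return ((1 - w) * x1 ** (1 / y) + w * x2 ** (1 / y)) ** y

The black/white numbers worked through a bit further down drop straight out of these two functions.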

Yup!

I’m not so sure of that. Let’s take the extreme example of blending black with white, with w = 0.5

So x_{1} = 0, x_{2} = 1.0, w = 0.5

Plug this into your equation (corresponds to equation 6.1 in the enfuse manual), and you get 0.5

Plug this into equation 6.3 with y = 2.4 and you get something around 0.18

So perceptually, the blending in linear space gives a result that is significantly brighter than the perceptual midpoint between the two inputs. You can see this in one of the orange gradients I posted.

That was, in one of the examples I posted above, described as “weight in gamma-space, blend in linear”. The end result was an image that was very bright and washed out. Better than the current lin/lin approach, but still not visually pleasing. Someone posted an example of applying a power transform to that image to make it look much better, that was the case where I responded that doing so was one of the cases where a chromaticity shift could occur (see the gradient I posted with two different shades of orange).

I don’t think anyone has talked about that in a while, and I don’t expect any more conversation on that particular topic outside of common/opencl_drivers_blacklist: Only blacklist NEO on Windows by Entropy512 · Pull Request #2797 · darktable-org/darktable · GitHub - yes, I submitted a pull request to un-blacklist Neo on non-Windows platforms since it appears that the root cause of failures on Linux was identified and corrected. OpenCL + NEO is working great on my system.

Before any attempt to rematch them to XYZ, camera RGB vectors just represent how many photons went through each filter of the array. Conceptually, it’s not even colour; it’s a spectrum subset.

Yes, you can change the relative strength of shadows/highlights, by applying selective exposure compensations and masking the picture.

Yes, but the base curve module discussed here applies any kind of transfer function, so from an implementation point of view you can’t expect its output to be linear in general, or even pure-power.

No. Mid-grey is 18% assuming 100% is the diffuse white luminance and the picture has about 4.94 EV of dynamic range (\log_2(1/0.18) ≈ 2.47 EV above mid-grey, and as much again below it). That’s why no algorithm should make a fixed assumption about the mid-grey value, because modern cameras range between 10 and 14 EV at 100 ISO. Besides, we don’t put diffuse white at 100% anymore. But this convention is still applied by people, even though they don’t recall where it comes from. Luminance has no magic numbers. Sensors have a noise floor and a saturation threshold; in between, you decide what means what.

This is also what I show in my red/cyan example above. But AFAIK that is correct: if you blend black and white at 50% you do not get mid-grey, instead you get 50% light diffusion, which is brighter than mid-gray.

Could this be due simply to a wrong double application of a power function somewhere? I am quite surprised by this brightness mismatch…

I guess it depends on what your criterion of “correctness” is - is your goal physical correctness, or perceptual correctness? Since we’re trying to create the illusion of a very high dynamic range scene that looks good on a standard dynamic range display, I think that perceptual correctness happens to be the appropriate approach here. Obviously that’s not always the case, but here it seems to be.

Fundamentally, by trying to cram the 12-14EV of dynamic range of a modern camera into the much lower dynamic range of a typical SDR display, you are no longer preserving physical reality. In an ideal world we could preserve physical reality and everyone would own a display exceeding the capabilities of something like my Vizio P65-F1, but we’re many years from such displays being the standard as opposed to a rare exception. Also, as I’ve mentioned, we’re even farther off from having the content delivery pipeline to reliably get still images to such displays.

I’ll try a few more experiments later today, but I’m 90% certain that I merely moved the apply_pow() call with a power of 2.4 from the end of the fusion pipeline to immediately after the weight calculation.

That one poster effectively did the additional power transform, which made luminance look better but it’s highly likely in that case that chromaticity did shift.

Hence filmic, placed correctly in the pipe, mapping whatever exposure range you have to the [0; 100]% of the display

At this place of the pixelpipe, you need physical correctness because some modules coming after need it. You don’t cross the wall of non-linearity before modules that apply physically-defined operations (low-pass, high-pass, blur, sharpening, colour manipulation, etc.). You are dealing with pixel operations as if they were isolated filters, but it’s a full pipeline. The only place where you can do perceptually-defined ops is where you are sure no pixel-pushing needs physics anymore. Which, again, is not at the basecurve level.

I want to mention a few things, not necessarily in the order they were presented.

1 Camera RGB is not exactly scene-referred or scene-linear, even after internal pre-processing and external profiling and calibration. That would be an ideal case. No model is perfect. In that sense, it represents somewhat abstracted “colours”. That, and there is already an internal colour profile in anticipation of the next transform.

2 Since sRGB has a toe, it is not a pure power and so may be troublesome to work with mathematically. Adding to the confusion, as has been discussed in this forum, sRGB comes in a dizzying number of variants. No two sRGBs are exactly the same! Certain sRGBs are better behaved than others.

3 The nature of the display should not matter. It is up to the software to colour manage, although there may be limitations there due to implementation and understanding of the standards.

4 Diffuse white seems arbitrary to me. HDR standards and algorithms often assume it. I think diffuse white is something that the photographer or videographer needs to determine while shooting or filming, having the whole post-processing pipeline, collaboration and final product in mind.

What I am saying is that reality is much more complex than most of us are comfortable with. I find that much of the disagreement has to do with focusing too much on your own corner of expertise. Of course, that doesn’t mean there aren’t things that are definitely wrong or right in the given context. Carry on! I love these conversations.

Yup, and nitpicking exactly where this is gets to be highly unproductive.

Interestingly, I’ve seen the justification for that linear toe being some sort of mathematical/processing challenge. The exact nature of that “challenge” isn’t really specified in those references I’ve seen.

At least in my current implementation, I was lazy and didn’t implement the linear toe. It’s a lot easier to just apply the power function. I was considering moving to a “standardized” sRGB colorspace that permitted leveraging internal darktable transform functions, but general consensus seems to be to forget the toe. I’m fine with that - less work for me!

CMS software is good for ensuring that what you’re looking at is consistent with what your users will see, and for providing at least a consistent, founded-in-reality starting point. But past that, when your scene and camera simply can’t be realistically represented by a common display and you don’t have the luxury of controlling your scene like the cinema industry usually does, you have to resort to alternative tricks.

Yeah. Certain people here want to speak in absolutes - and continue to do so even when the literature they cite most definitely does not speak in absolutes. For example (bold emphasis mine):

“In the color community, the process of pleasingly reproducing high-dynamic range pixels on a low dynamic range display is known as tone mapping, and is an active area of research. Surprisingly, many pleasing tonal renditions of high-dynamic range data use similarly shaped transforms. On first glance this may be surprising, but when one sits down and designs a tone rendering transform there are convergent processes at work corresponding to what yields a pleasing image appearance”

Now, the author of that paper is definitely a fan of the chemical-film S-curve for these purposes, but they outright state that as personal preference/opinion and not as hard, indisputable fact.

It’s obvious that certain people hate the enfuse algorithm for whatever reason - but the reality is that its popularity derives from its robustness and its tendency to deliver “natural looking” results (obviously that is subjective) under a wide variety of conditions. After all, a variation on the Mertens algorithm (preceded by a really slick tile-based alignment and merging approach) is the default operating mode of the camera subsystem of a product with annual sales in the millions of units - http://static.googleusercontent.com/media/hdrplusdata.org/en//hdrplus.pdf

A model is not reality, and everyone knows models are imperfect. However, some are more imperfect than others, and camera RGB can be corrected to be scene-linear.

Depends on what parts of the pipe you are talking about. CMSes are far from perfect, their first flaw being that they aim at matching output to output (screens with other screens and prints) but don’t care about retaining the input (matching the original scene colours with the output), and loosely try to match to the closest in-gamut colour.

Diffuse white luminance is the luminance of a 20% reflective white patch under diffuse lighting (no glare). It’s usually assumed to be 100 cd/m² on a display.

A processing challenge for the display. It has to do with the noise level under the given surrounding lighting.

Wrong. See above.

The very need for tricks means your model is broken in the first place. You don’t need to patch things if they work in the proper space. In 1911, everyone tried to patch their physics models to get them to work, because they didn’t match experiments, under the hypothesis that time was constant. Until one guy came and questioned the very assumption that made everything break: maybe time was relative. When you end up piling fixes on top of patches, maybe that’s a sign that the fundamentals are broken.

And since it’s an active area of research, you chose to fix a 12-year-old algo instead of looking at the state of the art in 2019 (which I happen to follow closely).

Yeah, as is all Hollywood… Maybe for a reason. About that author:

A lot of broken algos deliver fair results as long as you are lucky and don’t push them too far. It’s what happens when they break that needs attention. Maybe try to understand the reasons for that hate before jumping to conclusions. I have put 14,500 lines of code into darktable’s core over the past year; I have already dealt with all the problems you just seem to be discovering, and solved some of them.

Yeah, and Photoshop’s default image-processing mode is gamma-encoded (you can’t even use the healing brush in 32-bit float linear mode). That doesn’t make it right. Legacy is a stupid burden to carry, and after a while nobody remembers why it was done that way in the first place, but everyone supposes those guys knew what they were doing, so maybe there was some sense in it, and keeps going. (And there was some sense, back when you worked with 5 EV film scans.)

This is where I struggle. In the ‘olden days’ (2012?), if I wanted to make an image rendition that looked even just okay on whomever’s display, sans color management, I’d just gamut-transform and TRC to sRGB, which seemed to be the ‘least common denominator’. So now, with HDR displays, what works? I just can’t believe that everyone who buys these things is using color-managed software and calibrated display profiles…

And, that’s the main reason I haven’t spent quality time with the OCIO library - it doesn’t help me embed color and tone information in the image metadata for those who do color-manage their display. The only standard I can find for that link is ICC.

Overall, it seems like TV manufacturers are doing a much better job of calibrating the displays of their HDR units… Obviously nothing is perfect, but the situation is much better.

As to “color managed software” - that gets to my complaints earlier about the state of delivering content to HDR displays. If you feed an HDR TV H.265 video with the appropriate metadata, it’ll do a pretty good job (assuming the user didn’t muck with their settings). At least I’ve seen far less apparent variation between HDR TV units in the field than I have between SDR displays.

That said, there might be units that degrade more than others when you give them a tough scenario. The HDR10 standard includes metadata for the maximum light level encoded within a stream (MaxCLL) and the maximum frame-average light level (MaxFALL) - this lets the TV adjust backlight settings to stay within power consumption and thermal envelopes. (Many HDR TVs can deliver very high peak brightness, but only for a small percentage of the display at any given time.) HDR10+ primarily improves on this by allowing dynamic adjustment of MaxCLL and MaxFALL midway through a stream. I have always had some concerns that HLG lacks this metadata, and thus there could be cases on lower-end displays where it degrades less gracefully. Either way, you have FAR fewer constraints than with an SDR display.

If you want to deliver any sort of content to an HDR display other than H.265 video - good luck with that. Windows support for HDR displays is highly immature and buggy, and Linux support is effectively nonexistent. Intel was pushing some improvements to the kernel HDMI code to start moving towards an HDR-capable display subsystem (they had a good presentation on compositor approaches a few years ago - sorry, I don’t have time to dig up a link now), but we’re a long way away. :frowning:

I’ve seen implications that HEIC/HEIF has support for HDR content delivery of stills, but the infrastructure and software support to get that image data to the display is basically not there.

So, a side thought on a possible compromise approach that might handle some of the perceptual weighting issues I mentioned: instead of using equation 6.1 from the enfuse manual (as you’ve restated above), or equation 6.3 (which I’ve restated in other posts and am currently using), something like

(1-W^γ)*x_{1} + W^γ*x_{2}

might be worth evaluating… I’ll try and poke at this approach later this week.
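
A quick sketch of that variant, with γ = 2.4 picked only for illustration:

    def blend_gamma_weight(x1, x2, w, gamma=2.4):
        # keep the blend itself linear, but reshape the weight
        wg = w ** gamma
        return (1 - wg) * x1 + wg * x2

    # black/white case from earlier: eq. 6.1 gives 0.5 and eq. 6.3 gives ~0.19
    print(blend_gamma_weight(0.0, 1.0, 0.5))   # also ~0.19 on this extreme pair

It happens to coincide with 6.3 on the black/white extreme, but the two diverge as soon as both inputs are non-zero, so the gradient tests would be the real comparison.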

Oh God :man_facepalming:

Outside of scene-referred lives pixel garbage. Amen.

It’s not as if we already had all these discussions one year ago…

Not once have I made any claim that the result of exposure fusion is in any way scene-referred…

It looks like, yes, there was a discussion on enfuse a year ago, and the general consensus was “enfuse works” - in fact the final post in that thread is the updated enfuse script (which does work - but is slow since you need to manually generate exposure-shifted versions of your input image before feeding them to the lua script)

(Unless you’re referring to the fact that enfuse didn’t barf on linear input - see previous discussion on enfuse documentation.)

Yes, but… this part of the pipe expects scene-referred RGB. So it’s a mistake to do the scene → display conversion here. It’s just too early in the pipe.

If you have some spare time, maybe work on making it faster in the UI. The base curve module will be deprecated in the future, because it’s a leftover of darktable 0.4 or so, which used libraw, and it has been conceptually flawed from the beginning. Piling up fixes on top of patches on this particular module is wasted time. Start clean, with proper colour models, at the right place in the pipe.
