re: tiling. dt injects higher levels of the preview pipeline into the laplacian pyramid. the preview is full image, but coarse resolution. unfortunately it’s not aligned very well so occasionally you’ll see odd effects when zoomed in (region of interest cropping is similar to tiling in that sense). i don’t think this code path is triggered for tiled export though.
I assume he’s talking about dt’s particular implementation, which is limited solely to local contrast enhancement and doesn’t implement the algorithm in a way that performs tonemapping.
Local contrast preservation or enhancement is one particular component of tonemapping, but in the case of dt’s local laplacian module, it’s solely limited to local contrast manipulation. It is sometimes used to recover local contrast after it has been impacted by a global tonemapping operator, as opposed to implementing a local tonemapping operator that compresses dynamic range AND attempts to preserve local contrast at the same time.
(The original paper allows for a variety of edge-aware filtering/manipulation operations to be performed.)
Short answer: no.
Longer one: it is possible to tile-process an intermediate gaussian level that has a sufficiently small dimension, and then proceed from that one to compute the remaining levels.
While this is not a “hard” clip, the fact that the remapping is quantized in the [0,1] range means that the result is not reliable for input values above 1.
I do not quite agree, at least not with respect to the original idea. The papers from Paris and colleagues describe at least three applications of the local laplacians idea:
tone mapping, i.e. dynamic range compression (or expansion, but I do not consider this very interesting)
local contrast enhancement (which can be applied simultaneously with tone mapping)
style transfer
The first application only really makes sense if the input dynamic range exceeds the capabilities of the output device, a typical problem in scene->display mapping. Of course there is no strict requirement that the input must be scene-referred, but the idea works for scene->display mapping.
The log mapping has nothing to do with display space. The conversion from linear to log simply allows the threshold parameter to be expressed in terms of luminance ratios instead of luminance differences, which gives a more uniform response across the whole dynamic range. It also involves some non-linear remapping of the regions that get compressed outside of the user-defined threshold (because multiplications in log space correspond to power functions in linear space…).
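A minimal sketch of that last point (hypothetical names, not the actual module code): compressing the deviation from a local reference by a factor alpha in log space is exactly a power function of the linear ratio.

```c
#include <math.h>
#include <stdio.h>

/* compress the log-luminance deviation from a reference by alpha */
static float compress_log(float v, float ref, float alpha)
{
  /* work in log2: differences are luminance ratios (EV) */
  return exp2f(log2f(ref) + alpha * (log2f(v) - log2f(ref)));
}

/* the same operation expressed on linear data: a power of the ratio */
static float compress_lin(float v, float ref, float alpha)
{
  return ref * powf(v / ref, alpha);
}

int main(void)
{
  const float ref = 0.18f, v = 0.72f, alpha = 0.5f; /* halve the EV distance */
  printf("%f %f\n", compress_log(v, ref, alpha), compress_lin(v, ref, alpha));
  /* both print 0.36: the pixel ends up 1 EV above ref instead of 2 EV */
  return 0;
}
```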
Yes, but it is clipped in the original image and there is nothing one can do about that. This specific image is a single RAW, not a merge of bracketed shots, so the dynamic range is relatively limited…
In fact, the tool also has controls for shadows/highlights compression, but the results seem to be less pleasing than those I am getting by following the original paper:
After reading the articles about exposure fusion and local laplacian filters, and after experimenting a bit with both, I came to the conclusion that exposure fusion is a rudimentary approximation of the dynamic range compression that local laplacian filters can achieve. Exposure fusion is OK when the input is a set of SOOC bracketed JPEGs that the user wants to quickly merge, but it is a limited method when the input is a floating-point RAW image that was obtained by merging several bracketed RAW shots.
Exposure fusion requires generating a number of exposures from the original floating-point RAW, which are then merged according to a criterion that gives priority to well-exposed pixels. However, I do not see a good way to determine the optimal number of exposures, nor their EV spacing.
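To make that concrete, here is a hypothetical sketch of that exposure-generation step: derive n_exp synthetically shifted exposures, spaced ev_step apart, from one floating-point image, clipping to [0,1] as an SOOC JPEG would. The optimal n_exp and ev_step are exactly the open question.

```c
#include <math.h>
#include <stddef.h>

static void make_exposures(const float *in, float *out, size_t npix,
                           int n_exp, float ev_step)
{
  for (int e = 0; e < n_exp; e++)
  {
    const float gain = exp2f((float)e * ev_step); /* push by e*ev_step EV */
    float *dst = out + (size_t)e * npix;
    for (size_t i = 0; i < npix; i++)
      dst[i] = fminf(1.0f, in[i] * gain);         /* clip like a JPEG     */
  }
}
```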
On the other hand, the local laplacian method tries to make the best use of the information stored in the image, without any quantization (except for the speed-up method introduced by Aubry, which I still find more sophisticated than the brute-force bracketing approach of enfuse). All of this is dictated more by my intuition than by rigorous mathematical arguments, so take it with a grain of salt…
To conclude, my suggestion would be to replace exposure fusion in DT’s base curve module by a rigorous implementation of Paris & Aubry local laplacians, applied to the image’s log-luminance as suggested by the authors. The output luminance can then be used to scale the input linear RGB values (even in camera colorspace), so that colors are preserved in the process… that is what I have been doing in the above examples.
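A sketch of that colour-preserving step (hypothetical names): the tonemapped luminance, converted back to linear, simply scales the input RGB, so the ratios between channels are untouched.

```c
#include <math.h>
#include <stddef.h>

static void scale_rgb_by_luminance(float *rgb, size_t npix,
                                   const float *lum_in, const float *lum_out)
{
  for (size_t i = 0; i < npix; i++)
  {
    /* per-pixel gain from the luminance-only tonemapping */
    const float ratio = lum_out[i] / fmaxf(lum_in[i], 1e-6f);
    rgb[3 * i + 0] *= ratio;
    rgb[3 * i + 1] *= ratio;
    rgb[3 * i + 2] *= ratio;
  }
}
```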
The unpleasant clipping occurs way farther out in your processing than in the original image… for example when comparing it to my result from the original thread:
I would definitely disagree with this. While, from a purely theoretical perspective, tonemapping-via-fusion should be inferior, one must keep in mind a key observation:
The assertion of “no halos” in the Paris/Hasinoff/Kautz paper depends on an atypical definition of “no” and “halo”. The authors even provide an example of haloing and reference it as a deficiency/failure case of the algorithm in figure 12 of the CACM version of the paper and figure 13 of the ToG version. For me, high-frequency haloing like that is far more visually disturbing than the low-frequency artifacts I’ve seen only once when pushing fusion to extremes.
If both algorithms failed in the exact same manner and in the same scenarios I’d agree with you, but they don’t. Both algorithms can fail (making them both “shit” and “pixel garbage” by one particular developer’s standards), and those failures occur in different scenarios with different failure modes.
Would I likely use this approach over fusion most of the time? Probably, after all I’ve found even the Fattal implementation in RT to meet 90%+ of my needs. Is it unambiguously superior in all situations? Definitely not.
In practice the local laplacian doesn’t seem that much superior to exposure fusion, but it’s not easy to tell with just a couple of examples, especially because exposure fusion doesn’t do any local contrast.
Can you post these examples without the local contrast? (I guess alpha == 1?)
This is what I get with the exposure fusion:
I don’t understand why Fattal gets so little consideration here. I am no expert, but the algorithm as described in the original paper makes a lot of sense, and it really is halo free. One drawback that is often mentioned is that it is computationally expensive, but I’d claim that the implementation in RT is proof that it doesn’t have to be the case – or at least, that you can get it to perform decently. It’s interesting that the only thing Paris et al. have to say about Fattal in the paper is that it can be hard to implement and/or slow, but they never compare with it in terms of quality.
i also like exposure fusion. i think this idea: “develop the image for the dark areas and then for the bright areas and just brute force merge the results” is somehow good.
and there are quite some parallels between local laplacians and exposure fusion (full disclosure: i implemented both in dt). in a way the local laplacian also applies different curves to the same input image and then merges the results by selecting coefficients from the laplacian. only that usually the local laplacian uses more candidates than the std three we have for exposure fusion in dt.
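rough sketch of that candidate selection (my own illustration here, not the dt code): each candidate k is the laplacian of the image remapped around a reference value g_k, and the output coefficient is interpolated according to the gaussian coefficient at that position. with only three candidates this starts to look a lot like fusion’s three exposures.

```c
/* pick/interpolate the output laplacian coefficient from K candidate
 * remappings, driven by the gaussian pyramid coefficient at the same place */
static float select_coeff(const float *cand_lap, /* K candidate laplacian coeffs  */
                          const float *g_k,      /* K reference values, ascending */
                          int K, float gauss)    /* gaussian pyramid coefficient  */
{
  if (gauss <= g_k[0])     return cand_lap[0];
  if (gauss >= g_k[K - 1]) return cand_lap[K - 1];
  int k = 0;
  while (gauss > g_k[k + 1]) k++;
  const float t = (gauss - g_k[k]) / (g_k[k + 1] - g_k[k]);
  return (1.0f - t) * cand_lap[k] + t * cand_lap[k + 1];
}
```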
@Entropy512: since you experimented so much with the fusion. did you try using more than the three input images? something like eight? if the intuition resulting from the similarity to the local laplacian holds, this should be able to reduce any remaining halos.
(and yes, sure, all of that is just display referred local contrast tweaking, not much to do with linear light and such)
interesting. you would prefer that over your idea with log curve + guided filter for contrast enhancement? did you discard that idea? if so, why? and by log curve you mean just log((uint16_t)x+1u) or so? no aces style transform that results in [0,1], right?
also, do you have your reference implementation available somewhere? i’d be interested in comparing my GPU version (implements the approximate fast version from the follow up paper) to your reference to see how bad the approximation is and how many buffers are needed etc.
@Edgardo_Hoszowski Fusion does operate in a way that attempts to preserve local contrast. However, it does seem resistant to over-preserving local contrast or amplifying it to the point where things start to look strange.
I think Mertens did do a comparison, but my memory may be wrong. I’m not sure if I’m going to have time to reread the paper before I have to focus 100% on vacation packing. I obviously did not give it the consideration it deserved.
I did some initial work on trying to improve dt’s fusion implementation (as I found it never delivered results that I liked while feeding individual images to enfuse did), eventually @Edgardo_Hoszowski picked it up and ran with it and found the few remaining things I was missing, and split it into a separate module. (Having it in the same module as basecurve has been problematic because certain people are unable to have a rational conversation when basecurve is involved in any way, shape, or form…)
Eventually fusion as a whole was declared as “shit” and “pixel garbage” (to use the exact words of the person responsible for nixing the pull request) and it became clear that going forward, the current fusion implementation shall wither and die, existing solely for backwards compatibility.
Edgardo’s module supported up to 5 shifted images, and up to +3EV per shift. The only times I’ve seen the algorithm start to stumble are when the EV shift was greater than 2. I believe one potential scenario where it will almost surely fail is if the EV shift per image is so great that a pixel which starts on one side of the luminance weighting function in the initial image is shifted so much that it winds up on the other side AND farther away from the center in the next image.
Although I have once managed to start seeing artifacts at 5 exposures with only 1.5 EV per exposure - but that’s getting to be some extreme abuse.
An interesting approach that saves on memory is, instead of generating a whole pile of images and fusing them together, to achieve higher dynamic range compression by iterating over multiple image pairs, using the result of the previous iteration as the base for the next: HDR+ Pipeline - I’ve found Tim’s implementation to perform strangely in highlights, but I haven’t yet had time/motivation to dig in and figure out whether it was an implementation-specific issue or a fundamental algorithmic issue.
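My reading of that iterative-pair idea, as a sketch with hypothetical names: fuse_pair() is a stand-in for a proper multiresolution blend and is reduced here to a per-pixel well-exposedness-weighted mix just to keep the sketch self-contained.

```c
#include <math.h>
#include <stdlib.h>

/* Gaussian well-exposedness weight around 0.5 (sigma = 0.2, so 2*sigma^2 = 0.08) */
static float wexp(float v) { const float d = v - 0.5f; return expf(-d * d / 0.08f); }

/* stand-in for a multiresolution blend: per-pixel weighted mix of a pair */
static void fuse_pair(float *acc, const float *other, size_t npix)
{
  for (size_t i = 0; i < npix; i++)
  {
    const float wa = wexp(acc[i]), wb = wexp(other[i]);
    acc[i] = (wa * acc[i] + wb * other[i]) / (wa + wb + 1e-12f);
  }
}

/* iterate: fuse the running result with an EV-pushed copy of itself, so only
 * one extra buffer is ever needed instead of a whole stack of exposures */
static void fuse_iteratively(float *img, size_t npix, int n_iter, float ev_step)
{
  float *pushed = malloc(sizeof(float) * npix);
  for (int it = 0; it < n_iter; it++)
  {
    for (size_t i = 0; i < npix; i++)
      pushed[i] = fminf(1.0f, img[i] * exp2f(ev_step)); /* brighten a copy     */
    fuse_pair(img, pushed, npix);                       /* merge back into img */
  }
  free(pushed);
}
```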
Now that you seem to be paying attention outside of your Vulkan project, I did have a few questions about the history of fusion in dt:
The original Mertens paper strongly implies that the algorithm was designed assuming input images with a nonlinear (sRGB specifically) transfer curve, given the definition of 0.5 as “optimum” - 0.5 is roughly middle grey in sRGB, but in linear it’s pretty far up in the highlights (under the assumption that 1.0 in linear space gets mapped to the output display’s maximum possible brightness). This caused dt’s fusion implementation to tend to crush the highlights whenever it was used.
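For reference, this is the well-exposedness weight from the Mertens paper: a Gaussian around 0.5 with sigma = 0.2. With sRGB-encoded input, 0.5 sits near middle grey; fed linear data, the same curve favours values well above middle grey, which is one way to read the highlight crushing described above.

```c
#include <math.h>

static float well_exposedness(float v)
{
  const float d = v - 0.5f;
  return expf(-(d * d) / (2.0f * 0.2f * 0.2f)); /* Gaussian, sigma = 0.2 */
}
```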
Similarly, if you dig into enfuse (and this is easy to miss because their heavily templated code can sometimes be difficult to follow), there are comments indicating that the multiresolution blending needs to occur in a perceptually linear space. enfuse defaults to CIELUV, although I’ve never seen blending in CIELAB differ significantly in the final result, which is why the pull request from @Edgardo_Hoszowski blended in LAB space (since dt already had built-in support for converting into that colorspace). So was dt performing blending in linear space an oversight? (Blending in linear space was the number one contributor to halos, with capping nlevels at 8 being next.)
Also, I see that in the commit history there were a lot of changes made to the exposure optimum and exposure width values - why wasn’t this treated as a red flag that maybe those values should be exposed to the user? I’m also curious why the method for combining saturation/contrast/exposure weights was so different from the original Mertens method or the method employed by enfuse. (In the case of enfuse, I think their linear-combination implementation may be broken, since unless I’m missing something ELSE, those linear weights will just get normalized out, because
A*x*B*y*C*z = (A*B*C)*(x*y*z)
and A*B*C would just get normalized out since it’s the same constant for every pixel in every image.)
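A minimal sketch of why a per-measure constant cancels: the weights of each pixel are normalized across the N candidate images before blending, so any factor common to all images drops out.

```c
static void normalize_weights(float *w, int n_images)
{
  float sum = 1e-12f;                              /* avoid division by zero */
  for (int i = 0; i < n_images; i++) sum += w[i];
  for (int i = 0; i < n_images; i++) w[i] /= sum;
  /* scaling every w[i] by the same (A*B*C) leaves the result unchanged */
}
```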
Discard it? Not at all… however the guided filter tool is a sort of two-level approach, and therefore it is more difficult to prevent halos. Also I find that the pyramid-based approach preserves the local contrast in a way that looks less artificial, again probably because it works at multiple scales. The guided filter on the other hand is definitely faster, and I will definitely keep it as a “fast” solution.
It is even simpler. I am using a basic log2 function, that is (the code assumes linear sRGB data for the moment):
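(A simplified sketch of that function; the Rec.709 luminance coefficients here are an assumption for illustration, not necessarily the exact code.)

```c
#include <math.h>

static float log_encode(float r, float g, float b)
{
  /* luminance of linear sRGB data, Rec.709 coefficients assumed */
  const float lum = 0.2126f * r + 0.7152f * g + 0.0722f * b;
  return log2f(fmaxf(lum, 1e-15f)); /* clamp away zeros before the log */
}
```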
The advantage of the pure log encoding is that it transforms ratios into differences. If I set the threshold parameter of the local laplacian filter to 1, it means that anything above +/-1EV from the local average is considered as an edge and gets compressed. This is much different than saying “anything above 0.1” as you would do when working on linear data, because “0.1” is a huge value in the shadows, but a negligible one in the highlights, if you see what I mean…
The commonly used log(x+1) formula is not good in this case, because it breaks the relationship between “linear ratios” and “log differences”. Here is a simple example to show what I mean:
x_1 = 0.1, x_2 = 0.2, log_2(x_2) - log_2(x_1) = 1
x_1 = 0.01, x_2 = 0.02, log_2(x_2) - log_2(x_1) = 1
(that is, the two pixels are in both cases 1EV apart, and the log2 difference is the same), while
x_1 = 0.1, x_2 = 0.2, log_2(x_2+1) - log_2(x_1+1) = 0.125
x_1 = 0.01, x_2 = 0.02, log_2(x_2+1) - log_2(x_1+1) = 0.014
that is, the difference of logs is almost a factor of 10 smaller in the second case…
Let me clean the code a bit, and then I will make a gist and share it here. I would also be very much interested in comparing with your GPU implementation, and possibly port my CPU code to the GPU!
eerrhmm… yes, what you said. i mean, let me read the fusion paper again and try to remember these implementation things. quite possible that it was just convenient at the time how it fit the rest of the pipeline. just want to reply here quickly to say that i didn’t just ignore your question.
seems to make sense intuitively that the laplacians should be in a perceptual space, if anything principled can be said about them then it’s probably the response of some aggregate receptor in the eye that looks similar to the filter…
Out of curiosity, is it just coincidence that you happened to implement fusion in dt around the same time Google published their paper on HDR+, or did that happen to inspire you to use the Mertens algorithm for tonemapping a single input by feeding it synthetically exposure-shifted images, as opposed to fusing out-of-the-camera JPEGs? ( http://hdrplusdata.org/ )
Edit: An observation - Samuel Hasinoff was one of Paris’ co-authors of the 2011 Local Laplacian paper, and in 2016, a co-author of a paper where, despite Paris’ algorithm obviously being known to the team since Hasinoff was a co-author, Mertens exposure fusion was selected.
As far as fitting in the pipeline - the solution @Edgardo_Hoszowski used was to leverage dt’s built-in colorspace transforms to move data into LAB space for blending and back out at the end, so from a pipeline perspective the module was linear RGB in, linear RGB out. I did some experimentation with log2f() with decent results before I pretty much burned out on the whole thing.
As to @Carmelo_DrRaw bringing up preference for log2f() over log1p() - I fully agree on this one. When I saw enfuse using log1p() that one definitely made me scratch my head a little. I assume the intent was “negative numbers are bad” - but negative numbers in log space are not inherently bad…
As an FYI I’ll be leaving on vacation tomorrow and will be attending a local festival tonight, so I won’t be able to respond with too much detail in the next week or so.
The main difference with respect to the old code is that the new code offers the possibility to linearly adjust the amount of local contrast enhancement between the smallest and largest scale. In other words, it is possible to enhance the coarse scales much more than the fine ones, giving emphasis to the “structure” of the image more than the “texture”.
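A sketch of that scale-dependent amount (my parameterization for illustration, not necessarily the one in the code): the boost applied to each laplacian level is interpolated linearly between a “fine” (texture) and a “coarse” (structure) setting.

```c
/* level 0 = finest level, n_levels-1 = coarsest */
static float level_boost(int level, int n_levels,
                         float boost_fine, float boost_coarse)
{
  const float t = (n_levels > 1) ? (float)level / (float)(n_levels - 1) : 0.0f;
  return boost_fine + t * (boost_coarse - boost_fine);
}
```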
Ping @Entropy512 @hanatos as they are likely interested in those developments.
I can give more details on the implementation and the specificities of the algorithm, if anyone is interested. I will also try to provide pre-compiled packages to allow people to easily play with the code…
Meanwhile, if you have test images to suggest I’ll be glad to post examples of what can be achieved.
The local laplacian pyramid is applied, like in my previous version of the code, to log encoded values, as this produces results that are more “perceptually” uniform. There is however another advantage: the result is independent of the overall brightness of the image. In other words, if the image is too dark or too bright, you can apply an exposure compensation either to the input or the output of the dynamic range compression, and the result will be the same. Mathematically, this comes from the fact that logarithms transform ratios into differences.
In practical terms, this means that the result of the dynamic range compression is suitable for further processing, including exposure adjustments and film-like tone mapping.
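A tiny numerical illustration of that invariance: a global gain k becomes a constant offset log2(k), so anything driven by local log differences (laplacian coefficients, the threshold) sees exactly the same values before and after the exposure change.

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
  const float a = 0.05f, b = 0.40f, k = 4.0f;  /* two pixels, then a +2 EV gain */
  printf("%f\n", log2f(b) - log2f(a));         /* 3 EV apart                    */
  printf("%f\n", log2f(k * b) - log2f(k * a)); /* still 3 EV apart              */
  return 0;
}
```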
Disclaimer: I don’t know very well local laplacian filters, though I have seen some videos explaining the principle.
Your results look very good.
Would it be possible to mix the use of linear and log encoding so that any necessary average/blurring is done on linear-encoded data, instead of performing the whole filter on log-encoded data?
I mean:
construct the whole laplacian pyramid from linearly encoded data
but apply the various S curves on log encoded data
I believe his approach is to make the math simpler and the code more efficient. Perhaps we could reverse the log and enter into linear space for those operations.
This is certainly possible, but computationally more expensive. Instead of computing the log once for the initial image, one would have to do the lin → log → lin roundtrip math for every pixel that gets remapped.
I will try to see if this brings visible differences. Another thing I’d like to try is to work on linear data, but apply a threshold on the ratio instead of the difference in pixel values, to mimic what the log math is currently doing…
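In code, the equivalence would look something like this (hypothetical helper names, just to illustrate): thresholding the ratio of linear values against 2^sigma is the same test as thresholding the difference of their log2 values against sigma.

```c
#include <math.h>
#include <stdbool.h>

static bool is_edge_log(float log_v, float log_ref, float sigma)
{
  return fabsf(log_v - log_ref) > sigma;   /* difference of logs           */
}

static bool is_edge_linear(float v, float ref, float sigma)
{
  const float r = v / ref;
  const float t = exp2f(sigma);            /* threshold expressed as ratio */
  return r > t || r < 1.0f / t;
}
```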