Highlight recovery test set

I’ve seen papers that actively “game” PSNR and the like because the metric became the goal. It would be nice to have something better than an arbitrary choice for this, though.


Yes they do.

PSNR and SSIM are bad for the reasons mentioned, but at least they are free of perceptual concepts. “Qualitative artifact” screams “perceptual metric” to me.

This can be discussed. Obviously there is no perfect answer, just good enough.

If the algo is reconstructing data lying outside of the spectral locus, that data has to come back to display values anyway through a CAT or gamut compression or whatever, just as the trichromatic sensor values it uses to reconstruct stuff have to.

Being aware of the biases is the first step in the right direction. Since this is purely theoretical spitballing, I would say use all of them: PSNR, SSIM and some flavour of ΔE, and throw in some Retinex for good measure if there’s a workable implementation. Once the metrics are in, they can either be weighted against obvious outliers, or used to pick the preferred algorithm depending on the situation (the data to be reconstructed). For example: use Algo1 for texture reconstruction and Algo2 for gradient reconstruction, because the metrics show that one is better at textures and two is better at gradients.
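To make that concrete, here is a rough sketch of the kind of combined score I have in mind, assuming display-referred sRGB float images in [0, 1], a recent scikit-image for PSNR/SSIM/CIEDE2000, and completely made-up weights:

```python
from skimage.color import rgb2lab, deltaE_ciede2000
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_reconstruction(reference, reconstructed, weights=(0.2, 0.4, 0.4)):
    """Combine PSNR, SSIM and mean CIEDE2000 into one (arbitrary) score.

    Both inputs are float arrays in [0, 1] with shape (H, W, 3).
    The weights are placeholders -- picking them is exactly the
    subjective part being discussed here.
    """
    psnr = peak_signal_noise_ratio(reference, reconstructed, data_range=1.0)
    ssim = structural_similarity(reference, reconstructed,
                                 channel_axis=-1, data_range=1.0)
    # Mean CIEDE2000 over the frame; lower is better, hence the minus sign.
    de = deltaE_ciede2000(rgb2lab(reference), rgb2lab(reconstructed)).mean()
    w_psnr, w_ssim, w_de = weights
    return w_psnr * psnr + w_ssim * ssim - w_de * de
```

(Of course, summing PSNR in dB with dimensionless SSIM is apples and oranges; in practice each term would need normalising over a set of results first.)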

Not having a metric at all seems like it’s not the best idea.

ꟻLIP: A Difference Evaluator for Alternating Images (NVIDIA Research)?


When I say “perceptual”, I mean “perceptually scaled”. Artifacts only mean you have a working pair of eyes and the image displays ringing, edge duplication, gradient reversals, cartooned objects, and so on, all of which can coexist with a high PSNR as long as they minimize the error norm on average.
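For reference, that is baked into the definition: PSNR is a monotone function of the spatially averaged squared error and nothing else,

$$\mathrm{PSNR} = 10\,\log_{10}\frac{\mathrm{MAX}^2}{\mathrm{MSE}}, \qquad \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \hat{x}_i\right)^2,$$

so a spatially structured artifact (ringing along one edge, say) barely moves the sum, and the score stays high while the defect stays perfectly visible.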

Go for it, try it. I’ll be there when you come back saying it wasn’t quite as simple as you initially thought.

All these metrics aim at predicting an average observer’s perceived similarity. None of them does it properly. Choosing or weighting them is simply hiding your subjectivity behind bullshit numbers. Might as well pull out a ruler and measure our dicks; that will save CPU cycles and electricity.

Finally, a metric that makes sense… Thanks!


Thanks @Iain for the test images and @hanatos for yet another evaluator ꟻor me to play with. :slight_smile:


I wasn’t claiming that going from sensor values to display values is a solved problem! I just wanted to express that reconstructed data is going to go through the same pipeline.

Not using any metric at all surely can’t be what you suggest for judging the discrepancy between a ground truth and the reconstruction.
That’s why I think @Iain 's dataset is so significant: it provides image pairs against which reconstruction quality can be compared.

I haven’t checked yet, but are there any examples taken with a lens that displays purple fringing?

I’ve found that can throw off inpainting algorithms.

@PhotoPhysicsGuy Some of the clipped/unclipped pairs are from handheld shots and cannot be used for precise evaluation.

If a ground truth is required for testing algorithms, then perhaps the best thing to do is take an unclipped image and create a clipped version by adjusting its exposure digitally.
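That is all I mean by clipping digitally. A minimal sketch, assuming a linear, white-balanced float image and an arbitrary exposure push:

```python
import numpy as np

def make_clipped_pair(unclipped, stops=2.0, white_level=1.0):
    """Create a synthetic clipped/ground-truth pair from one unclipped image.

    `unclipped` is a linear float array (mosaic or demosaiced, either works).
    Pushing exposure by `stops` and clipping at `white_level` simulates blown
    highlights; the pushed-but-unclipped version is the ground truth.
    """
    pushed = unclipped * (2.0 ** stops)        # simulate over-exposure
    clipped = np.minimum(pushed, white_level)  # simulate sensor saturation
    return clipped, pushed  # reconstruct from `clipped`, compare against `pushed`
```

One nice property is that it sidesteps the alignment problem of handheld pairs, since the two versions are pixel-identical outside the clipped regions.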

Regarding metrics for highlight reconstruction, I tend to go with “if it looks right, it is right”. I think one goal of highlight reconstruction is simply to stop the clipped areas from distracting the viewer from the content of the image. Completely flat areas are unnatural and stick out.

@CarVac there is no significant purple fringing


It seems to me that ground truths are, as yet, not strictly required. At the same time, there is a lot of appeal to testing algorithm quality against ground truths. Obviously, optimizing for arbitrary metric no. 23 doesn’t mean much, but I would argue that having no metric at all leaves just as much room for guessing and opinion.

I agree that manually clipping channels on demand might involve less work.

Personally: not a fan. It can be a rabbit hole. The human visual system is SO complex and SO sensitive to context that, while this might work in X% of cases, it can fail ungracefully for the rest.

EDIT (context sensitivity): HVS Illusion on Twitter

Perhaps ‘if it looks right, then it is right’ is oversimplified, but I think everything comes back to what someone thinks when they look at an image.

It seems to me that a good metric for highlight reconstruction is the one which correlates with ‘it looks right’ most of the time. Having a quantifiable metric just means that you don’t have to run complicated tests to find out what ‘looks right’ most of the time to most people.
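If someone did collect such ‘looks right’ judgements, checking how well a candidate metric tracks them would at least be cheap. A sketch with made-up numbers, using a rank correlation so only the ordering matters:

```python
from scipy.stats import spearmanr

# Hypothetical data: mean observer "looks right" ratings per reconstructed
# image, and a candidate metric's score on the same images.
observer_scores = [3.8, 2.1, 4.5, 1.7, 3.0]
metric_scores   = [31.2, 24.6, 33.0, 22.1, 27.9]

rho, p = spearmanr(observer_scores, metric_scores)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
# A metric worth keeping should rank the images roughly as the observers do.
```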


I sure can provide pixelshift images when I figure out what they should show :sweat_smile:

I have added an image that has significant chromatic problems at the edge of clipped regions.


I skimmed the paper yesterday. There is nothing in it that I haven’t already been doing. The challenge is determining the appropriate method for each step, and which steps would yield a robust evaluator. E.g., unlike them, I have not been using a perceptual space…

I would also argue that, depending on where you put highlight reconstruction in your pipeline, playing nicely with the demosaicing that follows, regardless of standalone performance, is already a performance criterion in itself. See: chromatic aberrations.

But then, you are evaluating the performance of the reconstruction + demosaicing combined.
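Something along these lines, where `reconstruct_highlights`, `demosaic` and the SSIM call are just hypothetical stand-ins for whatever implementations and metric are actually being compared:

```python
from skimage.metrics import structural_similarity

def evaluate_through_pipeline(clipped_mosaic, ground_truth_rgb,
                              reconstruct_highlights, demosaic):
    """Score highlight reconstruction *through* the rest of the pipeline.

    The metric is computed on the final RGB output, so an algorithm whose
    artifacts get amplified by the demosaicer is penalised even if it looks
    fine in isolation.
    """
    recovered = reconstruct_highlights(clipped_mosaic)
    rgb = demosaic(recovered)
    return structural_similarity(ground_truth_rgb, rgb,
                                 channel_axis=-1, data_range=1.0)
```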

So, yeah. As @Iain said, and even if that goes against my usual principles, I don’t see any way of assessing objective quality in this ill-posed problem other than going with what looks less shitty. If it were denoising or deblurring, I would have a different view, but this is literally trying to guess the missing content.


Being practical is important. :wink:


I am happy to dedicate Colourful night shot with reflections to the public domain. How could I best do that?

@ilia3101 I have added it to the collection as ‘Night Reflections.CR2’

https://drive.google.com/drive/folders/1SmiQ7E01RaflZxIFpfi5FMCeZGpHirj-?usp=sharing

I also suggest changing the licence on the play raw thread from Creative Commons to public domain.


Looks like I don’t have an edit button anymore? (on that thread)

Could that be changed?

Oh, I remember now, the forum software only allows edits for a short time.

Oh, well.

Yes, after a certain period of time posts are locked from editing. This preserves the conversation.