I've seen papers that actively "game" PSNR etc., because the metric became the goal. It would be nice to have something less arbitrary for this, though.
Yes they do.
PSNR and SSIM are bad for the mentioned reasons, but they are free of perceptual concepts. "Qualitative artifact" screams "perceptual metric" to me.
This can be discussed. Obviously there is no perfect answer, just good enough.
If the algorithm is reconstructing data lying outside the spectral locus, that data has to come back to display values anyway, through a CAT, gamut compression, or whatever, just as the trichromatic sensor values it uses to reconstruct things have to.
Being aware of the biases is the first step in the right direction. Since this is pure theoretical spitballing, I would say use all of them: PSNR, SSIM, and some flavour of ΔE; throw in some Retinex for good measure if there's a workable implementation. Then, once the metrics are in, they can either be weighted against obvious outliers or chosen to prefer algorithms depending on the situation (the data to be reconstructed). For example: use Algo1 for texture reconstruction and Algo2 for gradient reconstruction, because the metrics show that each is better at its respective task.
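A minimal sketch of what scoring a reconstruction against several metrics could look like (NumPy assumed; the function names are mine, and the mean-absolute-difference function is only a crude stand-in for a real ΔE, which would first convert to a perceptual space like Lab):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def mean_abs_diff(ref, test):
    """Crude stand-in for a perceptual distance (a real ΔE would work in Lab)."""
    return float(np.mean(np.abs(ref.astype(np.float64) - test.astype(np.float64))))

# Toy example: a flat grey patch vs. a slightly noisy "reconstruction".
rng = np.random.default_rng(0)
ref = np.full((32, 32), 128, dtype=np.uint8)
test = np.clip(ref.astype(int) + rng.integers(-2, 3, ref.shape), 0, 255).astype(np.uint8)

# Collect all the scores; how to weight them is exactly the open question.
scores = {"psnr_db": psnr(ref, test), "mad": mean_abs_diff(ref, test)}
```

The weighting of `scores` is left open on purpose: that is the subjective part being debated here.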
Not having a metric at all seems like it's not the best idea.
When I say "perceptual", I mean "perceptually scaled". Artifacts only mean you have a working pair of eyes and the image displays ringing, edge duplication, gradient reversal, cartooned objects, and so on; these can coexist with a high PSNR as long as they minimize the error norm on average.
Go for it, try it. I'll be there when you come back saying it wasn't quite as simple as you initially thought.
All these metrics aim at predicting an average observer's perceived similarity. None of them does it properly. Choosing or weighting them is simply hiding your subjectivity behind bullshit numbers. Might as well pull out a ruler and measure our dicks; that will save CPU cycles and electricity.
Finally, some metric that makes sense… Thanks!
I wasn't claiming that sensor values to display values is a solved problem! I wanted to express that reconstructed data is going to go through the same pipeline.
Surely "no metric at all" can't be what you suggest for judging the discrepancy between a ground truth and the reconstruction.
That's why I think @Iain's dataset is so significant: it provides image pairs against which reconstruction quality can be compared.
I haven't checked yet, but are there any examples taken with a lens that displays purple fringing?
I've found that can throw off inpainting algorithms.
@PhotoPhysicsGuy Some of the clipped/unclipped pairs are from handheld shots and cannot be used for precise evaluation.
If a ground truth is required for testing algorithms, then perhaps the best thing to do is to take an unclipped image and create a clipped version by pushing its exposure digitally.
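That digital-clipping idea can be sketched in a few lines (NumPy assumed, image data normalised to [0, 1]; the function name and the 2-stop gain are my own illustrative choices):

```python
import numpy as np

def clip_exposure(img, stops):
    """Push exposure up by `stops` and clip to 1.0, simulating blown highlights.
    The unclipped original remains available as the ground truth for the
    reconstruction algorithm under test."""
    gained = img * (2.0 ** stops)
    return np.clip(gained, 0.0, 1.0)

# A smooth ramp: after a +2-stop push, everything at or above 0.25 clips to 1.0.
ramp = np.linspace(0.0, 1.0, 9)
clipped = clip_exposure(ramp, 2.0)
```

This keeps the pair perfectly registered, which sidesteps the handheld-alignment problem mentioned above.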
Regarding metrics for highlight reconstruction, I tend to go with "if it looks right, it is right". I think one goal of highlight reconstruction is simply to stop the clipped areas from distracting the viewer from the content of the image. Completely flat areas are unnatural and stick out.
@CarVac there is no significant purple fringing
It seems to me that ground truths are, as of yet, not required. At the same time, there is a lot of appeal in testing algorithm quality against ground truths. Obviously, optimizing for random-metric-no.23 doesn't mean much. But I would argue that not having a metric at all leaves equally much to guessing and opinion.
I agree that manually clipping channels on demand might involve less work.
Personally: not a fan. It can be a rabbit hole. The human visual system is SO complex and SO sensitive to context that, while this might work in X% of cases, it can fail ungracefully for the rest.
EDIT context sensitivity: HVS Illusion on Twitter
Perhaps "if it looks right, then it is right" is oversimplified, but I think everything comes back to what someone thinks when they look at an image.
It seems to me that a good metric for highlight reconstruction is the one which correlates with "it looks right" most of the time. Having a quantifiable metric just means that you don't have to do complicated tests to find out what "looks right" most of the time to most of the people.
I sure can provide pixel-shift images once I figure out what they should show.
I skimmed the paper yesterday. Nothing I haven't already been doing. The challenge is determining the appropriate method for each step and which steps would yield a robust evaluator. E.g., unlike them, I have not been using a perceptual space…
I would also argue that, depending on where you put highlight reconstruction in your pipeline, playing nice with the upcoming demosaicing, regardless of actual standalone performance, is already a performance criterion in itself. See: chromatic aberrations.
But then you are evaluating the performance of reconstruction + demosaicing combined.
So, yeah. As @Iain said, and even if that goes against my usual principles, I don't see any way of assessing objective quality in this ill-posed problem other than going with what looks less shitty. If it were denoising or deblurring, I would have a different view, but this is literally trying to guess the missing content.
Being practical is important.
I am happy to dedicate Colourful night shot with reflections to the public domain. How could I best do that?
@ilia3101 I have added it to the collection as âNight Reflections.CR2â
https://drive.google.com/drive/folders/1SmiQ7E01RaflZxIFpfi5FMCeZGpHirj-?usp=sharing
I also suggest changing the licence on the play raw thread from creative commons to public domain.
Looks like I don't have an edit button anymore? (on that thread)
Could that be changed?
Oh, I remember now, the forum software only allows edits for a short time.
Oh, well.
Yes, after a certain period of time posts are locked from editing. This preserves the conversation.