Human perception

This topic continues a discussion that started out in ART and Sigmoid; it is meant to provide background not tied to any specific software. I’m opening it at the suggestion of @micha.

I’m not an expert on any of this. Please, correct me when I’m wrong.

We’ll continue from ART and Sigmoid - #62 by kofa, which I’m copying below.


You probably already know the term mid-grey, or midtones: stuff that we perceive as neither ‘dark’ nor ‘bright’: the sky, away from the Sun; grass; many wooden surfaces. When Ansel Adams developed his ‘zone system’, he centred it around mid-grey (German Wikipedia: Neutralgrau – Wikipedia), which is defined as a surface that reflects 18% of the light. Here, 100% is defined as a surface that reflects all the light in a diffuse way, like white paper or a white wall; not like a mirror. However, in the world that we photograph, there are areas that appear brighter than a 100% diffuse reflection: light sources, or surfaces such as metal, water etc. that behave almost like mirrors, producing ‘specular’ reflections (German Wikipedia: Reflexion (Physik) – Wikipedia).
While our displays are light sources (they clearly emit light), they cannot be as bright as the Sun, or as the filament of a traditional light bulb. When printing, the situation is even worse: paper is a reflective surface, so by definition it cannot be brighter than 100%.

If we tried to represent light in a linear way, without compression, we might end up with something like this. Parts that are brighter than the display can handle are ‘burnt out’.

What we want is to be able to create a picture between 0% and 100% brightness that somehow resembles reality, even though in reality the contrast between the darkest and brightest parts of a scene is much larger than what paper or a display can produce.

If one were to simply scale the values equally (multiply pixel values by a number < 1), making sure that the brightest part of an image becomes 100% white on the display or the paper, the scene would be too dark. Here’s a screenshot using ART:

We could then add a curve on top to bring up the shadows. That is the traditional ‘base curve’/‘camera curve’/‘tone curve’ approach. A very bad attempt:

Luckily, our senses (sight as well as hearing) are, as far as I know and probably only approximately, logarithmic in nature. That means that when the signal (light or volume) is multiplied by some factor (for example, doubled), we perceive it as increased by a fixed step (an addition), not multiplied. When the signal doubles again, we perceive an increase by the same step. So if it went from ‘10’ (whatever the unit is) to ‘20’, then from ‘20’ to ‘40’ (it increased first by 10 units, then by 20 units), we feel it changed by the same amount both times.

For sound volume, you are probably familiar with dB. That is also a logarithmic scale; the threshold of our hearing is defined as the baseline, 0 dB; a quiet room is 30 dB, a normal conversation is about 60 dB, a hairdryer 90 dB, a rock concert 120 dB or above. Even though the physical power levels moving the air are nowhere near a simple 1:2:3:4 ratio, we feel that the step from the threshold of hearing to a quiet room is ‘as much’ as the step from the quiet room to 60 dB (about the loudness of a normal conversation), then again to 90 dB (the sound level of a blender or hairdryer), and once more to 120 dB, the rock concert.

Here is a plot of the ‘natural’ logarithm function:
[plot: the natural logarithm, y = ln(x)]

At x = 10, the value is 2.3; at 20, about 3, so it increased by 0.7. At x = 40, the curve reads about 3.7; again, it increased by 0.7; at 80, it’s at 4.4, so it went up by 0.7 again.
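
If you want to check these numbers yourself, a couple of lines of Python will do (nothing photo-specific, just the standard math module):

```python
import math

for x in (10, 20, 40, 80):
    print(x, round(math.log(x), 2))
# 10 2.3
# 20 3.0
# 40 3.69
# 80 4.38
# Each doubling of x adds the same step, ln(2) ≈ 0.69, to the output.
```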

With log tone mapping, you get this gentle curve that becomes less and less steep. There is more maths involved, but that is not so important.

Instead of darkening the image as shown above, we can apply the log tone mapping to keep the shadows visible, and map those bright parts (the sky, which is a light source) into the displayable/printable range:
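
To make the idea a bit more concrete, here is a minimal Python sketch of a purely logarithmic mapping. The 18% mid-grey and the black/white EV limits are just values I picked for the example, and real log tone mappers add contrast/shaper stages on top of this:

```python
import numpy as np

def log_tonemap(x, black_ev=-8.0, white_ev=4.0, grey=0.18):
    """Map scene-linear values to the display range [0, 1] on a log scale.

    black_ev / white_ev: the scene EVs (relative to mid-grey) that should land
    on display black and white. Purely illustrative; real log tone mappers
    add contrast/shaper stages on top of this.
    """
    ev = np.log2(np.maximum(x, 1e-9) / grey)               # scene value in EV
    return np.clip((ev - black_ev) / (white_ev - black_ev), 0.0, 1.0)

print(log_tonemap(np.array([0.001, 0.18, 1.0, 3.0])))      # ≈ [0.04 0.67 0.87 1.0]: shadows lifted, highlights compressed
```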

However, you can also see that this means you lose contrast. There is only a small portion where the contrast is close to 1 (the black line is contrast = 1, the blue is the log):
[plot: the log curve against the contrast = 1 line]

One solution is to use a traditional S-curve, like the one I gave above, to place contrast wherever you need it. The other is a parametric curve, like Sigmoid. (filmic is another such curve, originating in the 3D software Blender, I think, and also used in darktable in a modified form.)

If you open the curve explorer (also posted above) and only plot the sigmoid (log-logistic), you can use contrast to control how steep the straight part of the curve is. Contrast 1 vs 2:

With skew, you can control how much the shadows and the highlights are compressed. Notice when the line ‘leaves the ground’ and ‘hits the ceiling’. Skew: -2 vs 1:

(Edit: contrast is the slope of the curve, i.e. of its tangent line, at 0 EV; the slope at other points depends on the base contrast one sets, but also on the skew.)
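
For the curious, here is one way such a skewed sigmoid could be written in Python. It is a generalised-logistic (‘Richards’) curve working on EV input, meant only to illustrate what contrast and skew do; it is not the exact formula used by Sigmoid, ART or darktable:

```python
import numpy as np

def skewed_sigmoid(ev, contrast=1.5, skew=0.0):
    """Display brightness (0..1) from scene exposure in EV around mid-grey.

    contrast: steepness of the straight part of the curve.
    skew:     asymmetry exponent; with this simple parameterisation, positive
              values crush the shadows harder and roll off the highlights more
              gently, negative values do the opposite (and, unlike the real
              modules, it also shifts the mid-grey output a little).
    """
    nu = 2.0 ** skew                              # asymmetry of the Richards curve
    return (1.0 + 2.0 ** (-contrast * ev)) ** (-nu)

evs = np.linspace(-6, 6, 7)
print(skewed_sigmoid(evs, contrast=1.0))          # gentle S, slowly approaching 0 and 1
print(skewed_sigmoid(evs, contrast=2.0))          # much steeper straight part around 0 EV
```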

How is that different from a traditional S-curve? The tricky part is that the y axis is in screen/paper brightness percent, but the x axis, the input, is in EV (so along the bottom you don’t see 1%, 2%, 4%, 8%, 16%, 32%, 64%, 128%, 256% and so on; instead, 0 EV is mid-grey, and the other labels are the given number of EV above or below it, each step doubling or halving the amount of light). That is where you can see the logarithm at play: a difference of 1 EV is a doubling of the amount of light, which we perceive as ‘it got brighter by the same amount each time’.

I do not know if this helped or confused you even more. Please let me know.


Michael asked me to provide more info on the logarithmic nature of our senses.

When sound volume (the acoustic power coming from the speakers - SPL, ‘sound pressure level’) doubles, we don’t feel ‘hey, it’s twice as loud’; we feel ‘it got a bit louder’ (I have read somewhere that when people are asked to turn the volume up or down ‘by a notch’, they instinctively double or halve the power). In an article I found online (Decibel Levels and Perceived Volume Change – Conrad Askland), one can read:

  • a 3 dB increase in the signal means twice the power (electric power turned into sound)
  • a 6 dB increase is 4 times the power
  • a 10 dB increase is 10 times the power.

Yet, what we hear is (quoting the article):

  • 6dB SPL increase is perceived as an approx. 50% increase in volume by a sample group.
  • 10dB SPL increase is perceived as an approx. 100% increase in volume by a sample group.

(9 dB would be doubling again compared to the 6 dB, so 8 times the original power, and 12 dB would be 16 times the original.)

So, the power behind the music went up 10 times, yet we feel it only got about twice as loud.
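
In Python, using the article’s ‘10 dB is roughly twice as loud’ rule of thumb as the perceptual estimate (a rough empirical rule, not a physical law):

```python
# Power ratio for a gain of d dB, plus the rough perceptual estimate.
# The 2 ** (d / 10) "loudness" line encodes the article's rule of thumb;
# it is not a physical quantity.
for d in (3, 6, 10, 30):
    power = 10 ** (d / 10)
    loudness = 2 ** (d / 10)
    print(f"+{d} dB: {power:.0f}x the power, ~{loudness:.2f}x as loud")
# +3 dB:     2x the power, ~1.23x as loud
# +6 dB:     4x the power, ~1.52x as loud  (the article's "about 50% louder")
# +10 dB:   10x the power, ~2.00x as loud  ("twice as loud")
# +30 dB: 1000x the power, ~8.00x as loud
```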

(Answering a question about what I mentioned in the original thread about the threshold of human hearing being considered 0 dB, and a quiet room being 30 dB):
A ‘quiet room’ is not silent. Even with the windows closed, there is always a bit of sound (the wind outside, your own breathing, maybe a fridge running in the kitchen, a clock ticking). That is why places studying human hearing have specially insulated chambers. It is like ‘night’ (with starlight and perhaps a bit of light pollution from a nearby town) vs being in a completely darkened room / in the cellar / in a cave without any lights on.

(BTW, 30 dB means 1000 times(!) the physical power, SPL.)

If you go into a dark room and turn on a single light bulb, you’ll see ‘oh, it got a lot brighter’. But when you turn on a second, identical light bulb, obviously doubling the power, you will not feel ‘it is twice as bright now’. Adding a third bulb will be noticeable, but you don’t feel you added just as much light by turning on the 3rd bulb as when you turned on the 2nd, and especially the 1st. It’s not about the absolute amounts, it’s about the ratios: turning on the first bulb increased the amount of light from almost nothing to whatever a single bulb produces, maybe a hundred times as bright. The second bulb ‘only’ doubled the light; the 3rd bulb only increased it by an additional 50%. Adding a 4th bulb would not feel 4 times as bright as the 1st bulb alone, and would not feel twice as bright as the first two together.
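
Put into numbers, treating the dark room as roughly 1% of one bulb’s output (a guess, just for the sake of the example):

```python
import math

levels = [0.01, 1, 2, 3, 4]           # dark room, then one to four identical bulbs
for before, after in zip(levels, levels[1:]):
    step = math.log2(after / before)  # perceived step, expressed in EV (doublings)
    print(f"{before} -> {after}: {step:.2f} EV")
# 0.01 -> 1: 6.64 EV  (the first bulb is a huge jump)
# 1 -> 2:    1.00 EV
# 2 -> 3:    0.58 EV
# 3 -> 4:    0.42 EV  (each extra bulb feels like less and less)
```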

This phenomenon is not restricted to hearing and sight. See:

8 Likes

Why is that? We have already seen that an increase of 3 dB means a doubling of the electric power.

2 Likes

Sorry, missed a 0. I meant 30 dB. Now fixed.

3 Likes

Hello István,
yes, it is well known that the grey card and the light meter of every camera are calibrated to exactly 18% reflectance.

But why is the value 18 and not 50?

Does it have something to do with the fact that middle grey is exactly 2.3 f-stops below the brightest white that still shows detail?

Would 50% then be exactly one f-stop below white?

25% two f-stops and 2.3 f-stops then 18%?

“it’s complicated” …

==> Middle gray - Wikipedia

Short version: the 18% was a “random” pick out of all the different possible numbers.

It serves its purpose as a reference point.

As the initial post shows rather well, the usefulness of setting exposure only from a gray card is rather questionable: even if your scene has less dynamic range than the recording medium, you want to expose for the dynamic range and not for a (randomly picked) middle gray.

3 Likes

Not sure if this is true, but this reference asserts that camera metering actually uses ~12.9%.

1 Like

There was a good quote in a recent presentation: ‘The eye-brain combination is not a camera.’

When you capture an image with the camera, things need to be considered to account for this perceptual aspect.

So some amount of reflected light will be perceived as 50% grey by our brains…

This common test image of the effect of surrounding light on that perception shows the ‘eye-brain combination’ in action when presented with the same gray patch…

3 Likes

A 1 EV difference is a doubling of the amount of light. To find how many EVs there are between two light amounts, one has to divide them and take the base-2 logarithm (log_2). A logarithm tells you ‘how many orders of magnitude’ there are between two amounts; the base of the logarithm is the ratio (2, or 10, or whatever) you take as ‘an order of magnitude’. The base-2 logarithm of the ratio between 100% and 18%, or log_2(1/0.18), is about 2.47; so, there are about 2.47 EV between 18% and 100%.
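
The same calculation as a tiny Python helper (the ratios are just the examples from the text):

```python
import math

def ev_between(a, b):
    """How many EV (doublings) lie between two amounts of light."""
    return math.log2(a / b)

print(ev_between(1.00, 0.18))   # ~2.47 EV between 100% and 18%
print(ev_between(2, 1))         # 1 EV: one doubling
print(ev_between(1000, 1))      # ~9.97 EV in a 1000:1 ratio
```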

Off-topic (audio, not photography):
The base-10 logarithm of 1000, log_{10}(1000) = 3, because 1000 = 10 * 10 * 10 = 10^3. A “bel” of difference between two amounts of power means that one is 10 times as much as the other. Since that is too coarse a step, we use a tenth of a bel, the decibel (dB), which you get by multiplying the log_{10} value by 10. Hence, 30 dB = 3 B = a 1000-fold difference. This is what we talked about with the threshold of human hearing and a quiet room. So, a 90 dB blender or hairdryer is 90 dB - 30 dB = 60 dB louder than the quiet room, which means 6 bels, or 6 orders of magnitude, if ‘one order of magnitude means 10 times the power’ – such a common device puts out a whopping 1 million times the power of a quiet room!

3 Likes

Hello István,
Oh, I would like to understand that in more detail now. Previously we heard that this 18% was more or less a random value. But with your calculation, it really seems to make sense.
Could you please describe again how you arrived at your 2.47 EV. I remember from my analog photography days that you have to expose about 2.3 or even 2.5 EV darker than the brightest white, which should still have some detail, so that the slides are perfectly exposed.

I’m asking because I’m not familiar with log arithmetic.

Hello Todd,
Oh, I would like to understand that in more detail now. Previously we heard that this 18% was more or less a random value. But with your calculation, it really seems to make sense.
Could you please describe again how you arrived at your 2.47 EV. I remember from my analog photography days that you have to expose about 2.3 or even 2.5 EV darker than the brightest white, which should still have some detail, so that the slides are perfectly exposed.

This is how:
log_2(1/0.18) = 2.47

A 1 EV difference means a ratio of 2 : 1. log_2(2) = 1, because 2^1 = 2.
A 2 EV difference means a ratio of 4 : 1. log_2(4) = 2, because 2^2 = 4.
A ratio of 1 : 0.18 (100% : 18%) is about 5.56, or 5.56 : 1. log_2(5.56) = 2.47, because 2^{2.47} = 5.56. So it’s 2.47 EV.
A 3 EV difference means a ratio of 8 : 1. log_2(8) = 3, because 2^3 = 8.

1 Like

It has a lot to do with how our neurons encode information…

A very simplistic summary would be something like this…

“Logarithmic” or non-linear perception of light

  1. Weber’s Law, Fechner’s Law and Stevens’ Law

These laws are fundamental in the field of psychophysics, which studies the relationship between physical stimuli and our perceptual experience.

Weber’s Law states that the minimum increase in stimulus required to produce a perceptible increase in sensation is proportional to the pre-existing stimulus.

Fechner’s Law, an inference from Weber’s law, goes further. It states that the subjective sensation (what we perceive) is proportional to the logarithm of the stimulus intensity.

In simpler terms, Fechner’s law suggests that our perception of brightness, loudness, and other sensory experiences follows a logarithmic scale rather than a linear one.

Another model, Stevens’ Law, models this relationship with a power function.

  2. Logarithmic Perception of Light

When it comes to vision, these laws play a crucial role.

When observing a light source, the perceived brightness of that light is not directly proportional to its actual intensity (as measured with a non-human instrument, i.e. a camera or light meter); instead, our perception follows a logarithmic relationship. As the energy (intensity) of the light increases, our sensation of brightness increases less rapidly than the actual energy does.

  3. Why Logarithmic?

Neurons often use logarithmic coding schemes. This means that their responses are more sensitive to relative changes (ratios) rather than absolute differences.

Logarithmic perception allows us to detect small changes across a wide range of intensities. It’s like having a built-in adaptive mechanism that helps us perceive both faint and intense stimuli effectively.
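
A toy comparison of the two models in Python; the Stevens exponent of 0.33 is a commonly quoted ballpark for brightness, but both constants here are purely illustrative:

```python
import numpy as np

I = np.array([1.0, 2.0, 4.0, 8.0, 16.0])   # stimulus intensity, doubling at each step

fechner = np.log2(I)      # Fechner: sensation ~ log(intensity)
stevens = I ** 0.33       # Stevens: sensation ~ intensity ** a (exponent is illustrative)

print(np.diff(fechner))             # [1. 1. 1. 1.]  -> every doubling adds the same step
print(stevens[1:] / stevens[:-1])   # ~[1.26 ...]    -> every doubling multiplies the sensation by ~1.26
```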

This is a pretty simplistic view. One quite technical overview can be found here:

(Eye intensity response, contrast sensitivity)


2 Likes

We have a little egg vs chicken situation here.¹

Those exposure compensation numbers are based on the premise that the metering device is calibrated to those 18%. In reality, all components play together and influence each other. For example, if you had used a developer that overdeveloped the white areas, you might have had to dial the compensation way down. The same applies when photographing digitally. Even more so, as digital will blow out and have no recoverable information beyond a certain point.

It is a nice practice to “calibrate” your camera for spot metering:

You meter off a large enough white area, photograph a series with different exposure compensations, then look at the files in your raw editor with your preferred settings and note at which camera setting the white retains too little information for your editing style.

Now remember that number and you can get spot on – pun intended – exposures.

For example: with my Nikon D500, and the way I use darktable, I have established a compensation of +2.7 EV for spot metering on the brightest area I want rendered with detail but very close to white.

Circling back to the main topic: this does not tell us anything about what that brightest part really is, or how it should be perceived later by the viewer.


¹) in evolution the egg is always first.

1 Like

Might be very true - I have only calibrated closed loop systems and never the exposure meter itself.

If you think about it: a digital camera could be set to 99%, pick the brightest area from the sensor and expose to the right every single time, perfectly.

1 Like

I had to deal with logarithms 49 years ago, but never again since then.

I’m concerned here with photography and the question of how to get the high contrasts of the subject and the camera sensor onto the screen and paper without everything becoming muddy and far too dull.

This question is important: namely, which curves or other tricks are useful to achieve a good result.

Maybe others see it differently, but for me it’s not so important to understand the basics with the math.

I am now also prepared to not understand the log.

If you want, we can move on to the practice of curves. Please note that at least I don’t want to go deeper into computer science.

The exciting question is what I can do to treat the compressed contrast in such a way that at least the parts that are important to the image still retain sufficient contrast and brilliance.

A couple of points about contrast:

  1. When editing in the display-referred phase, we might ensure that at least one pixel is black and at least one is white. This is a simple contrast control.

  2. After that, we might want to increase contrast. This can be either global or local (or both).

  3. Global contrast is changed with a tone curve. This will increase contrast for a certain range of tones, while decreasing contrast in other ranges. For example, a photo might have a bell-shaped histogram, and we want to increase contrast in the mid-tones, so we apply an S-shaped curve, which does that while decreasing contrast in the shadows and highlights. The curve is steep in the middle and flatter at both ends (a small code sketch follows after this list).

Or a landscape might have a bimodal histogram, with mostly dark pixels (in the land) and light pixels (in the sky), with few mid-tone pixels. So our curve might be a double-S-curve, with steep portions in the dark land and light sky, and flatter portions elsewhere.

If we apply a curve at the scene-referred phase, we can apparently increase contrast in one tone range without reducing contrast elsewhere, because we don’t need to worry about clipping black or white. However, the final displayed image will need compression somehow. So when we want to increase contrast in some tones, we always need to consider also where we will decrease contrast.

  4. Local contrast might be done with different curves applied in different parts of the image. This might be controlled by hand-built masks, or by automatic (segmentation) masks. Or it could be done with edge-aware algorithms such as the bilateral filter or the guided filter, which can each blur (or sharpen) areas while preserving edges.
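
To make point 3 concrete, here is a minimal global S-curve in Python. The polynomial form and the strength/pivot parameters are just an illustration; real tone-curve tools use splines or parametric curves:

```python
import numpy as np

def s_curve(x, strength=0.6, pivot=0.5):
    """Very simple global S-curve for display-referred values in [0, 1].

    Steepens the curve around `pivot` (more midtone contrast) while flattening
    it near 0 and 1 (less shadow/highlight contrast). strength = 0 is the
    identity; keep strength below ~2 so the curve stays monotonic.
    """
    return x + strength * x * (1.0 - x) * (x - pivot)

x = np.linspace(0.0, 1.0, 5)
print(s_curve(x))   # ≈ [0.    0.222 0.5   0.778 1.   ]: shadows pushed down, highlights pushed up
```
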
2 Likes

You specifically asked that a new thread be created to talk about background, and not about curves. I think the questions about curves should be discussed in the original topic.
It is also possible to talk about curves in general, but then it’s going to be about maths, after all.

1 Like

I have tried to take these concepts into account by using Ciecam (a Color Appearance Model) since 2012… and now with Cam16.
My interpretation: “trying to take into account, in software, the physiological aspects due to the perception of the eye and the brain”.

Some of the effects taken into account:
https://rawpedia.rawtherapee.com/CIECAM02#About_CIECAM02
Simultaneous contrast, the Hunt effect, the Stevens effect, the Helmholtz-Kohlrausch effect, chromatic adaptation, etc.

An example taken from the list:

You can get an overview of what Ciecam is with the tutorial (which should probably be updated a little)
https://rawpedia.rawtherapee.com/CIECAM02

I recently made several tutorials on the subject (directly or indirectly)
https://discuss.pixls.us/t/paris-olympic-games-pont-alexandre-iii-tutorial/41895
https://discuss.pixls.us/t/another-tutorial-color-appearance-truck-under-a-tunnel/41947
https://discuss.pixls.us/t/scene-reffered-and-display-reffered-part-1/42076
https://discuss.pixls.us/t/scene-reffered-and-display-reffered-part-2/42087
The 3rd part, the summary, will be available soon.

Concretely there are 2 modules present in Rawtherapee:

  • Color Appearance & Lighting (Advanced tab) - a complete module, which is complex for a beginner
  • Color Appearance (Cam16 & JzCzHz) (Local tab) - in Basic mode, this module is quite simple and allows you to address (and solve) general colorimetry problems, including for high dynamic range images (25 EV)

Jacques

3 Likes

I am not sure that thinking about it this way is helpful. At the end of the day it is all about light hitting the sensor or the retina, and whether it comes from a light source or a reflecting surface is of secondary importance, once we know the intensity.

Similarly, when it comes to specific curves, such as filmic or sigmoid, I think about practical considerations instead of trying to derive them from first principles about human perception. The photo industry ended up with a lot of very useful ways to map light into paper by chemical film, but let’s not forget that this took more than a century of experimentation.

For digital signals, we can be more flexible with post-processing, but practically most people want something that minimizes the work so they can focus on other aspects of the image and/or postprocess their photos quickly. Sigmoid may just be a sweet spot.

Finally, I think that the retina and the brain are doing something that is not unlike local contrast enhancement (I am not an expert, and maybe someone can link references). For digital photography it is fine to decouple the global mapping from this, but let’s not forget that film did this too, and it can be emulated digitally.

3 Likes

Since our theme now is “perception”, I believe it’s pertinent to point out that our understanding of the role of perception has lately changed somewhat, within our current understanding of how the brain functions at large.

For more than a hundred years our main concept of the brain has been that of a reactive information processor: we receive some sensory stimulus, which our brain then processes, and from this the brain commands new actions.

For about the last decade, however, a new picture has been emerging and beginning to take hold in psychological research: that of “the predictive brain”. Related to this, there is now also much more focus on the fact that for all the nerve connections that go from our sensory neurons to the brain, there is about the same number of connections going back from the brain to our sense organs. This latter fact has largely been ignored for most of the era of psychological research.

What does this mean?

Proponents of “the predictive brain” argue that the brain’s overall role is to aid our body in performing the next act in a way that is likely to increase our survival. To achieve this, it is too slow to just react to incoming sensory signals. Rather, the brain constantly constructs images / spatial models of our whereabouts (also in abstract ways, like our social relations), and from these it predicts what is likely the best next step. Our perception then serves to validate those predictions, and when it doesn’t, a revision of the brain’s model is induced. (It is also argued that this mode of brain function is a more cost-effective way for the brain to work, which of course is an evolutionarily important aspect. Less eating necessary - our brain currently consumes about 20% of our food intake.)

It also probably means that our perception of sensory stimuli is likely to be modified by our understanding of our surroundings and by our mental states more than we previously believed. It has, e.g., been demonstrated that even such a fundamental “mechanical” function as regulating the aperture of the eye’s iris is not determined by the intensity of the light reaching the eye alone, but that the size of the opening is influenced by what we believe the intensity of the light to be.

Much of our visual response testing – and much of psychological laboratory research in general – is done within constructed test environments where we expose participants to one or a few stimuli and focus on these in isolation. But this raises serious questions about the ecological validity of any knowledge we think we have gained from such tests. Our brains and our senses normally work within a much richer environment, where input from one sense is combined with the others, with our previous experiences, our understanding of the current context, our moods/emotions etc., all mixed together. In such a setting our perception of light strength is influenced, e.g., by our perception of / beliefs about the distance to the light source – and our perception of distance depends on several physical aspects, not least on moving our head/body within those surroundings, and on and on and on … it’s all connected. Furthermore, attention is in general an important factor that influences our judgement of the magnitude of sensory input.

Take a look at CIE’s definition of “color, perceptual”|“perceived color”: “characteristic of visual perception that can be described by attributes of hue, brightness (or lightness) and colourfulness (or saturation or chroma)”,
and then at Note 1: “Perceived colour depends on the spectral distribution of the colour stimulus, on the size, shape, structure and surround of the stimulus area, on the state of adaptation of the observer’s visual system, and on the observer’s experience of the prevailing and similar situations of observation.”

We now know that all three color dimensions of a color stimulus – hue, saturation, and brightness – as well as their interactions, have various effects on the emotional state of the observer. Since we know that emotional state has effects on many perceptual and other cognitive aspects of our nervous system, we should probably include “mental state” in that Note 1. But as Freud once noted (at the time he was still trying to establish a scientific psychology, before he gave up due to the then lack of instruments): “Es ist nicht bequem, Gefühle wissenschaftlich zu bearbeiten.” (“It is not convenient to treat feelings scientifically.”) And for many reasons it isn’t much easier today to do research where the effect of an extensive set of emotions should be taken into account (including because, after more than 150 years of scientific psychological research, there is still no agreement on what an emotion is or what types of emotions there are …).

We never perceive a pixel alone, but as part of the total image with all its variation in lightness and color, surrounded by its environment, including ambient light, and as part of our total mental state.

I see the need for getting as good an understanding of our perceptual responses as possible, and the need to create image processors with, e.g., orthogonal controls and well-behaved response curves, so that we have tools that are as accurate as possible for, e.g., reproducing other images (though we need to take into account that the perceived colors of a painting are also the result of the physical aspects of the paint and of the light falling on it from various angles …), or for doing product or portrait photography and the like.

For the rest of us who try to “create nice pictures”, we might find some relief in the fact that when seeing an image our eyes just make a few point readings and fill in the rest as fits – something painters (and magicians) have long used to “trick” us; we are deceived into the feeling of really seeing the whole picture – and in the fact that I’ve never heard any member of a photo jury declare that, if it weren’t for some deviating nuance in hue or chroma, a picture would have won an award.

TIP: In this latter respect I would propose spending some more time on the orientation module, rather than on the intricacies of the color balance module.
Why? Because most images will likely be perceived as “better” if they comply with basic compositional aspects relating to forms and lines and their contrasts and balances (see any photo book on composition for details). However, part of our perception process is to find meaning in what we see, and in particular faces or other elements of humans and their activities are likely to catch our attention. By turning images upside down, in most images there is little immediate meaning left that can catch the attention, and hence we are better positioned to see and judge the image at its basic level of forms, lines, etc.

4 Likes