In the real world, a shadow area might have light values of 10 to 20, while a sunlit area has light values of 1000 to 2000. We perceive the 10 steps between 10 and 20 as similar to the steps 1000, 1100, 1200, … 1900, 2000. This means that in the bright area, a linear scale provides many more steps than are needed.

A camera sensor is linear. It records the light value, so to speak. This means you need a 12 or 14 bit value per pixel to store an image and still keep enough steps in the shadows.

Computers used to be 8 bit - values from 0 to 255. ‘Gamma’ is a useful mapping that makes sure each step is just noticeable on screen and no steps are wasted coding differences that can never be noticed. (Actually gamma was originally useful for analog TV - noise is a constant error in the signal, and with a gamma applied, light and dark areas are evenly affected by noise. Without gamma the dark areas would be very noisy.)
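To make the "no wasted steps" point concrete, here is a rough sketch that counts how many 8 bit codes end up in the darkest 1% of the light range, with and without gamma. The exponent 2.2 and the plain power-law curve are my assumptions (the real sRGB curve is slightly different), and the function names are made up for illustration.

```python
GAMMA = 2.2  # assumed exponent; a plain power law, not the exact sRGB curve

def encode_linear(v):
    """Quantize linear light v (0.0 to 1.0) straight to an 8 bit code."""
    return round(v * 255)

def encode_gamma(v, gamma=GAMMA):
    """Apply the gamma curve first, then quantize to an 8 bit code."""
    return round((v ** (1.0 / gamma)) * 255)

# How many code steps cover the darkest 1% of the linear range?
shadow_linear = encode_linear(0.01) - encode_linear(0.0)
shadow_gamma = encode_gamma(0.01) - encode_gamma(0.0)

print("codes in darkest 1%, linear:", shadow_linear)
print("codes in darkest 1%, gamma: ", shadow_gamma)
```

Under these assumptions the gamma version spends roughly ten times as many codes on the shadows, which is where the eye needs them.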

Now, calculating in linear space means calculations are done with ‘real light’. If a spot is in focus, a single pixel may have a value of 2000. If the spot is out of focus and evenly covers 4 pixels, each pixel will have a value of 500, so the total amount of light is unchanged. If sharpening is applied, it can recover the 2000 for the single pixel.

If the calculation is done in a non-linear space, the in-focus value might have been stored as 200. Spread the spot over 4 pixels by defocusing the lens, and there are likely 4 pixels with a stored value of about 100 each, because the encoding is not linear. A mathematical blur of the in-focus image, on the other hand, would give 4 pixels with value 50 (which would be too dark).

This means that calculations do not come out right.
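Here is a small sketch of the spot example, comparing an averaging blur done on linear values with the same blur done on gamma-encoded values. The gamma of 2.2, the scale of 2000 as "white", and the helper names encode/decode are assumptions for illustration only.

```python
GAMMA = 2.2     # assumed exponent, simple power law
WHITE = 2000.0  # the in-focus light value from the example, used as the scale

def encode(linear):
    """Linear light -> non-linear (gamma-encoded) value in 0.0 to 1.0."""
    return (linear / WHITE) ** (1.0 / GAMMA)

def decode(encoded):
    """Non-linear value in 0.0 to 1.0 -> linear light."""
    return WHITE * encoded ** GAMMA

# One bright in-focus pixel and three dark neighbours, in linear light.
pixels = [2000.0, 0.0, 0.0, 0.0]

# Blur in linear space: a plain average, total light is preserved.
linear_blur = sum(pixels) / len(pixels)

# Blur in non-linear space: average the encoded values, then look at
# how much real light that result stands for.
encoded = [encode(p) for p in pixels]
gamma_blur = decode(sum(encoded) / len(encoded))

print("light per pixel, blurred in linear space:    ", linear_blur)
print("light per pixel, blurred in non-linear space:", round(gamma_blur, 1))
```

With these numbers the linear blur gives 500 per pixel, as a real lens would, while the non-linear blur ends up with less than 100 per pixel of real light. That is the "too dark" result.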

Note that for displaying, the values in the calculation are transformed into screen values. If the image is in linear space, the gamma curve will be applied to display the image correctly on the monitor. Thus you have correct calculations and correct display.
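As a sketch of that workflow: keep the pixels linear, do the maths on them, and only apply the gamma curve when producing the 8 bit values for the screen. Again the 2.2 power law and the name to_display are assumptions, not any particular program's API.

```python
GAMMA = 2.2  # assumed display gamma, simple power law

def to_display(linear):
    """Convert linear light (0.0 to 1.0) to an 8 bit screen value."""
    return round(255 * linear ** (1.0 / GAMMA))

# A white pixel next to a black pixel, stored as linear light.
linear_row = [1.0, 0.0]

# The calculation (here: averaging the two) happens on linear values.
blurred = sum(linear_row) / len(linear_row)

# Gamma is applied only at the very end, for display.
display_code = to_display(blurred)

# For comparison: sending the linear result to the screen without gamma.
naive_code = round(255 * blurred)

print("screen value with gamma applied:", display_code)
print("screen value without gamma:     ", naive_code)
```

The first value is the one that looks like half the light on a gamma 2.2 monitor; the second would come out visibly too dark.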

If the image is non-linear and the calculations are done directly on those values, the calculations will be incorrect. The image itself will still be displayed correctly of course, by sending it directly to the monitor.