I’m going to copy-paste here the answer I gave by email, because I think it will be useful to more than one person.
Your camera sensor converts photons into electrons with a piece of semiconductor underneath the color filter array (roughly 1 photon becomes 1 electron, except for some that get lost here and there – for the sake of this explanation, you can assume 1 photon => 1 electron exactly). It’s really like a photovoltaic solar cell you would use to produce electricity, except the amount of electricity is quite small.
Once we have a current, all we have to do is measure it (that is, count the electrons passing through the wire) at each photosite. It’s really just measuring how many (micro)amperes we have there, as you would do with a good old multimeter (but much more sensitive).
Using a piece of electronics called an analog-to-digital converter (ADC), that current measurement is converted to an integer code value inside some range. If you use an 8-bit ADC, your range is [0 ; 2^8 - 1], so you encode between 0 and 255. Most cameras use 12 or 14 bits, so they encode in [0 ; 4095] or [0 ; 16383].
These code values don’t mean much in themselves. They only mean that we split the measurement range of the sensor (between noise threshold and saturation threshold) into that many samples so, as the sampling gets finer, your lightness gradients are more continuous and less prone to staircasing effects (called posterization or quantization artifacts). Just imagine you want to represent a diagonal line with a staircase : the more steps you add, the finer the jumps get, and the smoother your line approximation gets.
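To make that concrete, here is a minimal Python/NumPy sketch (not real camera firmware, and the function names are mine) that quantizes the same normalized measurement with ADCs of different bit depths, so you can see the staircase getting finer as bits are added:

```python
import numpy as np

def quantize(signal, bits):
    """Map a normalized measurement in [0, 1] to integer codes in [0, 2^bits - 1]."""
    levels = 2 ** bits - 1
    return np.round(signal * levels).astype(int)

gradient = np.linspace(0.0, 1.0, 9)      # an idealized, continuous lightness ramp

for bits in (2, 8, 12):
    print(f"{bits:2d}-bit:", quantize(gradient, bits))

# With 2 bits, the ramp collapses into 4 big steps (the staircase);
# with 12 bits, the steps are so fine that the gradient looks continuous again.
```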
But these code values are a linear encoding, meaning if you double the amount of light on the sensor, you also double the code value issued by the measurement. That leads to a nice property : doubling the light amount, physically on the scene, or multiplying the code values by 2, digitally in the computer, has the same effect on the picture (if we put the signal/noise ratio aside). Linearity means the data you are working on is proportional to the intensity (or energy) of the light emission.
Mathematically, linearity of some 1D operation f is proven if a × f(b) = f(a × b), which means that you can multiply in the order you want, before or after applying f on b, and the result will not change (strictly speaking, you also need f(a + b) = f(a) + f(b), but the multiplication part is the one we care about here). We work on RGB (which is 3D), so it’s a tad more complicated, but the same principle holds.
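If you want to see that property with real numbers, here is a tiny Python/NumPy check (the functions are illustrative stand-ins, not darktable code): a plain multiplication commutes with the doubling, a gamma does not:

```python
import numpy as np

rgb = np.array([0.02, 0.10, 0.40])           # linear code values, normalized to [0, 1]
a = 2.0                                      # doubling the light, i.e. +1 EV

def exposure(x):                             # a linear operation: plain multiplication
    return 3.0 * x

def gamma_encode(x):                         # a non-linear operation
    return x ** (1.0 / 2.2)

# Linear case: a * f(b) == f(a * b), the order does not matter.
print(a * exposure(rgb), exposure(a * rgb))          # identical results

# Non-linear case: the property breaks, the order matters.
print(a * gamma_encode(rgb), gamma_encode(a * rgb))  # different results
```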
But here, you might smell an issue. Remember human vision is logarithmic and, therefore, non-linear. That means we have increased sensitivity in shadows, and decreased sensitivity in highlights. The “human light increment” is the EV (exposure value). From one EV to another, you double or halve the amount of light (depending on which way you go).
So, your camera code values, let’s say in 12 bits, encode the first EV below pure white between 4096 / 2 = 2048 and 4095. It means that half your encoding range is assigned to only the first EV below pure white, where your sensitivity is very low. Then, the second EV sits between 1024 and 2047, the third between 512 and 1023, the fourth between 256 and 511, the fifth between 128 and 255, the sixth between 64 and 127… until the twelfth, which can only take the values 0 or 1. That means the EV zones where you are the most sensitive are the ones that get the fewest code values.
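You can reproduce that accounting with a few lines of Python (just the arithmetic above, nothing more):

```python
bits = 12
high = 2 ** bits                    # 4096 code values in total, from 0 to 4095
for ev in range(1, bits + 1):
    low = high // 2
    print(f"EV {ev:2d} below clipping: codes [{low} ; {high - 1}] -> {high - low} values")
    high = low

# Code value 0 is the floor (black / noise), so the last EV is left with codes 0 and 1 only.
```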
That triggers a lot of problems, the most common being posterization in shadows (staircasing in the shadow gradients). We have 2 ways to deal with that :
- either ditch the integer encoding and switch to floating point representation, so we don’t care about sampling anymore and we can assume a continuous real encoding in the full range,
- or redistribute the human-defined EV steps around code values more evenly by applying a non-linear transform (the typical “2.2 gamma” applies a sort of square root, the Lab transfer function applies a cubic root, and modern video cameras apply a log directly), so each EV gets roughly the same number of code values (and the first one stops sucking half the values all for itself); see the quick comparison right after this list.
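To give an idea of what option #2 buys you, here is a rough Python sketch counting how many 8-bit code values each EV receives; the encodings are simplified pure power functions (not the exact sRGB or Lab curves), so treat the numbers as illustrative only:

```python
def codes_per_ev(encode, bits=8, evs=8):
    """Count the integer code values each EV below clipping receives, for an
    encoding that maps linear [0, 1] to encoded [0, 1]."""
    levels = 2 ** bits - 1
    counts, high = [], 1.0
    for _ in range(evs):
        low = high / 2.0
        counts.append(round(encode(high) * levels) - round(encode(low) * levels))
        high = low
    return counts

print("linear    :", codes_per_ev(lambda x: x))              # half the codes go to the first EV
print("gamma 2.2 :", codes_per_ev(lambda x: x ** (1 / 2.2))) # much more even spread
print("cube root :", codes_per_ev(lambda x: x ** (1 / 3)))   # Lab-like transfer function
```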
#1 is better to work on pictures, because… it preserves the linear connection between light emission and code values, so it keeps the multiplication property (along with many more that allow physically-accurate light transforms), and that’s how darktable’s pipeline works, but saving files in 32-bit float is super heavy and quite overkill. To save files, we will rather use #2, which is what modern “gamma”-encoded RGB spaces do (Adobe RGB, sRGB, etc.).
So, non-linear RGB spaces are just that : a maths trick to redistribute the code values more evenly between EVs, that should be used only for file saving or to send image buffers to your GPU (and then to your screen). If you plan on working on pictures saved with a gamma encoding (OETF), you should decode it first, then apply your image operations, then re-encode and save the result. But for some reason, the whole graphics industry has taken the bad habit of working directly on these gamma-encoded files through the whole pipeline, probably because it pegs the 18% middle grey around 50% (0.18^(1/2.2) ≈ 0.46), so it is more convenient to use with levels or curves GUI.
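As a sketch of that decode → edit → re-encode workflow, using a pure 2.2 power as a simplified stand-in for the real sRGB OETF (which actually has a small linear toe):

```python
import numpy as np

def decode(x):       # simplified inverse OETF: pure 2.2 power
    return x ** 2.2

def encode(x):       # simplified OETF
    return x ** (1 / 2.2)

print(encode(0.18))                          # ≈ 0.46: why 18% grey lands near mid-range

stored = np.array([0.46, 0.20, 0.75])        # pixel values as saved in a gamma-encoded file

linear = decode(stored)                      # 1. undo the OETF -> back to linear light
linear = np.clip(linear * 2.0, 0.0, 1.0)     # 2. physically meaningful edit: +1 EV exposure
result = encode(linear)                      # 3. re-encode before saving or display

print(result)
```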
And then, users can introduce non-linear transforms too, even in a linear pipeline, for example by applying a tone curve or a LUT. Basically, every lightness/contrast operator that is not a simple multiplication and/or addition (that is, not an exposure compensation) will de-linearize the RGB, which is fine for creative purposes as long as every physically-accurate transform comes earlier in your pipeline.
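Here is a toy Python example of why that ordering matters; the tone curve and the 3-tap blur are made-up stand-ins for any creative curve and any light-mixing operation (blur, resampling, blending):

```python
import numpy as np

def tone_curve(x):
    """A made-up contrast curve, standing in for any creative tone curve or LUT."""
    return x ** 0.8

def blur(x):
    """A 3-tap box blur: mixing light, which is only physically correct on linear data."""
    return np.convolve(x, np.ones(3) / 3, mode="same")

linear = np.array([0.05, 0.05, 0.9, 0.9, 0.05, 0.05])   # a hard edge, in linear light

good = tone_curve(blur(linear))   # physically-accurate mixing first, creative curve last
bad  = blur(tone_curve(linear))   # blurring the de-linearized values gives a different edge

print(good)
print(bad)
```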
And this is the very reason why I don’t like hiding pipelines from users, because it’s very important to know what operation you are doing on which signal, even if it means they have to wrap their heads around non-intuitive concepts, otherwise you could spend hours trying to figure out where those artifacts come from and why you can’t get rid of them.
TL; DR : linear RGB is what comes out of your camera sensor and means the RGB code values are directly (mathematically) connected to the light intensity. Performing multiplications and additions in linear RGB keeps the linearity of the RGB. Anything else turns it non-linear, which is useful for creative reasons and integer file encodings, but should happen after any operation relying on the physical consistency between code values and light emission, and should be reverted before applying physically-accurate image transforms.
Just be careful, because people usually call “linear RGB” any RGB space free of OETF/gamma encoding (what Elle Beth calls “linear gamma”), with no care given to what operations have been performed in those spaces. Using linear RGB spaces is only the first condition to preserve the consistency with light emissions, but you also need to ensure that nowhere in your pipeline do you apply a non-linear operation on your pixels.
Complement for the geeks : sensors are actually not truly linear to light emissions, and you see that clearly when your scene is not lit by a white-daylight illuminant. That’s why we need better input profiles than the bogus RGB → XYZ 3×3 matrix conversion.
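For illustration only, this is the kind of 3×3 conversion being criticized; the matrix below is the standard linear sRGB → XYZ (D65) matrix, used as a stand-in because an actual camera input profile ships its own matrix (and ideally something richer than a single matrix):

```python
import numpy as np

# A plain 3x3 matrix conversion from linear RGB to XYZ: exact only if the
# sensor data really is linear and really lives in the assumed RGB space.
M = np.array([[0.4124, 0.3576, 0.1805],
              [0.2126, 0.7152, 0.0722],
              [0.0193, 0.1192, 0.9505]])

rgb_linear = np.array([0.18, 0.18, 0.18])    # a neutral grey patch in linear RGB
xyz = M @ rgb_linear
print(xyz)
```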