A sensor turns some physical phenomenon (light, sound, etc.) into an electrical signal. Depending on the sensor type, one of the quantifiable properties of that signal is the image of the measured phenomenon, through a mapping law:
- either a current (in amperes); in French, we call it intensity, because it’s less confusing,
- or a voltage (in volts).
This electrical image of the original phenomenon is called the signal. The mapping law between the signal and the original phenomenon can be determined by calibration (called profiling in image processing, because imaging people felt like reinventing metrology).
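To make that concrete, here is a minimal sketch of such a calibration, assuming a linear sensor response; the reference values, the readings and the `to_physical` helper are all made up for illustration:

```python
import numpy as np

# Hypothetical calibration data: the sensor is exposed to known
# reference stimuli and its raw readings are recorded.
reference = np.array([10.0, 50.0, 100.0, 200.0, 400.0])  # known physical values
readings = np.array([0.8, 4.1, 8.0, 16.2, 32.1])         # raw sensor output

# Assuming a linear response, the mapping law reduces to a slope and
# an offset, recovered here by least squares.
slope, offset = np.polyfit(readings, reference, deg=1)

def to_physical(raw):
    """Map a raw reading back to the physical quantity it images."""
    return slope * raw + offset

print(to_physical(8.0))  # close to 100, within the calibration error
```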
Every measurement is made with some error, because no sensor is perfect. That error, once quantified and statistically analysed, is turned into an uncertainty. We therefore express the result of the measurement as (reading ± uncertainty) unit. It means that the true value can lie anywhere in an interval bounded by (reading - uncertainty) and (reading + uncertainty). We can scale the uncertainty so we are 68, 95, 98 or 99% confident the true value is in this interval (through the normal distribution and the statistics we gathered), but we have no way to say exactly where.
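As an illustration, here is a small sketch of how such an interval could be derived from repeated readings, assuming normally-distributed errors; the readings are fabricated:

```python
import numpy as np
from scipy import stats

# Repeated readings of the same, supposedly constant, phenomenon.
readings = np.array([10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7])

mean = readings.mean()
std = readings.std(ddof=1)  # sample standard deviation

# Assuming normally-distributed errors, widen the interval with the
# z-score matching the confidence level we want to claim.
for confidence in (0.68, 0.95, 0.98, 0.99):
    z = stats.norm.ppf(0.5 + confidence / 2)  # two-sided z-score
    print(f"{confidence:.0%}: {mean:.2f} ± {z * std:.2f}")
```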
Every sensor can operate in a certain range where its measurement results are deemed valid. This range is bounded by the saturation level at high values, and by the noise level at low values.
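In code, that validity range is just a mask; a toy sketch with arbitrary bounds:

```python
import numpy as np

# Hypothetical bounds for a given sensor (normalised units).
NOISE_FLOOR = 0.05
SATURATION = 0.98

signal = np.array([0.01, 0.20, 0.75, 0.99, 0.50])

# Only readings strictly between the noise level and the saturation
# level are deemed valid.
valid = (signal > NOISE_FLOOR) & (signal < SATURATION)
print(valid)  # [False  True  True False  True]
```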
Data is the part of the recorded signal that makes sense in the current context. For example, imaging sensors can record a valid signal, with little uncertainty, for light outside the human visible spectrum. They yield valid RGB values, yet this is non-data (in the context of consumer photography), because it is non-color. That’s where humans take over from the technology and exercise their judgment over the signal, to decide whether they trust it or not.
The noise can be seen as some random quantity, taking random values bounded by the uncertainty, added on top of the theoretically clean signal. But that’s only one model of reality. Other models represent the noise as a value multiplying the theoretically clean signal. Depending on the nature of the noise you are dealing with, one or the other applies best.
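Here is a toy sketch of the two models side by side, with arbitrary noise levels; neither is presented as the right one, that depends on the noise source:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
clean = np.linspace(0.1, 1.0, 5)  # theoretically clean signal

# Additive model: the noise amplitude does not depend on the signal
# level (a reasonable picture for, e.g., read noise).
additive = clean + rng.normal(0.0, 0.02, clean.shape)

# Multiplicative model: the noise scales with the signal level
# (closer to gain or illumination fluctuations).
multiplicative = clean * rng.normal(1.0, 0.02, clean.shape)

print(additive)
print(multiplicative)
```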
For each pixel, we don’t know exactly how much noise there is, but we can predict it will fall inside some bounds. So $|\text{noise}| \leq \text{uncertainty}$, and let’s say any individual $\text{reading} = \text{true value} + \text{noise}$, so the true value lies between $(\text{reading} - \text{uncertainty})$ and $(\text{reading} + \text{uncertainty})$, with 98% confidence.
If $|\text{reading}| \leq \text{uncertainty}$, then the interval $[\text{reading} - \text{uncertainty},\ \text{reading} + \text{uncertainty}]$ contains zero: the true value you are trying to measure is below the uncertainty of your sensor, and cannot be told apart from no signal at all. How could you assign meaning to a measurement result that is more “precise” than the uncertainty of your sensor itself? For such a value, your error bounds amount to at least 100% of the true value. So… is that not enough to call it data clipping?
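A sketch of that test, with a hypothetical uncertainty value: any reading whose magnitude does not exceed the uncertainty yields a confidence interval containing zero, so it carries no usable data:

```python
import numpy as np

UNCERTAINTY = 0.04  # hypothetical sensor uncertainty (98% confidence)

readings = np.array([0.01, 0.03, 0.12, 0.50, -0.02])

# If |reading| <= uncertainty, the interval
# [reading - uncertainty, reading + uncertainty] contains zero:
# the measurement cannot be told apart from no signal at all.
below_floor = np.abs(readings) <= UNCERTAINTY
print(below_floor)  # [ True  True False False  True]
```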
Having a signal doesn’t mean you have data. There is a science that aims at giving meaning to signals: it’s called metrology.
By the way, did you see what I did there? All along, we have been working with bounds. We don’t know the true value, and we can’t know it, but we know for sure it’s inside some bounds we can express with a good level of confidence. All we know are bounds.
However, thinking in terms of probability intervals is not the same thing as trying to hit the bull’s eye of some convenient target. It’s like driving alone on a very wide road with no centre line. You see the left border, the right border, you know you need to stay in between while avoiding cars coming the other way, but can you tell exactly where the middle is? And do you even care?