I think it’s “instead of” instead of “yet another”.
Now, by way of disclosure, my total exposure to mathematics in the pursuit of three college degrees is four. Four math courses. Early on, when I was young and stupid, I didn’t care much for math and tried to avoid it. Now that I’m only stupid, I at least see the error of those ways… @shreedhar, the following is my take on all this, kinda in response to your take.
Anyhow, when light is captured, it’s done through the R, G, and B filters of the Bayer array, so what’s measured is the intensity of a jumble of wavelengths necked down by a bandpass filter. That’s what the camera knows: a set of light intensity measurements. The ‘color’ is about the bandpass filter. So, these measurements aren’t XYZ or even RGB; they’re energy intensities.
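To make that concrete, here’s a minimal sketch (Python/NumPy, with an RGGB layout assumed purely for illustration; real cameras use various filter arrangements) of what the sensor actually hands over: one filtered intensity per photosite, not a color.

```python
import numpy as np

def bayer_mosaic(rgb):
    """Reduce a full H x W x 3 image to the single intensity each
    photosite would record under a hypothetical RGGB filter array."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R sites: red-filtered intensity only
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G sites
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G sites
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B sites: blue-filtered intensity only
    return mosaic
```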
Demosaicing is (I think) a statistical assertion of what a thing called color might be at a pixel location, based on the intensity measurements made there and in the surrounding locations. At the end of this dance you have an array of statistical assertions, or as we usually refer to them, RGB triplets called pixels.
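A rough sketch of that dance, assuming the same RGGB layout as above and plain bilinear interpolation (real demosaicers are much cleverer): each missing channel at a photosite is just an average of the measured neighbors that carry it.

```python
import numpy as np
from scipy.ndimage import convolve

def demosaic_bilinear(mosaic):
    """Guess full RGB triplets from an RGGB mosaic by averaging neighbors."""
    h, w = mosaic.shape
    r_mask = np.zeros((h, w)); r_mask[0::2, 0::2] = 1.0
    b_mask = np.zeros((h, w)); b_mask[1::2, 1::2] = 1.0
    g_mask = 1.0 - r_mask - b_mask

    k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0   # green from 4 cross neighbors
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0   # red/blue from 2 or 4 neighbors

    out = np.empty((h, w, 3))
    out[..., 0] = convolve(mosaic * r_mask, k_rb, mode='mirror')
    out[..., 1] = convolve(mosaic * g_mask, k_g,  mode='mirror')
    out[..., 2] = convolve(mosaic * b_mask, k_rb, mode='mirror')
    return out
```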
The camera’s ability to resolve light intensities determines its “spectral response”: how reliably it measures light intensity across the spectrum we’re interested in, visible light. This is communicated in the matrix we’ve been discussing, along with a triplet that expresses the camera’s white point reference. The 3x3 matrix describes the reddest red, greenest green, and bluest blue we can expect the camera to resolve through the measurement and statistical gonkulator (look it up) described above. You’ll see them referred to as the “RGB primaries”. Those are usually expressed in that XYZ system; I’ll leave that to a subsequent discussion. This matrix and triplet are calculated from an image the camera makes of a specifically designed target; in my previous post that work was done for me by scanin and colprof, tools that are part of the excellent Argyll CMS package written by @gwgill.
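For a matrix-only characterization, using that profile boils down to multiplying each RGB triplet by the 3x3 matrix to land in XYZ. A hedged sketch follows; the CAM_TO_XYZ values are made-up placeholders, not the output of any real scanin/colprof run, and a real ICC profile carries more than just this.

```python
import numpy as np

CAM_TO_XYZ = np.array([   # hypothetical camera-to-XYZ matrix, for illustration only
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.0, 0.1, 0.9],
])

def camera_rgb_to_xyz(rgb, matrix=CAM_TO_XYZ):
    """Apply the 3x3 characterization matrix to each pixel of an H x W x 3 image."""
    return np.einsum('ij,hwj->hwi', matrix, rgb)
```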
Once the RGB image is produced, you can attempt to look at these numbers on your screen, but they won’t look like what you regarded in the scene because the gamut of the display is so much smaller than the camera’s spectral response. It’s like listening to Rachmaninoff on a transistor radio; you know there’s a lot of aural richness in the room, but all you’re getting is “plink-plink-bong” through the radio’s 1-inch speaker. The purpose of that matrix and triplet is to give the gonkulator the information it needs to eventually map the richness captured by the camera to the oh-so-limited display medium.
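The last leg of that trip, very crudely sketched: the standard XYZ-to-sRGB matrix, a hard clip for anything the display can’t show (the “plink-plink-bong” step), and the sRGB transfer curve. Real gamut mapping in a CMM is far more sophisticated than this clip.

```python
import numpy as np

XYZ_TO_SRGB = np.array([           # standard XYZ to linear sRGB (D65) matrix
    [ 3.2406, -1.5372, -0.4986],
    [-0.9689,  1.8758,  0.0415],
    [ 0.0557, -0.2040,  1.0570],
])

def xyz_to_srgb(xyz):
    linear = np.einsum('ij,hwj->hwi', XYZ_TO_SRGB, xyz)
    linear = np.clip(linear, 0.0, 1.0)              # crude gamut "mapping": just clip
    return np.where(linear <= 0.0031308,            # sRGB transfer (gamma) encoding
                    12.92 * linear,
                    1.055 * np.power(linear, 1 / 2.4) - 0.055)
```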
If you examine the chromaticity chart posted by @anon11264400, you’ll see dots plotted along the routes between the red, green, and blue primaries (which are for a particular, unspecified camera) and that camera’s white reference. It’s a good illustration of how the mapping of color is done; essentially (and maybe too simplistically), it’s a lookup of the appropriate number along the line radiating from the reference white through the R, G, or B primary coordinate. The white coordinate anchors all these lookups; if white isn’t properly characterized, these lookups will produce different numbers.
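If it helps, that lookup-along-a-line idea can be written out as simple interpolation in xy chromaticity space. The coordinates below are the sRGB red primary and D65 white, used only as stand-ins for whatever camera that chart describes.

```python
def along_line(white_xy, primary_xy, t):
    """Point at parameter t on the line from the white reference (t=0)
    out through a primary (t=1)."""
    wx, wy = white_xy
    px, py = primary_xy
    return (wx + t * (px - wx), wy + t * (py - wy))

d65 = (0.3127, 0.3290)            # D65 white, as a stand-in white reference
red = (0.6400, 0.3300)            # sRGB red primary, as a stand-in camera red
print(along_line(d65, red, 0.5))  # halfway from white toward the red primary
```

Move the white coordinate and every one of those lookups lands somewhere else, which is the point about proper white characterization.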
(Now, the rest is a bit of speculation on my part, based on the discourse of this thread…)
So, the average photographer doesn’t want to shoot a target at each location, which would be the most reliable way to capture the camera’s response to the light at that location. Accordingly, the prevailing convention for characterizing camera responses is to shoot the target in daylight and assume the photographer will do a separate correction if they don’t like the colors they see. That’s the white balance, usually expressed as multipliers applied to the R and B channels (G is usually used as the anchor to which the other channels are moved). @gwgill’s point is (I believe) that it’s better to use the white point for what the camera actually sees, rather than munging the data after the fact.
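For completeness, that separate correction looks something like this; the multiplier values here are invented for illustration, and in practice they’d come from the camera’s as-shot metadata or a gray-patch pick.

```python
import numpy as np

def apply_white_balance(rgb, r_mul=2.0, b_mul=1.5):
    """Scale the R and B channels, leaving G at 1.0 as the anchor."""
    out = rgb.astype(float)   # work in float so integer inputs don't truncate
    out[..., 0] *= r_mul
    out[..., 2] *= b_mul
    return out
```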