Wayland color management

The LUT is just a function, and the input to that function will be affected by the pipeline before it. Either we can never change anything in the complete pipeline, or we have some sort of quantization error that you cannot measure.

Is that error significant? I doubt it. Would be great to know for sure (going to ask someone working for Intel, maybe he can share some details; if that doesn’t pan out an experiment is going to be necessary).

So at least on Intel hardware the frame buffer value gets converted to 16/24 bits per channel (depending on the hardware) and all of the plane/CRTC degamma/CSC/gamma calculations are done at that bpc. Apparently the limited size of the LUTs (seems to be 8 bpc) introduces loss of precision. To handle HDR content, degamma/gamma have special modes (Intel GFX - Patchwork) for segmented LUTs and up to 12 bpc.
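
For intuition, here is a small numpy sketch of the kind of error a limited-size gamma LUT introduces when the rest of the pipeline runs at a higher bit depth. The entry count, bit depths and the plain 2.2 curve are assumptions for illustration only, not a description of any specific Intel part:

```python
import numpy as np

# Assumed numbers, purely for illustration: a 16 bpc pipeline sampling a
# 256-entry gamma LUT with linear interpolation between entries.
PIPELINE_BITS = 16
LUT_ENTRIES = 256

def transfer(x):
    """Reference transfer function (a plain 2.2 power curve)."""
    return x ** (1.0 / 2.2)

# The small LUT the hardware would hold.
lut_in = np.linspace(0.0, 1.0, LUT_ENTRIES)
lut_out = transfer(lut_in)

# Every representable pipeline code, normalized to 0..1.
x = np.linspace(0.0, 1.0, 2 ** PIPELINE_BITS)

# What the hardware stage would produce: piecewise-linear lookup.
approx = np.interp(x, lut_in, lut_out)

# Error against the exact curve, expressed in pipeline code values.
err = np.abs(approx - transfer(x)) * (2 ** PIPELINE_BITS - 1)
print(f"max LUT interpolation error: {err.max():.0f} / 65535 "
      f"(~{err.max() / 257:.2f} 8-bit steps)")
```

Most of the error sits near black, where the power curve is steep; whether that matters in practice is exactly the open question.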

I’m now even more convinced that exposing “the” LUT is a bad idea and that quantization errors in the dynamic pipeline are not a problem.

Have you asked the Intel engineers about this? Since we are going to disagree here anyway: I still think it is the compositor's task to manage whatever “the” LUT (as you so dismissively call it) is, and thus for consistency reasons it should also be set by the compositor during calibration/profiling. Only the compositor needs to be aware of how and where it actually gets set and how many bpc it actually has, but the compositor also needs to set it during calibration/profiling.

Hell, if there is no standard way to do this, almost all compositors will give access one way or another, just not in a consistent manner (for example wlroots and KDE), so people are already doing it and it is better to provide a standard. And IMSNHO the protocol I created is a lot safer than what wlroots and KDE are doing (since it can only be changed during calibration instead of at any time).

Yes.

I do understand why you want to do that. The problem is that this won’t get you anything consistent either. The hardware you load the vcgt tag into differs from system to system. Just on Intel alone you have 16/24 bpc pipelines with 8/10 bpc LUTs, segmented LUTs, etc. Change your GPU and suddenly your profile is invalid.

That doesn’t really strike me as a significant added problem since that may be the case anyway.

GNOME has its own implementation of it, so that’s in no way any better.

I think the point was that precisely because it’s not available everywhere (i.e. part of a standardized protocol), bad “solutions” like gamma-control.xml pop up.

The ‘vcgt’ and hardware are not necessarily supposed to align, i.e. usually a ‘vcgt’ tag contains 3x256 16-bit unsigned integer values (other integer formats and counts are possible, but much less commonly used), which then (on systems where this is implemented) can be loaded via an API (the API usually doesn’t expose any internals). This may lead to quantization to a different bit depth (i.e. likely < 16 bit), which can introduce artifacts like banding on otherwise smooth gradients. The proper way a hardware solution would handle this is to apply the high bit depth LUT to its limited bit depth frame buffer using dithering (that’s the way AMD has been doing it for over a decade). This could also be done in a shader if the hardware doesn’t otherwise support applying LUTs with dithering.
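
To make that concrete, here is a rough numpy sketch (my own toy, not AMD's actual hardware path and not ArgyllCMS code): one channel of a hypothetical 256-entry, 16-bit ‘vcgt’-style curve applied to an 8 bpc ramp, then requantized to 8 bpc once by plain rounding and once with a simple random dither.

```python
import numpy as np

rng = np.random.default_rng(0)

# One channel of a hypothetical 'vcgt': 256 entries, 16-bit unsigned,
# here just a mild gamma tweak for illustration.
vcgt = (np.linspace(0.0, 1.0, 256) ** 0.95 * 65535).round().astype(np.uint16)

# An 8 bpc grayscale ramp, many pixels per code value.
ramp8 = np.repeat(np.arange(256, dtype=np.uint8), 4096)

# Apply the curve at high precision: 8-bit code -> 16-bit value -> 0..1.
hi = vcgt[ramp8] / 65535.0

# Requantize to 8 bpc two ways.
rounded = np.round(hi * 255).astype(np.uint8)
dithered = np.floor(hi * 255 + rng.random(hi.shape)).clip(0, 255).astype(np.uint8)

# Averaging the pixels of each input code shows roughly what the eye
# integrates over an area: the dithered output tracks the 16-bit target,
# plain rounding is stuck on 8-bit steps (visible as banding in gradients).
target = vcgt / 65535.0 * 255
err_rounded = np.abs(rounded.reshape(256, -1).mean(axis=1) - target).max()
err_dithered = np.abs(dithered.reshape(256, -1).mean(axis=1) - target).max()
print(f"max per-code error: rounded {err_rounded:.3f}, dithered {err_dithered:.3f}")
```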

First one to show me that this is a real problem and not an imaginary one gets a crate of beer (or other beverage of their choice) 🙂 This could have been a problem back in the days when analog (e.g. VGA) connections were the norm, because changing the video card (edit: or even just the cable) could affect the signal (after D/A).

Sure.

I mean, yeah. Bad solutions are easy to come up with, good solutions not so much.

Exactly. Different hardware does use different bit depth (or at least can use different bit depth) so swapping out the GPU actually can produce different results.

That sounds wrong. The frame buffer values get promoted to a higher bpc value (16/24) before the LUT gets applied, the LUT values have lower bpc (8/10/12), and the result is higher bpc values again, which might get dithered at the last step. Dithering doesn’t solve the problem of the LUT being low precision; it only retains better visual quality when going from a high bpc image to a low bpc image.

(At least for Intel hardware, but it makes a lot of sense, since having higher bpc in the pipeline is cheap while extra precision for a LUT is expensive.)
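
For what it’s worth, this is how I picture that ordering, as a toy numpy sketch (an illustration of the argument, not of any specific hardware; the bit depths are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def promote(fb8):
    """8 bpc frame buffer -> 16 bpc pipeline values (0..255 -> 0..65535)."""
    return fb8.astype(np.uint16) * 257

def gamma_stage(x16, lut_bits=10):
    """Gamma LUT whose output precision is limited (assumed 10 bits here)."""
    y = (x16 / 65535.0) ** (1.0 / 2.2)               # the curve we want
    levels = 2 ** lut_bits - 1
    return np.round(y * levels) / levels * 65535.0   # quantized to LUT precision

def dither_to_8(x16):
    """Final dither down to the 8 bpc the link/display accepts."""
    v = x16 / 65535.0 * 255.0
    return np.floor(v + rng.random(v.shape)).clip(0, 255).astype(np.uint8)

fb = np.arange(256, dtype=np.uint8)                  # an 8 bpc ramp
out = dither_to_8(gamma_stage(promote(fb)))
# The dither spreads the final requantization error, but whatever the gamma
# stage already rounded away at its own (lower) precision stays lost.
```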

Creating different amounts of banding (maybe even depending on the channel) should be the result here. If it’s not noticeable in the real world I’d be very happy since it means the consistency argument from @dutch_wolf is moot.

edit: Actually, maybe not banding but just slight inaccuracies if the LUT gets interpolated linearly to the precision of the pipeline. Not sure here.

Who switches his GPU after he profiled his pipeline? This is about the calibration process. If I change the GPU I will redo calibration.

Per component? I somehow doubt it. The final output buffer cannot be higher than what the connected display supports at the input side, so it should usually be 8 bpc with most consumer displays (24 bit with three components, RGB), but of course it depends on the connection and capabilities of the display device (DVI 8 bpc, HDMI/DisplayPort up to 12?). When and where the LUT gets applied also plays a role, but these are internals.

The LUT (as in ‘vcgt’) being low precision isn’t usually the case. A problem is when the high precision LUT gets quantized down to (say) 8 bpc without dithering, e.g. it gets applied at high bit depth but the end result is not dithered down to the output depth, just rounded/truncated.

And it is. That is not “invalidating the profile” though, as the banding artifacts are just visually annoying (grayscale tracking may suffer slightly as well though).

Yes.

Well, that’s what I get told from Intel and it very much makes sense.

Sure it can. If you want dithering it must be.

My bad. I meant the hardware degamma/gamma LUTs.

Well, you cannot dither a LUT. You can only dither an image. You can obviously apply a LUT before dithering but, and that’s the point I tried to make, applying a low precision LUT gives a low precision result. Dithering a low precision image doesn’t magically get you higher precision.

Have you actually seen this? Under which circumstances?

The only reason it surprises me is because it then seems Intel doesn’t make much use of the increased bitdepth.

Sure it can. If you want dithering it must be.

At some point you have to deal with the higher precision values, yes. What I was getting at is that if a display (say) only accepts 8 bpc on input, we cannot send it (say) 10 bpc. So we dither down to 8 bpc before the image goes to the display. The display itself of course may do its own dithering as well if the panel (e.g.) is 6 bit + FRC or similar.

My bad. I meant the hardware degamma/gamma LUTs.

Ok. If these are limited in precision there’s not a whole lot you can do, other than not using them.

Dithering a low precision image doesn’t magically get you higher precision.

That wasn’t my point. If you have (say) a 16 bpc LUT and an 8 bpc image, that’s effectively a higher bitdepth image, and that is what gets dithered.

Have you actually seen this?

Banding? Sure. That’s not really hard to see when using test images; a grayscale gradient will do, or something with a natural gradient like a photo of blue skies, etc.

The only reason it surprises me is because it then seems Intel doesn’t make much use of the increased bitdepth.

You have to deal with the accumulated error of 4 LUTs per channel, 2 matrix multiplications and blending, and still have enough precision left to feed a 12 bpc signal, maybe even with dithering.
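
As a back-of-the-envelope illustration of that budget (the stage count is from the sentence above, the bit depths are assumptions), here is a worst-case sum of half-LSB rounding errors expressed in 12-bit output steps:

```python
# Rough worst-case error budget, assuming each stage rounds to its own
# precision and the errors simply add (pessimistic, but it shows the scale).
PIPELINE_BITS = 16          # assumed internal precision
OUTPUT_BITS = 12            # target signal depth from the post above
STAGES = 4 + 2 + 1          # 4 LUTs, 2 matrix multiplications, blending

half_lsb_internal = 0.5 / (2 ** PIPELINE_BITS - 1)
worst_case = STAGES * half_lsb_internal
print(f"worst case ≈ {worst_case * (2 ** OUTPUT_BITS - 1):.2f} 12-bit steps")
# ≈ 0.22 of a 12-bit step with 16-bit internal precision; with only ~10 bits
# of internal precision the same budget would be around 14 twelve-bit steps.
```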

Right.

The reason why you absolutely want to use them is that you get them basically for free, whereas using shaders increases latency, increases the memory footprint, and occupies shader units and bandwidth.

What I do not get is why you’d want to use the hardware LUTs instead of using a LUT of floats on the CPU, calculate the result, round it to the frame buffer format and send that to the display with all LUTs, matrices etc disabled for calibration and profiling.

If the result of applying the LUT is 16 bpc, yes, got it.

I meant banding specifically because of a low precision LUT or the lack of dithering. Probably really hard to figure out what exactly is going on and where exactly the culprit lies.

Sure, but you’ll likely need shaders for the color management stuff anyway, because it’s basically the only way to apply 3D LUTs with good performance. I don’t have a strong opinion on this though.

What I do not get is why you’d want to use the hardware LUTs instead of using a LUT of floats on the CPU, calculate the result, round it to the frame buffer format and send that to the display with all LUTs, matrices etc disabled for calibration and profiling.

What I personally want or not isn’t so important because I’m ultimately dependent on what gets implemented in ArgyllCMS (or not). For my current GNOME under Wayland workaround, I’m doing something similar to what you’re describing: Using dithered 8 bpc images created on the fly from a solid color (in float) displayed in a normal UI window during measurements. What the hardware later does with the result is another question, because GNOME seems to load the resulting profile’s ‘vcgt’ tag by setting the ‘videoLUT’ (for lack of a better term) of the graphics hardware, so at this point quantization is no longer controlled.
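
For reference, the patch-generation part of that workaround boils down to something like the following sketch (a minimal reconstruction of the idea, not the actual code; the sizes and the helper name are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def dithered_patch(rgb_float, width=256, height=256):
    """Turn a solid float RGB color into an 8 bpc image, dithering the
    fractional part so the patch averages out to the requested value.
    (Hypothetical helper for illustration, not any tool's real API.)"""
    target = np.asarray(rgb_float, dtype=np.float64) * 255.0
    noise = rng.random((height, width, 3))
    return np.floor(target + noise).clip(0, 255).astype(np.uint8)

patch = dithered_patch((0.5021, 0.5021, 0.5021))
print(patch.reshape(-1, 3).mean(axis=0))   # ≈ 128.04 per channel
```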

I meant banding specifically because of a low precision LUT or the lack of dithering.

If by ‘LUT’ we mean ‘vcgt’, then yes. And if by lack of precision we mean “8 bpc instead of anything higher”, then yes. AFAIK the easiest way to test for this is the nVidia binary driver (at least under Linux), because you can turn dithering on and off in the control panel. Of course what exactly happens under the hood is an unknown.

There are lots of setups which don’t have profiles with 3D LUTs (auto-generated from EDID information, for example), and even then you can offload the linearization (degamma) to the hardware per-channel LUTs, which is required for proper blending.
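
The blending part is easy to demonstrate; a tiny sketch (a plain 2.2 power curve as a stand-in for the real transfer function, 50/50 blend) of why you want to linearize with the per-channel degamma before blending:

```python
def decode(v):   # degamma: display-encoded -> linear light (2.2 stand-in)
    return v ** 2.2

def encode(v):   # gamma: linear light -> display-encoded
    return v ** (1.0 / 2.2)

a, b = 0.2, 0.9   # two encoded pixel values to blend 50/50

wrong = 0.5 * a + 0.5 * b                            # blended in encoded space
right = encode(0.5 * decode(a) + 0.5 * decode(b))    # blended in linear light

print(f"encoded-space blend: {wrong:.3f}, linear-light blend: {right:.3f}")
# The two differ noticeably; only the linear-light result matches what
# physically mixing the two amounts of light would look like.
```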

Right, makes sense.

Interesting.

Color consistency is not the only consistency I am talking about, although, since it is an important one, it has been the focus of this discussion. I am also talking about consistency in implementation; for an explanation of what I mean, let's go back to basics.

Firstly, let's examine what happens when a normal color-managed application tries to send pixels to the screen. It will first convert these pixels to the output color space (using the provided ICC profiles; how this is done doesn't matter here). It will then send this buffer, in the output color space, to the compositor, which will apply an identity transform (since it is already in the output color space) and then potentially a calibration curve (either as a shader or by loading it into one of the many available gamma tables; sometimes this is not needed because the monitor is that nice or has its own LUT).
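
As a sketch of that split (the function names and the simple per-channel curve are mine, purely to pin down where each step happens, not any real API):

```python
import numpy as np

# --- application side -------------------------------------------------------
def app_convert_to_output_space(pixels):
    """The color-managed app converts its pixels to the output color space
    using the display profile (stand-in: identity, details don't matter here)."""
    return pixels

# --- compositor side --------------------------------------------------------
def compositor_color_transform(pixels_output_space):
    """Identity, because the buffer is already in the output color space."""
    return pixels_output_space

def calibration_curve(v):
    """Stand-in per-channel 1D calibration curve."""
    return np.clip(v, 0.0, 1.0) ** 1.05

def compositor_apply_calibration(pixels, curve):
    """The calibration curve, applied last: as a shader, via one of the
    hardware gamma tables, or not at all if the monitor handles it."""
    return curve(pixels)

buf = app_convert_to_output_space(np.array([0.25, 0.5, 0.75]))
scanout = compositor_apply_calibration(compositor_color_transform(buf),
                                        calibration_curve)
```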

Note that here the calibration curve is separate from the profile (it is often stored in the profile, but theoretically it doesn't have to be). This should be a clue that profiling (creation of the profile) and calibration (creation of the calibration curve and/or adjusting monitor settings, potentially by the user) are two conceptually separate things that are often (to the confusion of a lot of people) combined in one step.[1] That said, those two things don't have to be combined in one step, and it can be advantageous to separate them, since sometimes you can get away with only calibration or only profiling.[2]

When doing both (which will be the most common situation), we first start with calibration; for the sake of argument I will leave aside how the calibration curve is handled during calibration itself.[3] It is with the next step, profiling, that it gets interesting, since we will want to build the profile on the fully calibrated pipeline, because that is what the profile to be created will assume. So either at the end of the calibration step or at the beginning of the profiling step we will need to upload the calibration curve to the compositor, and that is what all this is about (AFAICT/IMHO).

Now, how this is done is less important, and seeing what modern HW is actually capable of, I don't think a calibrator needs to know the exact size of the LUTs available (since with shaders we can emulate any LUT size we want).
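
For example, resampling a calibrator-supplied 256-entry curve to whatever size the hardware (or a shader) actually uses is trivial; a sketch with made-up sizes and a made-up helper name:

```python
import numpy as np

def resample_curve(curve, new_size):
    """Linearly resample a per-channel 1D curve (values in 0..1) to new_size
    entries; hypothetical helper, just to show the compositor can adapt."""
    old_x = np.linspace(0.0, 1.0, len(curve))
    new_x = np.linspace(0.0, 1.0, new_size)
    return np.interp(new_x, old_x, curve)

calibration_256 = np.linspace(0.0, 1.0, 256) ** 1.05   # what the calibrator hands over
hw_lut_1024 = resample_curve(calibration_256, 1024)    # what this GPU happens to want
hw_lut_17 = resample_curve(calibration_256, 17)        # or a tiny segmented one
```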


[1] Although in ArgyllCMS it is provided by two different programs: dispcal to calibrate, dispread to profile (there is also targen involved, to determine which patches to show for dispread).
[2] Very expensive screens often only need calibration, and then you just tell the program that it needs to use sRGB/AdobeRGB/DCI-P3; this is what OpenColorIO is built around (at least v1, v2 will allow loading ICC display profiles IIRC). Less expensive screens might still behave nicely enough that calibration won't improve the profile by a lot, so they only need calibration if a color space change is needed. Cheap screens probably need both to behave somewhat sanely.
[3] If needed at all; I think for a really nice screen some iteration would be necessary, but for a basic one, just measuring and then calculating a curve from that measurement should be fine.

Why? This is what this whole conversation is about. Why does the compositor have to have the LUT? Why can’t the client just apply the LUT in software and then write the result to a frame buffer?

Well, how will the calibration 1D LUT be applied later, when both profile and calibration are actually in use? In software or in hardware? And when you answer that question, think about consistency. In my opinion, it makes little sense from a consistency standpoint to apply the calibration in software only during profiling measurements, when it is clear that later it will be applied in hardware.

[ Besides, applying the calibration in hardware during profiling measurements is already possible (in GNOME under Wayland) if one really wanted to: Instead of calling colord’s device.ProfilingUninhibit() at the end of profiling measurements, one would call it right after the calibration measurements are finished, add the calibration as ‘vcgt’ to a profile with Rec. 709 primaries and sRGB tone curve (when full desktop color management arrives, this should in theory still result in an identity transform), install that (again using colord), which then loads the calibration, and then do the profiling measurements with calibration in place. The reason I chose not to was simply because it would have required more work and time for testing. ]

However the compositor sees fit.

But it is not clear that it will be applied in hardware and even less so that it will remain unchanged (i.e. multiple LUTs converted to a single LUT).

Because the client doesn’t know the LUT? With the sole exception of the calibrator (which creates the calibration curve), there is no need for any other client to know the calibration curve, and yes, this does include the profiler.

And also this:

Since only the compositor can know whether this is possible, it is the compositor's responsibility, and once it is the compositor's responsibility it should always be in the hands of the compositor, so no exceptions for profiling (or even calibration, IMHO).

Sigh. That was only an example. Just replace “software” with “method A” and “hardware” with “method A, B, C or D”, and it should be obvious there is no consistency if the only situation where “method A” is always chosen is during profiling. I don’t even care that much, it just seems like an obvious awkward inconsistency.
