CLUT compression

David_Tschumperle · March 15, 2019, 7:02pm

I wanted to let you know we’ve just finished writing an article about CLUT compression.
In G’MIC, we have more than 550 CLUTs (Color Look-Up Tables) available to apply on you images, and this is possible thanks to this nice compression algorithm we’ve designed specifically for CLUTs.

The paper title and authors:

David Tschumperlé, Christine Porquet, Amal Mahboubi. An Efficient 3D Color LUT Compression Algorithm Based on a Multi-Scale Anisotropic Diffusion Scheme. 2019. 〈hal-02066484〉

Usually, a single CLUT file takes between 500Kb and 1Mb, and with our technique, we achieve a storage of 552 CLUTs in a single 2.5Mb file (which still surprises me to be honest ).

I thought that, maybe, developers of other image processing tools could be interested by such a technique. It definitely helps sharing CLUT transformations !

You can find a link to the paper (in .pdf) here : https://hal.archives-ouvertes.fr/hal-02066484

Here is a montage of some figures of the paper, just to give you an idea

Let us know what you think about this !

Iain · March 18, 2019, 10:24am

Are there any cases where it is not very effective?

I guess the nature of CLUTs is that they are suited to this type of compression.

David_Tschumperle · March 18, 2019, 11:00am

Exactly.
Generate a CLUT containing only pure noise, and it won’t be very effective.
Fortunately, this is not the kind of CLUTs that has any interest

Iain · March 18, 2019, 11:38am

That makes me wonder how noisy CLUTs can be and if denoising might be worthwhile.

David_Tschumperle · March 18, 2019, 11:41am

I’ve compressed a set of 552 CLUTs for G’MIC, but never seen ‘noisy’ ones.
Discontinuous CLUTs would mean you get discontinuous images from smooth ones, which is not expected in the general case (except for the case the CLUT does some level quantization, but even with that, there would be only a few discontinuities in the CLUT, not all voxel being discontinuous).

David_Tschumperle · March 20, 2019, 1:22pm

Here is the C++ source code for CLUT decompression:

Hebustetram · March 23, 2019, 3:16pm

Very interesting paper.

This is no small feat, you should be proud of that achievement. PS offers a much smaller set of 27 CLUT and those in .cube format are each bigger than your own set.

Moreover in adding an CLUT adjustment layer makes for extremely large psd files. The Haldcluts that G’mic support are easy to store and share but your efficient 3D CLUT compression seems a very powerful solution that would deliver even higher levels of compression while retaining a great fidelity.

Do you plan to provide within G’mic Gimp interface the possibility to compress a CLUT ( e.g. .cube or Haldclut) into your format and enhance your G’mic filter CLUT User-defined to support them too?

Your G’mic Film Emulation Collages is a great tool to use when looking for inspiration as it allows to test what could work well with a new image. Would it be feasible to have in G’MIC Gimp interface a user-defined Collage generator that could apply a series of CLUTs stored in a folder to the image layer?

Sorry for all the questions and congrats for your paper and fantastic work with G’mic.

afre · March 23, 2019, 5:21pm

Using version 2.4.2. I get this error:

PS @partha Could you update your GIMP builds? G’MIC is now 2.5.x.

David_Tschumperle · March 24, 2019, 8:12am

CLUT compression is currently possible only from the command line tool gmic. It’s a quite heavy process to be honest. I’d be really happy anyway if people want to share their cool CLUT, to compress them and include them in the current G’MIC set.

I’ve removed the collage filters in the latest versions, but this is something I need to add again in the new filters, I’ll probably give a try on monday.

Hebustetram · April 28, 2019, 12:32pm

Just noticed that you added collages to all sets of ‘Color Preset’ and ‘Film Simulation’.
Thanks for that, it does make it easier to discover inspiring CLUTs for a picture.

phweyland · May 2, 2019, 4:41pm

Is there a link to access this 2.5Mb file ?

David_Tschumperle · May 2, 2019, 4:42pm

This file: Sign in · GitLab

EDIT : A PNG version of this file is available here : Sign in · GitLab

(basically, one row = one clut).

hanatos · May 2, 2019, 6:39pm

congrats on the paper, this is a nice read. you didn’t end up to do the comparison to radial basis function interpolation in this write up? do you still have any numbers on that? i’d be interested what my trade off options are here like what is the tipping point when the RBF become more expensive or what is their accuracy at same memory usage.

David_Tschumperle · May 2, 2019, 7:00pm

That is unfortunately not possible to do this in a reasonable time using the algorithm we propose to add keypoints. Once the number of keypoints becomes higher than, let say, 200, the reconstruction time of the entire CLUT volume becomes really slow.
And most of the time, we need more than 200 keypoints.

I’ve tested with some “simple” CLUTs that requires only a few keypoints, and I didn’t notice that the RBF reconstruction produces less keypoints in general (sometimes, it does, sometimes not, depending on the CLUT content).

David_Tschumperle · May 3, 2019, 11:19am

Some news:

I’ve discovered FreshLUTs yesterday, and decided it was time to add a few more CLUTs to G’MIC ! So I’ve downloaded some of CLUTs from this website, and compressed them.

Right now, we have then 716 compressed CLUTs available in G’MIC (stored in a single 3Mb file).

I’ve updated the data files on my libclut repository.
And I’ve also added the new CLUTs directly in the filter Colors / Color Presets available in the G’MIC-Qt plug-in.

David_Tschumperle · May 16, 2019, 8:35pm

Update:

I’ve worked hard these last days, trying to improve my CLUT compression algorithm even more. I found a slight improvement, by mixing RBF and PDE reconstruction.
I guess this could interest @hanatos. So I briefly explain the idea here:

Observations:
Basically, what I’ve noticed is that some CLUTs are really well compressed with the RBF reconstruction method (Radial Basis Function), and not so well with the PDE-based method.
But It really doesn’t happen very often. It’s only for a few CLUTS (like <10% or the set of 715 CLUTs I have), that probably have an internal structure where splines can fit the data particularly well
(so, usually, very smooth CLUTs without discontinuities, etc.). Typically CLUTs that looks alike the identity CLUT.

Proposed solution:
So for these CLUTs, the RBF is better, and as in this case, the number of needed keypoints is quite low (generally less than 200), it was a bit a shame to not using RBF for these particular CLUTs.
My proposal, which seems to work reasonably well, is to mix the RBF and the PDE reconstruction of the CLUT, in order to reconstruct a CLUT from its set of colored keypoints, and use this mixed reconstruction method in the compression algorithm I’ve proposed in the paper we wrote (see first post). I’ve tried different ways of mixing the two approaches, and what worked the best is simply a linear mixing:

\text{CLUT}_{mixed} = \alpha\;\text{CLUT}_{RBF} + (1-\alpha)\;\text{CLUT}_{PDE}

\text{where}~\alpha~\text{is the current number of keypoints that represents the CLUT, defined as:}

\alpha = \max(0,\frac{320-N}{320-8})

so it goes from 1 (where N=8, i.e. at the CLUT initialization, to 0 , where N>=320, which is the limit I’ve set. When this limit is reached, the RBF reconstruction is disabled (it takes a lot of time, when the number of keypoints become high).

Results:
I’ve re-run the compression algorithm for the CLUTs that were using less than 1000 keypoints, and there is a small improvement in the size of the compressed data.
We go from 3115349 bytes to 3019013 bytes, so about 96Kb of saving. That’s not really spectacular, but at least I’ve tried something new
All the experiments I’ve made made me realize that on average, there are not much differences in quality between the RBF and PDE reconstruction. Sometimes the former compresses some CLUTS better, sometimes it’s the latter. That really depends on the CLUT structure. The big difference that counts is on the reconstruction time, where the RBF becomes painfully slow when the number of keypoints increases significantly.

That’s all for tonight

David_Tschumperle · May 21, 2019, 9:08am

Update: (probably the last about this topic!).

I’ve made a few other experiments, and finally noticed that in general it’s more efficient not to linearly mix the RBF/PDE approach for CLUT compression.
What is clear to me now is that when the RBF approach works well for one CLUT, it works better alone (not with a PDE counterpart). This is basically due to the fact that the CLUT has been probably designed from tools that are themselves spline-based, so the RBF approach fits the data very well.
For other kind of CLUTs, the PDE approach is more efficient (in terms of computation speed, while keeping the same compression ability).
So what I’ve done is : for all CLUTs that could be compressed efficiently with the RBF approach alone with a small amount of keypoints (which means <320 keypoints, this concerns 255 / 716 CLUTs), I’ve kept the RBF approach. All other are compressed with the PDE approach. Thus, I get a good balance between good compression size and efficient time for the decompression.

This is what is now implemented in G’MIC. This way, I’ve been able to store the 716 CLUTs in a file having a size of 2706237 bytes (2.7Mio), this saves again 300Kb from my previous attempt.

I’ll probably stop my experiments here. I’ve tried literally dozens of compression/decompression variants, tweaking the parameters like crazy, and I’m not sure saving another tens of Kb is worth it.
After all, I’ve already reached a compression rate of 97%+, if we compare the CLUT size with the original data in .png and .cube format (which were already compressed!).
Not so bad

hanatos · May 21, 2019, 10:01am

great, thanks for the update. so you’re saying 320 points is the turnaround point for precision? or for your pain threshold with respect to speed of the implementation? i’m still interested in your perf numbers here.

David_Tschumperle · May 21, 2019, 10:26am

Yes, that only concerns the speed of the implementation.
My RBF decompression is written as a G’MIC script here, so not that fast (but parallelized anyway).
I’m pretty sure a C++ version of the RBF decompression would perform better, but at the same time, I didn’t noticed that for a high number of points (i.e. >300), the RBF approach performs particularly better than the PDE technique.
It works really well in some cases, because of the nature of the CLUT to compress (probably the ones that have been generated following a spline or a linear model). For more complex CLUTs, both RBF/PDE approach give roughly equivalent results (in number of keypoints).

David_Tschumperle · May 21, 2019, 1:15pm

Some numbers, to get an idea.
I’ve 716 CLUTs compressed for now.

Here is the number of keypoints (y-axis) for each CLUT (x-axis), sorted by increasing number of keypoints.

There is a visible limit, at around y = nb_keypoints = 280, where the number of required keypoints for compressing a CLUT becomes high ‘by nature’ (that’s why I chose the threshold to be 320). This is actually independent to the decompression method chosen (RBF/PDE). It says that CLUTs which needs usually more than 280 keypoints to be compressed often requires a lot of keypoints (most requires 1000+). There are very few CLUTs that are efficiently compressed with a number of keypoints between [280,1000].

Here is the histogram of the number of keypoints for the 716 CLUTs.
I’ve used 32 bins, from 0 keypoints to 2047, which means each bin k corresponds to the number of CLUTs that require between 64*k and 64*(k+1) keypoints to be compressed:

As you can see, there are clearly two modes : the “easy CLUTs” (left part), and the “hard CLUTs” (right part), and very few CLUTs exists between these two worlds.

Here are some decompression timing, on a i7 with one or several cores, for a 64^3 CLUT decompression (using 320 keypoints, comparing PDE/RBF decompression)

$ OMP_NUM_THREADS=1 gmic gmic_cluts.gmz k[paladin] rows 0,319 tic decompress_clut_pde 64 toc

→ Single core : PDE=4.65 s, RBF=23.951 s

$ OMP_NUM_THREADS=4 gmic gmic_cluts.gmz k[paladin] rows 0,319 tic decompress_clut_pde 64 toc

→ 4 cores : PDE=1.572 s, RBF=6.761 s

$ OMP_NUM_THREADS=32 gmic gmic_cluts.gmz k[paladin] rows 0,319 tic decompress_clut_pde 64 toc

→ 24 cores : PDE=0.635 s, RBF=1.843 s

(The RBF approach is more easy to parallelize than the PDE one, hence the reduced difference when the number of cores becomes higher).
Both methods are written as G’MIC scripts, so they are comparable in term of timing. It is actually coherent with the theoretical algorithmic complexity they have.