Huffman compression, and compression artefacts

David_Tschumperle · April 24, 2023, 8:25am

Hello there.
I was a bit bored yesterday so I’ve decided to try implementing Huffman coding directly in G’MIC language. This is a classical non-destructive compression algorithm.

It’s not really because I wanted to compress things (we already have the native command serialize for this purpose, and it uses a smarter algorithm), but because I wanted to test how easy/hard it is to manage tree structures inside a G’MIC command, and Huffman relies on the construction of a binary tree, and the algorithm is simple enough. So that was a good occasion!

(side note: I’ll may have to rely on tree structures later to optimize my WIP raytracer…).

What is fun is actually to add some bit-inversion noise in the compressed Huffman code, before decoding : it creates nice compression glitch effects. Could be even better doing this patch by patch, this clearly gives some ideas for a future Huffman Glitch filter !

For those who are interested, here is the code:

Huffman coding in G'MIC

# Simple implementation of huffman compression, in G'MIC.
# $1: Number of bit inversion done in the compressed code.
huffman : check "${1=0}>=0"
  sp colorful,480 n. 0,255 round.
  W,H,S:=w,h,s y => input

  # Bottom-up construction of the Huffman tree.
  siz={input,whds}
  +histogram[input] 256,0,255 1,1,1,6 eval.. "begin(id = 0); i?da_push([id++,i,x,-1,-1,-1])" rm..

  # -> list of active leafs/nodes ( id,occurence,value,parent,child1,child2 )
  eval "
    active = 0;
    id = da_size();
    do (
      # Find the two minimal occurrences in active nodes.
      omin1 = omin2 = inf; oind1 = oind2 = 0;
      for (k = da_size() - 1, k>=active, --k,
        occ = i(0,k,0,1);
        occ<=omin1?(oind2 = oind1; omin2 = omin1; oind1 = k; omin1 = occ):
        occ<=omin2?(oind2 = k; omin2 = occ);
      );
      oind1>oind2?swap(oind1,oind2);

      # Merge them as a new active node, and set children as inactive.
      i(0,oind1,0,3) = i(0,oind2,0,3) = id;
      da_push([ id++,omin1 + omin2,-1,-1,i[oind1],i[oind2] ]);

      swap(I[oind1],I[active++]);
      swap(I[oind2],I[active++]);

    ,_(while) active<da_size()-1);
    da_freeze()"
  sort. +,y channels. 2,5 => tree # ( value,parent,child1,child2 )

  # Construct codebook for faster encoding.
  1,256,1,2 # codebook ( code,nb_bits )
  eval[tree] "i2<0?( # Leaf
    ind = y;
    code = nb_bits = 0;
    while (ind>=0, # Iteratively go up to root node
      par = i(0,ind,0,1);
      par>=0?((code<<=1)|=(i(0,par,0,3)==ind); ++nb_bits);
      ind = par;
    );
    ncode = 0; repeat (nb_bits,(ncode<<=1)|=(code&1); code>>=1); # Reverse bit order
    I[#-1,i0] = [ ncode,nb_bits ];
  ); I" => codebook

  # Encode input data.
  0 eval[input] ">begin(out_stream = 0; nb_bits = 0);
    val = i;
    C = I[#$codebook,i];
    (out_stream<<=C[1])|=C[0];
    nb_bits+=C[1];
    while (nb_bits>=8, # Write completed bytes
      da_push(out_stream>>(nb_bits-=8));
      out_stream&=2^nb_bits-1;
    );
    end(nb_bits?da_push(out_stream<<(8 - nb_bits)); da_freeze())"
  => compressed

  e "  > Size(original) = "$siz", Size compressed = "{compressed,h}

  # Add bit artefacts in compressed code.
  eval "
    repeat ($1,
      ind = round(u(0,h#$compressed - 1));
      bind = int(u(0,8))%8;
      mask = 1<<bind;
      val = i[#$compressed,ind];
      i[#$compressed,ind] = (val&mask)?val&(255 - mask):val|mask;
    )"

  # Decode compressed data.
  0 eval "
    off = mask = byte = bytes_written = 0;
    ind = h#$tree - 1;
    while (bytes_written<$siz,
      !mask?(byte = i[#$compressed,off++]; mask = 128);
      bit = (byte&mask)!=0;
      mask>>=1;
      child = i(#$tree,0,ind,0,bit?3:2);
      i(#$tree,0,child,0,2)<0?( # Leaf
        da_push(i(#$tree,0,child,0,0));
        ind = h#$tree - 1;
        ++bytes_written;
      ):(ind = child);
    ); da_freeze()"
  => decompressed

  r[input,decompressed] $W,$H,1,$S,-1

  +-[decompressed] [input] norm. e "  > Reconstruction error : "{ia} rm.

It’s not that long (and of course a bit slow compared to a C++ implementation, but still usable for compressing/decompressing images).

Here are the results of adding bit-inversion noise in the compressed code. First image has 0 noise, then I gradually increase the level of noise.

No noise:

huffman0

100 bits inverted:

huffman100

1000 bits inverted:

huffman1000

10.000 bits inverted:

huffman10000

100.000 bits inverted:

huffman100000

I’ll probably try to clean the code and create a filter from it in the next days.
Feel free to play with it!

hanatos · April 24, 2023, 8:52am

these images look truely amazing. love your new filter already

would advise against writing a ray tracing bvh from scratch in this day and age. after all the years that so many brilliant engineers spent optimising this to the metal, you will not beat the quality structures built by embree. i still kinda maintain my own just for the sake of not depending on other people’s bloaty codebase… but i’ve clearly been outperformed over the years. not to speak of employing hardware units on today’s gpus.

David_Tschumperle · April 24, 2023, 9:02am

Sure, I’m doing this for fun, to see what I can achieve. Probably not that much, but maybe it will save me from spending evenings being bored

David_Tschumperle · April 24, 2023, 9:26am

David_Tschumperle · April 24, 2023, 3:38pm

An attempt to do the same glitch process patch by patch.
Not as interesting as I thought it would be.

gmic_000000

Reptorian · April 24, 2023, 3:57pm

I like it. Could you experiment with color spaces?

David_Tschumperle · April 24, 2023, 4:36pm

Here, in HSL8 colorspace (normalized HSL), with random amount of noise for each image patch (64x64 with an overlapping of 20%).

David_Tschumperle · April 26, 2023, 8:59am

I’ve uploaded a new filter Degradations / Huffman Glitches this morning, so everyone with G’MIC 3.2.0+ should be able to test it and report issues, if any.

It’s still a bit slow to compute (mainly because the decompression step is slow), but already usable so I’ve decided to make it available anyway.

Let me know if you succeed doing something interesting with it

samj · April 26, 2023, 12:58pm

@David_Tschumperle

To have fun with this new filter: a frame.

Update

gmic sp dog +fx_huffman_glitches 50,0,25,0,0,0,1234 resize. 120%,120% blur. 2 c. 0,192 rv blend alpha c 0,255 o test_frame_huffman_glitches.png

David_Tschumperle · April 26, 2023, 1:53pm

Nice!

Already done some optimizations, so now, it now decompresses different parts of the image (patchs or horizontal/vertical blocs) in a multi-threaded ways when possible.

David_Tschumperle · April 26, 2023, 2:45pm

G’MIC command apply_video turns to be so useful for this filter

mbs · April 26, 2023, 4:01pm

This video is actually a great example of the different way noise and errors affects analog vs digital communications. I used to do something similar (introducing random bit flips in an mp4 video or in mp3 audio) in my courses on digital communications.

samj · April 26, 2023, 7:41pm

To have fun with the image of this dog:

Edit 20230427

Windows terminal

gmic sp dog La={w} Ha={h} rotate 45 +fx_huffman_glitches 30,1,2,0,0,0,1234 fx_huffman_glitches.. 30,2,2,0,0,0,1234 blend average rotate -45 DL={{{w}-$La}/2} DH={{{h}-$Ha}/2} crop $DL,$DH,{$DL+$La},{$DH+$Ha} c 0,255 o dog_45_huffman_glitches.png

Linux terminal (thank you to David Tschumperlé for the explanations)

Create a text file ‘TEST.gmic’ with this content:

RAYURES : La={w} Ha={h} rotate 45 +fx_huffman_glitches 30,1,2,0,0,0,1234 fx_huffman_glitches.. 30,2,2,0,0,0,1234 blend average rotate -45 DL={{{w}-$La}/2} DH={{{h}-$Ha}/2} crop $DL,$DH,{$DL+$La},{$DH+$Ha} c 0,255

Then this command line:

gmic m TEST.gmic sp dog RAYURES o dog_45_huffman_glitches.png