Attempt at a 2x image upscaler, using a CNN

Hello there,

I’ve taken some time today to play with my G’MIC neural network library, in the hope of adding a new 2x image upscaler to G’MIC soon.
At this point, I wanted to post some preliminary results and get your feedback. What do you think about it?

Details:

The algorithm is based on a small residual convolutional neural network (denoted CNN hereafter) that has approximately 200k parameters.
It is trained on the DIV2K dataset, with a simple L1 loss. Nothing really fancy (i.e. not a generative model, which is a bit harder to set up, maybe in the future :slight_smile: ).
This CNN actually works on the luminance channel. It takes a 32x32 luminance patch as input, and outputs a 64x64 residual that is added to the luminance of the linear upscaling of the input patch (doing so allows a sharpness parameter to be exposed, which the user can set to control how sharp the upscaled image is).
As I said, there’s nothing outstanding, but I think it does quite a good job in some situations, with quite sharp edges in the upscaled image. It does not recreate texture details though (that would typically require a generative model, I think).
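To make the sharpness idea concrete, here is a minimal Python sketch (my own illustration, not G’MIC code) of how the predicted residual could be blended with the linear upscale; the function and parameter names are hypothetical:

```python
# Hypothetical sketch: blend a 2x linearly-upscaled luminance patch with
# the CNN's predicted residual, scaled by a user-facing "sharpness" knob.

def combine(upscaled_luma, residual, sharpness=1.0):
    """sharpness=0 gives the plain linear upscale; larger values add
    more of the learned high-frequency detail."""
    return [[u + sharpness * r for u, r in zip(urow, rrow)]
            for urow, rrow in zip(upscaled_luma, residual)]

# Toy 2x2 example: with sharpness=0 the residual is ignored entirely.
up = [[10.0, 20.0], [30.0, 40.0]]
res = [[1.0, -1.0], [0.5, -0.5]]
print(combine(up, res, sharpness=0.0))  # the linear upscale, unchanged
print(combine(up, res, sharpness=1.0))  # residual fully applied
```

The point of the design is that the network only has to learn the high-frequency correction, while the user keeps a simple dial over how much of it is applied.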

Results:

Comparison with GIMP nohalo and lohalo interpolations:

What are your impressions?

6 Likes

CNN looks like it beats everything else hands down.

1 Like

I think I have to compare with smarter interpolation methods, e.g. those available in GIMP (nohalo, …).

The edges look a bit unnatural (the statue’s helmet, for example), but besides that it looks great.

1 Like

A quick comparison with the nohalo and lohalo methods from GIMP.

Lowres image:
sample_lr

GIMP nohalo:
sample_nohalo

GIMP lohalo:
sample_lohalo

G’MIC CNN:
sample_cnn

Detail (zoom):


Looks quite good to me!

4 Likes

Do you also have a version with NBC or Guardian? I like CNN but some variety is good!

(scnr)

2 Likes

I wonder if @hanatos could be interested! (he was interested in the CNN for image denoising).

indeed nice stuff! as an application actually i like denoising much more than upscaling… probably because i was socialised with pixel art. i have to say i really like your nearest images… these are incredibly clean! no demosaicing/chromatic aberration artifacts, super nice anti aliasing… i love them :slight_smile:

are you still doing the training inside gmic now? for the sake of generic/bulky networks i think i gave up on this idea and just use pytorch (man i hate python, did i mention this before?). i train tiny MLPs in my code, because there are some applications where online training is useful.

do you have a pointer to your architecture or can you describe it roughly? from 200k parameters i’m guessing like 20 layers with like 32 channels and 3x3 convolutions? and one output layer that does the upsampling?

the last cnn i worked with does demosaicing and denoising, so it kinda is upsampling too in the last layer. i thought it was a good idea to just swizzle the last 16 output channels into blocks of 2x2 colour pixels… because with tensor cores i will pay for channels in multiples of 16 anyways.
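For readers unfamiliar with that trick, here is a rough pure-Python sketch of such a channel swizzle (often called depth-to-space or pixel shuffle); the helper name and channel layout here are my own assumptions, not taken from any particular implementation:

```python
# Sketch of depth-to-space: the last layer emits c*r*r channels per
# low-res pixel, which get rearranged into an r x r block of c-channel
# output pixels. Here r=2, c=4 matches the "16 output channels into
# 2x2 colour pixels" idea described above.

def depth_to_space(pixel_channels, r=2, c=4):
    """Turn c*r*r channel values for one low-res pixel into an
    r x r grid of c-channel pixels."""
    assert len(pixel_channels) == c * r * r
    block = []
    for y in range(r):
        row = []
        for x in range(r):
            base = (y * r + x) * c
            row.append(pixel_channels[base:base + c])
        block.append(row)
    return block

channels = list(range(16))        # 16 output channels for one pixel
block = depth_to_space(channels)  # 2x2 block of 4-channel pixels
print(block[0][0])  # top-left output pixel's channels: [0, 1, 2, 3]
```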

oh also i like to work with f16 accuracy for some speed improvements. backpropagation of gradients can be interesting in this case, i wonder if you do any sort of batch normalisation or gradient scaling? does it “just work” with full floats?

2 Likes

Some new comparisons done with other upscaling algorithms, including the GCD Solver filter already available in G’MIC (I must say I’m impressed by this one, kudos @garagecoder!).
The PSNR is computed with respect to the high-resolution ground truth. Higher is better.
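For reference, PSNR against a ground truth is typically computed as below; this is a generic Python sketch (assuming 8-bit pixel values, so peak = 255), not the exact code used for these figures:

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means the test image
    is closer to the reference."""
    flat_ref = [v for row in reference for v in row]
    flat_tst = [v for row in test for v in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_tst)) / len(flat_ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

ref = [[0.0, 255.0], [128.0, 64.0]]
print(psnr(ref, ref))  # inf: identical images
# Every pixel off by 1 -> MSE = 1 -> 10*log10(255^2) ≈ 48.13 dB
print(round(psnr(ref, [[1.0, 254.0], [129.0, 65.0]]), 2))
```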

1 Like

Yes. I’ve implemented a lot of new stuff recently to make training more stable (which also allows deeper models to be trained, although the model here is not very deep).

Right now, I can tell you’re right :slight_smile: I was a bit of a masochist when I got into it, but it also gave me a good understanding of all the nuts and bolts of network optimization, and especially of all the hacks needed to make it work in practice. And in any case, my aim is not to train very large networks.

Yes, basically:

  • Input 32x32x1 image (luminance patch)
  • Conv layer with 3x3 kernel → 32 channels (feature map).
  • Followed by two residual blocks (each being conv+relu+conv+add+relu, all with 32 channels).
  • Then, an upscale (x2) conv layer (using a stride of 0.5) → output is now 64x64 with 64 channels.
  • Followed by two residual blocks (each being conv+relu+conv+add+relu, all with 64 channels).
  • Then, another conv2d layer that reduces the output to 1 channel. This is the ‘residual’ image.
  • And finally, this residual is added to a bilinear upscale (64x64) of the 32x32 input patch.
  • The loss is L1 (sum of absolute values of the pixel difference between output and HR ground truth).
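As an aside, the L1 loss from the last bullet can be sketched in a few lines of Python (a generic illustration, not the actual training code):

```python
# L1 loss: sum of absolute pixel differences between the network output
# and the high-resolution ground truth.
def l1_loss(output, target):
    return sum(abs(o - t) for o, t in zip(output, target))

print(l1_loss([1.0, 2.0, 3.0], [1.5, 2.0, 1.0]))  # 0.5 + 0 + 2.0 = 2.5
```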
Architecture details:
 * List of modules:
    - Module: IN (type: input)
        * Output size: 32,32,1,1
    - Module: IN_a (type: clone)
        * Input: IN (32,32,1,1)
        * Output size: 32,32,1,1
    - Module: IN_b (type: clone)
        * Input: IN (32,32,1,1)
        * Output size: 32,32,1,1
    - Module: FM_conv2d (type: conv2d)
        * Input: IN_a (32,32,1,1)
        * Output size: 32,32,1,32
        * Properties: kernel=3x3, stride=1, dilation=1, border_shrink=0, boundary_conditions=neumann, learning_mode=3, regularization=0
        * Parameters: 320
    - Module: FM (type: rename)
        * Input: FM_conv2d (32,32,1,32)
        * Output size: 32,32,1,32
    - Module: FM_a (type: clone)
        * Input: FM (32,32,1,32)
        * Output size: 32,32,1,32
    - Module: FM_b (type: clone)
        * Input: FM (32,32,1,32)
        * Output size: 32,32,1,32
    - Module: C1_1_conv2d (type: conv2d)
        * Input: FM_a (32,32,1,32)
        * Output size: 32,32,1,32
        * Properties: kernel=3x3, stride=1, dilation=1, border_shrink=0, boundary_conditions=neumann, learning_mode=3, regularization=0
        * Parameters: 9248
    - Module: C1_1 (type: nl)
        * Input: C1_1_conv2d (32,32,1,32)
        * Output size: 32,32,1,32
        * Property: activation=leakyrelu
    - Module: C1_2_conv2d (type: conv2d)
        * Input: C1_1 (32,32,1,32)
        * Output size: 32,32,1,32
        * Properties: kernel=3x3, stride=1, dilation=1, border_shrink=0, boundary_conditions=neumann, learning_mode=3, regularization=0
        * Parameters: 9248
    - Module: C1_2 (type: rename)
        * Input: C1_2_conv2d (32,32,1,32)
        * Output size: 32,32,1,32
    - Module: pB1 (type: add)
        * Inputs: FM_b,C1_2 (32,32,1,32 and 32,32,1,32)
        * Output size: 32,32,1,32
    - Module: B1 (type: nl)
        * Input: pB1 (32,32,1,32)
        * Output size: 32,32,1,32
        * Property: activation=leakyrelu
    - Module: B1_a (type: clone)
        * Input: B1 (32,32,1,32)
        * Output size: 32,32,1,32
    - Module: B1_b (type: clone)
        * Input: B1 (32,32,1,32)
        * Output size: 32,32,1,32
    - Module: C2_1_conv2d (type: conv2d)
        * Input: B1_a (32,32,1,32)
        * Output size: 32,32,1,32
        * Properties: kernel=3x3, stride=1, dilation=1, border_shrink=0, boundary_conditions=neumann, learning_mode=3, regularization=0
        * Parameters: 9248
    - Module: C2_1 (type: nl)
        * Input: C2_1_conv2d (32,32,1,32)
        * Output size: 32,32,1,32
        * Property: activation=leakyrelu
    - Module: C2_2_conv2d (type: conv2d)
        * Input: C2_1 (32,32,1,32)
        * Output size: 32,32,1,32
        * Properties: kernel=3x3, stride=1, dilation=1, border_shrink=0, boundary_conditions=neumann, learning_mode=3, regularization=0
        * Parameters: 9248
    - Module: C2_2 (type: rename)
        * Input: C2_2_conv2d (32,32,1,32)
        * Output size: 32,32,1,32
    - Module: pB2 (type: add)
        * Inputs: B1_b,C2_2 (32,32,1,32 and 32,32,1,32)
        * Output size: 32,32,1,32
    - Module: B2 (type: nl)
        * Input: pB2 (32,32,1,32)
        * Output size: 32,32,1,32
        * Property: activation=leakyrelu
    - Module: UP_conv2d (type: conv2d)
        * Input: B2 (32,32,1,32)
        * Output size: 64,64,1,64
        * Properties: kernel=3x3, stride=0.5, dilation=1, border_shrink=0, boundary_conditions=neumann, learning_mode=3, regularization=0
        * Parameters: 18496
    - Module: UP (type: rename)
        * Input: UP_conv2d (64,64,1,64)
        * Output size: 64,64,1,64
    - Module: UP_a (type: clone)
        * Input: UP (64,64,1,64)
        * Output size: 64,64,1,64
    - Module: UP_b (type: clone)
        * Input: UP (64,64,1,64)
        * Output size: 64,64,1,64
    - Module: C3_1_conv2d (type: conv2d)
        * Input: UP_a (64,64,1,64)
        * Output size: 64,64,1,64
        * Properties: kernel=3x3, stride=1, dilation=1, border_shrink=0, boundary_conditions=neumann, learning_mode=3, regularization=0
        * Parameters: 36928
    - Module: C3_1 (type: nl)
        * Input: C3_1_conv2d (64,64,1,64)
        * Output size: 64,64,1,64
        * Property: activation=leakyrelu
    - Module: C3_2_conv2d (type: conv2d)
        * Input: C3_1 (64,64,1,64)
        * Output size: 64,64,1,64
        * Properties: kernel=3x3, stride=1, dilation=1, border_shrink=0, boundary_conditions=neumann, learning_mode=3, regularization=0
        * Parameters: 36928
    - Module: C3_2 (type: rename)
        * Input: C3_2_conv2d (64,64,1,64)
        * Output size: 64,64,1,64
    - Module: pB3 (type: add)
        * Inputs: UP_b,C3_2 (64,64,1,64 and 64,64,1,64)
        * Output size: 64,64,1,64
    - Module: B3 (type: nl)
        * Input: pB3 (64,64,1,64)
        * Output size: 64,64,1,64
        * Property: activation=leakyrelu
    - Module: B3_a (type: clone)
        * Input: B3 (64,64,1,64)
        * Output size: 64,64,1,64
    - Module: B3_b (type: clone)
        * Input: B3 (64,64,1,64)
        * Output size: 64,64,1,64
    - Module: C4_1_conv2d (type: conv2d)
        * Input: B3_a (64,64,1,64)
        * Output size: 64,64,1,64
        * Properties: kernel=3x3, stride=1, dilation=1, border_shrink=0, boundary_conditions=neumann, learning_mode=3, regularization=0
        * Parameters: 36928
    - Module: C4_1 (type: nl)
        * Input: C4_1_conv2d (64,64,1,64)
        * Output size: 64,64,1,64
        * Property: activation=leakyrelu
    - Module: C4_2_conv2d (type: conv2d)
        * Input: C4_1 (64,64,1,64)
        * Output size: 64,64,1,64
        * Properties: kernel=3x3, stride=1, dilation=1, border_shrink=0, boundary_conditions=neumann, learning_mode=3, regularization=0
        * Parameters: 36928
    - Module: C4_2 (type: rename)
        * Input: C4_2_conv2d (64,64,1,64)
        * Output size: 64,64,1,64
    - Module: pB4 (type: add)
        * Inputs: B3_b,C4_2 (64,64,1,64 and 64,64,1,64)
        * Output size: 64,64,1,64
    - Module: B4 (type: nl)
        * Input: pB4 (64,64,1,64)
        * Output size: 64,64,1,64
        * Property: activation=leakyrelu
    - Module: RESIDUAL_conv2d (type: conv2d)
        * Input: B4 (64,64,1,64)
        * Output size: 64,64,1,1
        * Properties: kernel=3x3, stride=1, dilation=1, border_shrink=0, boundary_conditions=neumann, learning_mode=3, regularization=0
        * Parameters: 577
    - Module: RESIDUAL (type: rename)
        * Input: RESIDUAL_conv2d (64,64,1,1)
        * Output size: 64,64,1,1
    - Module: upIN (type: resize)
        * Input: IN_b (32,32,1,1)
        * Output size: 64,64,1,1
        * Property: interpolation=3
    - Module: OUT (type: add)
        * Inputs: upIN,RESIDUAL (64,64,1,1 and 64,64,1,1)
        * Output size: 64,64,1,1

 * Total: 43 modules, 204097 parameters.
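As a sanity check on that total: a conv2d layer with a k×k kernel, c_in input channels and c_out output channels has c_out × (k·k·c_in + 1) parameters (the +1 being the bias). The short Python sketch below (my own, with layer shapes read off the module list above) reproduces the 204097 figure:

```python
# Parameter count of a conv2d layer: c_out * (k*k*c_in + 1).
def conv_params(c_in, c_out, k=3):
    return c_out * (k * k * c_in + 1)

layers = [
    (1, 32),               # FM_conv2d:        320
    (32, 32), (32, 32),    # residual block 1: 2 x 9248
    (32, 32), (32, 32),    # residual block 2: 2 x 9248
    (32, 64),              # UP_conv2d:        18496
    (64, 64), (64, 64),    # residual block 3: 2 x 36928
    (64, 64), (64, 64),    # residual block 4: 2 x 36928
    (64, 1),               # RESIDUAL_conv2d:  577
]
total = sum(conv_params(ci, co) for ci, co in layers)
print(total)  # 204097
```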

No batch norm here, because I use residual blocks, which allow the gradients to be backpropagated pretty well. I do use gradient clipping though, even if I didn’t test whether this was absolutely necessary. And I actually use doubles for the computations during training, even though I store the resulting weights as floats after each iteration.
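For illustration, a common form of gradient clipping is global-norm clipping; the Python sketch below is my own generic example (I’m not claiming this is the exact variant used here):

```python
import math

# Global-norm gradient clipping: if the L2 norm of all gradients exceeds
# max_norm, every gradient is scaled down so the norm equals max_norm;
# otherwise the gradients pass through unchanged.
def clip_gradients(grads, max_norm=1.0):
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]

print(clip_gradients([3.0, 4.0], max_norm=1.0))  # norm 5 -> rescaled to norm 1
print(clip_gradients([0.1, 0.2], max_norm=1.0))  # already small -> unchanged
```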

So really nothing very advanced, with my library I can’t do very complicated stuff anyway :slight_smile:

1 Like

OK, so I’ve pushed a new filter Repair / Upscale [CNN2x] in the G’MIC-Qt plug-in for GIMP.

Feel free to test and tell me if that works for you!

2 Likes

Hello David, I tested your upscaler with a jpg of an old postcard, file size 295KB, using the default settings. On my 10-year-old 4-core PC with 8GB RAM that took 4m38s!

The result is somewhat sharper indeed.
Original (detail):

poet_org

CNN upscaled (same detail):

poet_CNN

I’m afraid I don’t have enough electricity around to feed a 16-bit tif to your filter! :upside_down_face:

Hello @paulmatth

On my 10-year-old 4-core PC with 8GB RAM that took 4m38s!

Aside from this old PC of yours, without also leveraging the GPU (a “drawback” of G’MIC since its very inception), I am afraid this new filter (like many others…) will be a bit slow with big images.
Among other things, it may come in handy to upscale jpeg images downloaded from the Internet. Since they are usually small in size, this new filter should be fast when upscaling them.

Btw, when David posted his attempt to upscale images yesterday, the very first question that came to my mind was exactly this: how fast will this filter fare with big images (TIFF etc.)?

1 Like

Hello @David_Tschumperle

Just a question, out of curiosity…

Is it possible to develop this filter further to downscale images, or is the underlying upscaling “algorithm” not meant for that process?

fwiw, using f16 and tensor cores i could run a 24MP input image through a demosaic + denoise CNN in like 50-80ms for about 650k parameters. this wasn’t optimised to the metal and i forget whether it was on a 2080 or 3080, but GPU anyways.

1 Like


Works really well :ok_hand:
The image is only 652×484, but on default parameters it took me just over 1 minute on an 11th-gen quad-core mobile i7

1 Like

Results I’m seeing here look awesome, David. Currently use Upscayl and because my GPU (well mine’s an APU; lol) isn’t compatible with the program, it takes a long time to process. Looking forward to the G’MIC release. :slight_smile:

Hey, Silvio. Give JPR Decimate a try; I think you will be happy. The decimate algorithm is one of the best I’ve run across as far as downscaling goes. :slight_smile:

Hello @lylejk

The decimate algorithm is one of the best I’ve run across as far as downscaling goes.

Thanks a lot!
I didn’t know this G’MIC filter :slight_smile:

Just tested on this jpeg image (4256x2832 px - 300 ppi) with GIMP 2.10.x - G’MIC 3.4.3:

  • With the default settings of the filter (JPR Decimate), the process is blazing fast (2-3 seconds to downscale it). The results are great too :slight_smile:
  • Later on, since I was curious, I upscaled (new upscaler filter - using CNN with the default settings) the result of this downscaling (see above);
    it took only 2 minutes and 15 seconds. Again, a very good final result overall.
  • The third attempt was to upscale (again with the new upscaler filter - using CNN with default settings) the original jpeg image (see dropbox - link above), and it took quite a long time…
    After 15 minutes, and counting, the process was still at 50%, so I stopped it (I suppose it would have taken 30 minutes overall…)

The computer has the following specs:
Windows 11
Processor: 12th Gen Intel(R) Core™ i7-12700H (20 CPUs), ~2.7GHz
Memory: 32768MB RAM
Card name: NVIDIA GeForce RTX 3070 Ti Laptop GPU
SSD drive

How about a noisy image? Such as:


I am still using 10+ year-old laptops with 4 GB RAM lol (maybe one of them is 8 GB :sweat_smile:). So none of this will work for me until I replace them.