Machine Learning Library in G'MIC

Hello there.

A quick note to give you some news about my long-term project: building a machine learning library inside G’MIC :slight_smile:

Over the past few weeks, I’ve been trying to get back into it seriously, and I have to say that I’ve made a lot of progress!

  • First, I spent a lot of time debugging and completely recoding the convolution module for neural networks (as well as the convolution itself in G’MIC). I’m happy to announce today that I’m now confident this module works properly inside nn_lib, including convolution with stride, dilation and shrink (previously it only worked well with the basic settings stride=dilation=1).

Convolution modules are particularly important and widely used in neural network architectures (especially for image processing), so verifying that my implementation behaves as the theory predicts was a necessary step.

  • I’ve also recoded a specific normalization module from scratch, which appears to work nicely, so I can now manage bigger networks than before (normalization modules help stabilize the training process, so including some of them is almost mandatory when designing deep neural networks).

  • Then, I’ve implemented and experimented with various upscale layers: transposed convolution, pixel shuffle, and resizing with various interpolations. Upscale layers are used for instance in auto-encoders and U-Net architectures, i.e. networks that can generate image data. Using these upscale layers, I’ve been able to train a simple image upscaler (see my post Attempt for a x2 image upscaler, using CNN for more details). I also have a project to train a specific image upscaler for lineart images, with the help of David Revoy. I’ll tell you more when I get the data :slight_smile:

  • I’ve also made changes to nn_lib so that the library can handle several networks at the same time. This will be useful in my next attempts: training GAN architectures for generating details in images (e.g. in the upscaler).

  • So far, I’ve successfully used nn_lib to train classifiers (MNIST, MNIST-fashion, face detection, Celeb-A attributes) with up to 10M parameters (the biggest is a ResNet variant I’ve made for the Celeb-A dataset). I’ve also been able to train a simple auto-encoder and a small U-Net, which opens up future possibilities for pretty cool filters in G’MIC (like a better upscaler, automatic image colorization, neural style transfer, …).

  • The next step is to train a GAN, to check that the library can work flawlessly with two networks at the same time (generator / discriminator). But I’m pretty confident it will work one day or another :slight_smile:
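
To illustrate what stride and dilation mean in the convolution module described above, here is a minimal NumPy sketch of the textbook operation (this is not nn_lib or G’MIC code, just the reference behavior I’d check against): stride skips input positions between output samples, while dilation spaces out the kernel taps.

```python
import numpy as np

def conv2d(img, kernel, stride=1, dilation=1):
    """Naive single-channel 2D convolution with stride and dilation.

    As in most neural-network frameworks, this is cross-correlation
    (no kernel flip). The effective kernel footprint is
    (k - 1) * dilation + 1, and with 'valid' padding the output size is
    floor((in - footprint) / stride) + 1.
    """
    kh, kw = kernel.shape
    eh = (kh - 1) * dilation + 1          # effective kernel height
    ew = (kw - 1) * dilation + 1          # effective kernel width
    oh = (img.shape[0] - eh) // stride + 1
    ow = (img.shape[1] - ew) // stride + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            for i in range(kh):
                for j in range(kw):
                    out[y, x] += (kernel[i, j] *
                                  img[y * stride + i * dilation,
                                      x * stride + j * dilation])
    return out

img = np.arange(36.0).reshape(6, 6)
box = np.ones((2, 2))                          # 2x2 box filter
print(conv2d(img, box, stride=2).shape)        # (3, 3)
print(conv2d(img, box, dilation=2).shape)      # (4, 4)
```

Checking that a hand-rolled loop like this and the fast implementation agree on shapes and values for various stride/dilation settings is exactly the kind of "does it match the theory" test the convolution module needs.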
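
The post doesn’t say which normalization scheme nn_lib implements, so as an assumption here is the most common one, batch normalization: each feature is normalized to zero mean and unit variance over the batch, then rescaled and shifted by learnable parameters. This is why it stabilizes training — activations stay in a well-behaved range regardless of depth.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization, training mode: normalize each feature
    (column) over the batch, then apply learnable scale/shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
batch = rng.normal(3.0, 5.0, (32, 8))       # 32 samples, 8 features
y = batch_norm(batch)
print(y.mean(axis=0))                       # per-feature means ~ 0
print(y.std(axis=0))                        # per-feature stds  ~ 1
```

At inference time, real implementations replace the per-batch statistics with running averages accumulated during training; the sketch above only covers the training-mode computation.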
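
Pixel shuffle, one of the upscale layers mentioned above, trades channel depth for spatial resolution: a (C·r², H, W) tensor becomes (C, H·r, W·r) by rearranging each group of r² channels into an r×r pixel block. A minimal NumPy sketch, using the usual channel ordering (the same one PyTorch’s PixelShuffle uses):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r):
    channel c*r*r + i*r + j supplies output pixel (h*r + i, w*r + j)."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)        # split channels into (C, i, j)
    x = x.transpose(0, 3, 1, 4, 2)      # -> (C, H, i, W, j)
    return x.reshape(c, h * r, w * r)

x = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)
y = pixel_shuffle(x, 2)
print(y.shape)                          # (1, 6, 6)
```

Because it is a pure rearrangement, pixel shuffle adds no parameters and avoids the checkerboard artifacts that transposed convolution can produce — one reason to experiment with several upscale layers side by side.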
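
The two-network training the last point aims for can be sketched in miniature. Below is a hypothetical toy GAN in NumPy (purely illustrative, not nn_lib’s API): a one-parameter generator G(z) = z + c tries to match a 1-D Gaussian centred at 4, while a logistic discriminator D(x) = sigmoid(w·x + b) is updated in alternation with it, gradients written out by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

c = 0.0                  # generator parameter: G(z) = z + c
w, b = 0.1, 0.0          # discriminator parameters: D(x) = sigmoid(w*x + b)
lr, batch = 0.05, 64

for step in range(3000):
    real = rng.normal(4.0, 1.0, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = z + c

    # Discriminator step: minimize -log D(real) - log(1 - D(fake)).
    d_real = sigmoid(w * real + b)
    d_fake = sigmoid(w * fake + b)
    gw = np.mean((d_real - 1.0) * real) + np.mean(d_fake * fake)
    gb = np.mean(d_real - 1.0) + np.mean(d_fake)
    w -= lr * gw
    b -= lr * gb

    # Generator step: minimize -log D(fake) (non-saturating loss),
    # using the freshly updated discriminator.
    d_fake = sigmoid(w * (z + c) + b)
    gc = np.mean((d_fake - 1.0) * w)
    c -= lr * gc

print(round(c, 2))   # should drift toward the real mean of 4
```

The essential point for the library is structural: two parameter sets, two losses, and strictly alternating updates where each network’s gradient step treats the other network’s parameters as constants — which is exactly why nn_lib needed to handle several networks at the same time.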

That’s all for today, but I wanted to share these advances with you. It’s now been 3 years since I started implementing this library, completely from scratch, and I’m starting to see some real and interesting possibilities for G’MIC users!

Having started from scratch, I must say I’ve learned a lot about the algorithms (and all the tricks) used in machine learning, so the time hasn’t been wasted in any case :slight_smile:
But if, on top of that, it results in filters that everyone can use, then that’ll be great!

I’ll tell you more when I get new results!