Machine Learning Library in G'MIC

No, it’s really something internal to the neural network library, although adding a new layer may insert a new image into the image list (basically if this layer has parameters to be learnt).

Nice to see so much progress. Do you have more time in your schedule now? :stuck_out_tongue:

There are things I want to try, but as with everything G’MIC, I would like working examples so I know what is possible. The ideas are there, yet I have only tapped into a few percent of them, because I am still not as proficient in the language and the math as I would like to be.

You can always ask on the exercise thread, though I can only help if you give some clarification and a sample of what it is you want. I wish I could figure out how to solve the complex Popcorn Fractal, which is one of my dream filters.

News (2021/07/13):

  • I’ve extended the library architecture to allow the use of different network optimizers.
  • I’ve implemented the Adam optimizer, and honestly, this makes a huge difference in the convergence rate. The loss reaches (minimal) values that I was not reaching with the previous (simpler) optimizers (a sketch of the update rule follows this list).
  • I’ve also implemented a variant of Adam, named Adamax, which doesn’t seem to perform noticeably better, at least in the few experiments I’ve tried.
  • I’ve implemented the U-Net architecture, and I’m currently trying to use it for image denoising. So far, not much success compared to my first trial, but the network is clearly larger (approx. 2M parameters), so it may have to do with the fact that I haven’t let it train long enough.
  • I’m also experimenting with a new loss (gradient-weighted MSE) that seems to perform better for image denoising.
  • Plus the usual code cleaning.
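For context, nn_lib itself is written in the G’MIC language with its math evaluator, so the following is only a minimal NumPy sketch of the standard Adam update rule, plus one plausible reading of a gradient-weighted MSE (the exact weighting used in nn_lib is not specified here, so `grad_weighted_mse` and its `alpha` parameter are my assumptions):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update; returns updated parameters and moment estimates.
    (Adamax replaces the sqrt(v_hat) term by a running infinity-norm of the
    gradients.)"""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2     # second-moment (variance) estimate
    m_hat = m / (1 - beta1**t)                # bias corrections, t >= 1
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def grad_weighted_mse(pred, target, alpha=1.0):
    """Hypothetical gradient-weighted MSE: squared error weighted by the local
    gradient magnitude of the clean target, to emphasize edges."""
    gy, gx = np.gradient(target)
    w = 1.0 + alpha * np.hypot(gx, gy)
    return np.mean(w * (pred - target) ** 2)
```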

The networks I’m trying are bigger and bigger, and this has a clear impact on computation time. I’m interested in any help to optimize the CImg convolution/correlation operators (GPU experts, you are welcome).
At this point, this is the main bottleneck I have. Apart from that, everything seems to work fine.


I misread Adam as ADMM, which is also a thing. :slight_smile:

Some (good) news:

I had some time last week and this weekend to make progress on the G’MIC Machine-Learning library, and I’m very happy to announce that I’ve finally been able to set up a first filter that uses ML in the G’MIC-Qt plug-in!

This new filter is simply named Repair / Denoise. It uses a convolutional neural network to denoise images. It can be found in the latest development version of the G’MIC-Qt plug-in (version 3.0.0_pre, posted yesterday at: Index of /files/prerelease). It looks like this at the moment:

It’s quite a CPU-demanding filter, so do not use it if you don’t have at least 4 cores :wink: And even then, it will take a lot of processing time if your image resolution is large.

There is also an associated command denoise_cnn that can be used from the command line as well (e.g. to batch-process several noisy images):

$ gmic sp colorful,256 noise 15 cut 0,255 +denoise_cnn 0

This example renders this pair of before/after images:
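For batch processing from the shell, one simple (untested) possibility is to loop over the noisy files, with `o` being the usual shorthand for the `output` command (the filenames here are just placeholders):

```
$ for f in noisy_*.png; do gmic "$f" denoise_cnn 0 o denoised_"$f"; done
```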

The convolutional neural network used in this filter comes in two flavors:

  • One for processing “soft” noise, trained with images where synthetic Gaussian noise has been added to the RGB channels independently (so this is mainly colored noise).
  • One for processing “heavier” noise, trained with images where synthetic noise has been added, but this time to the HSV channels independently.

In both cases, thousands of natural images have been used for the training (I’m actually using the Lorem Picsum web service to build the training set). The synthetic noise added for the training has a random amplitude, so that the network is able to adapt itself to different levels of noise (a sketch of these two noise models follows).
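The exact corruption procedure isn’t detailed above, but a rough NumPy/scikit-image sketch of the two kinds of synthetic noise described (with a random amplitude, whose ranges below are pure assumptions) could look like this:

```python
import numpy as np
from skimage import color

rng = np.random.default_rng()

def soft_noise(img):
    """Gaussian noise added to each RGB channel independently (mainly colored noise)."""
    sigma = rng.uniform(5.0, 30.0)            # random noise level (assumed range)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0, 255)

def heavy_noise(img):
    """Noise added to the HSV channels independently instead."""
    hsv = color.rgb2hsv(img / 255.0)
    sigma = rng.uniform(0.02, 0.15)           # assumed range
    hsv = np.clip(hsv + rng.normal(0.0, sigma, hsv.shape), 0.0, 1.0)
    return color.hsv2rgb(hsv) * 255.0
```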

The network training has been achieved using only the functionalities of the integrated G’MIC ML library, which was quite a challenge!
The neural network is basically a simple ResNet, with 11 convolutional layers (3×3 convolutions, with widths varying from 64 down to 8; an illustrative residual-block sketch follows). Each of the two flavors of this network has been trained for at least 8 hours.
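nn_lib is its own from-scratch implementation, but for readers more at home with mainstream frameworks, the building block of such a ResNet is roughly the following (an illustrative PyTorch sketch with a constant width; in the actual network the widths vary, which requires a projection on the skip path):

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """A 3x3 convolution + ReLU with an identity skip connection."""
    def __init__(self, width):
        super().__init__()
        self.conv = nn.Conv2d(width, width, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.conv(x))   # residual: output = f(x) + x
```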

As these neural networks are quite shallow, they have fewer than 100k learned parameters, which means they don’t require a lot of storage (both networks are stored, compressed, in a 720K file).
The network files are then downloaded directly from the G’MIC server the first time the command denoise_cnn is used.

The inference of the network is done “patch by patch” (with 64×64 patches), so image patches can be processed in parallel (a simplified sketch of this tiling scheme follows).
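A simplified version of that tiling scheme, ignoring the border handling and overlap/blending that a real implementation would likely need to avoid visible seams:

```python
import numpy as np

def denoise_patchwise(img, denoise_fn, patch=64):
    """Apply denoise_fn independently to 64x64 tiles (each call is
    embarrassingly parallel; border tiles may be smaller here)."""
    out = np.empty_like(img)
    h, w = img.shape[:2]
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            out[y:y+patch, x:x+patch] = denoise_fn(img[y:y+patch, x:x+patch])
    return out
```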

Well, that’s it! I’m very happy, because all this is the result of hundreds of hours spent thinking about the design of the ML library structures, learning how neural network training works, implementing the whole thing from scratch, and finally testing and debugging for hours… but finally, with a result!

A lot of things remain to be done, but for me, this is a first milestone for having ML-based image processing algorithms in G’MIC.

Stay tuned :+1:


This seems like a very good time to keep a close eye on G’MIC developments! Maybe some parts will eventually be usable in filters without advanced knowledge? Even without using the ML directly, I noticed there are a lot of new “support” commands and math functions which can be useful too.


Interesting development! I would only recommend removing the cached file in case of errors, or allowing a forced re-download via the internet.

That’s great David, thank you!


It runs surprisingly fast even on my old laptop, although I haven’t tested with really large images: under 30 s most of the time. Perhaps somebody will find the time to do comparisons vs. other algorithms, but at least with artificial noise the output is clean (only some faint blotches visible). Seems like a great start :slight_smile:


This is great work that I highly appreciate. Keep it up!

I’m deeply impressed by how such a quality tool/framework is freely (as in freedom) available to all of us. Thank you David!

You, your colleagues and your research institute rock!

P.S.: A very interesting filter would be one doing “automatic” background removal, or some kind of “shape mask”, maybe providing some guidance with three colors/masks/markers: background, foreground and ambiguous zones (where the NN would have its primary role).


Hello David,

Thank you for the great work.

One question: would it be useful for the model training to have pairs of identical photos, one taken at ISO 100 and one with practice-relevant noise, perhaps at ISO 12000?

If so, a call could be launched to all photographers to cooperate.


Some news: I had some time today to work again on the G’MIC neural network library nn_lib, and I’m happy to announce that I’ve been able to implement a simple neural network for image classification (to classify hand-written digits).

I’ve used the well-known MNIST database:

I’ve built a simple classifier network, with 4 convolutional layers and 5 fully connected layers, that uses a softmax + cross-entropy loss (the classical loss for image classification; a minimal sketch follows below).
After a few minutes of training, I get reasonable results on the validation set (I’ve not computed the performance of the classifier on the whole validation set, but it looks quite good, around 97-98% correct labeling).
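For reference, the softmax + cross-entropy combination amounts to penalizing the negative log-probability that the network assigns to the true class; a minimal NumPy sketch (not the nn_lib code):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(logits, label):
    """Negative log-likelihood of the true class, given raw network outputs."""
    return -np.log(softmax(logits)[label] + 1e-12)

# e.g. a 10-class digit classifier that strongly favors class 3:
loss = cross_entropy(np.array([0., 0., 0., 5., 0., 0., 0., 0., 0., 0.]), 3)
```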

Here is an example of automatic labelling I was able to achieve:

(the red square corresponds to a wrongly predicted label).

It’s pretty cool to know that nn_lib is now able to build image classifiers, because neural network classifiers are the basis of many cool image processing techniques.
I know that MNIST is actually not a very challenging classification task (by today’s standards), but still, I consider this an important milestone for the nn_lib of G’MIC.

Tonight, at least, I’m happy :slight_smile:


I would say small steps are a big deal when you’re building it from the ground up! I assume you’re learning a fair bit from it too…


Seems like an OCR reader could be possible. Imagine loading a full image in the G’MIC plugin, then seeing the result in a text box.


When I was in college, I figured out a clever way to create the arrays for a Hebbian net on my HP 28S calculator while I was auditing a neural networks course (my senior design project involved neural networks, so I figured I’d better learn a bit about the subject). It was able to, pretty consistently, detect the numbers 1 through 5 as I recall. lol

Your drawings reminded me of the primitive bitmaps that I fed into the Hebbian net, David. Of course, I never pursued either programming or neural networks in my short (10-year) professional career (I was an analog guy). Still, I enjoyed programming my 28S back then. Created all kinds of widgets, from perpetual calendars to prime number generators, as I recall. Had a lot of dead time, and a geek I was then; not so much now. lol

Thanks for the memory recall. :slight_smile:


I do, yes. That’s actually why I didn’t want to rely on an existing library (despite the GPU optimizations I’m missing). It’s also particularly true because the lib is written in my own language, using my own math evaluator :wink: A real bug hunt!

Sorry @Joern_E , I hadn’t seen your question.
The answer is: yes. Definitely. If we are able to build a nice database, then the denoising should be better than what I’ve already achieved (with synthetic noise).

Thanks for working on this and sharing your journey with us! I have a PhD in computer vision so I’ll be happy to lend a theoretical hand if you like :slight_smile:

A couple of comments regarding the above:

  1. The results you get on MNIST are pretty good! Perhaps you could now try CIFAR-10 (or even CIFAR-100). It is still considered a toy dataset, but at least it has colour images, so it would get you a step ahead in designing the network.
  2. I wonder why you used 4 convolutional layers and 5 fully connected layers. Typically CNNs are designed the opposite way, i.e. with more convolutional layers (which exploit translation invariance) and only one or two fully connected layers as the classifier.
  3. A very powerful architecture for denoising is the DAE (Denoising AutoEncoder). The idea is to learn a compressed latent representation that encodes all the semantics needed to reconstruct the corrupted input image (a minimal sketch follows this list).
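To make the third point concrete, here is a minimal PyTorch sketch of a DAE (purely illustrative, unrelated to the actual nn_lib code; it would be trained on noisy inputs against clean targets):

```python
import torch.nn as nn

class DenoisingAE(nn.Module):
    """Encoder compresses the noisy image to a small latent code;
    the decoder reconstructs the clean image from it."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```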

Implementing an entire CNN framework in your library is on its own a behemoth task, so kudos to that!