Machine Learning Library in G'MIC

Hello David,

Thank you for the great work.

One question: would it be useful for model training to have pairs of identical photos, one taken at ISO 100 and one with practice-relevant noise, perhaps at ISO 12000?

If so, a call could be put out to all photographers to contribute such pairs.

1 Like

Some news: I had some time today to work again on the G’MIC neural network library nn_lib, and I’m happy to announce that I’ve been able to implement a simple neural network for image classification (to classify hand-written digits).

I’ve used the well known MNIST database:

I’ve built a simple classifier network, with 4 convolutional layers and 5 fully connected layers, that uses a softmax + cross-entropy loss (the classical loss for image classification).
After a few minutes of training, I get some reasonable results on the validation set (I’ve not computed the performance of the classifier on the whole validation set, but it looks quite good, around 97-98% correct labels).
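For reference, the softmax + cross-entropy combination can be sketched in a few lines (a NumPy illustration, not nn_lib code; the toy logits below are invented):

```python
import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability before exponentiating.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, label):
    # Negative log-likelihood of the true class.
    return -np.log(probs[label])

# Toy example: 10-class logits for one digit image, true label 3.
logits = np.array([0.1, -1.2, 0.3, 2.5, 0.0, -0.5, 0.7, 0.2, -0.3, 0.1])
p = softmax(logits)
loss = cross_entropy(p, 3)
```

During training, the gradient of this loss with respect to the logits is simply `p - one_hot(label)`, which is one reason the pairing is so popular.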

Here is an example of automatic labelling I was able to achieve:

(a red square indicates a wrongly predicted label).

It’s pretty cool to know that nn_lib is now able to build image classifiers, because neural network classifiers are the basis of many cool image processing techniques.
I know that MNIST is actually not a very challenging classification task (regarding today’s standard), but still, I consider this as an important milestone for the nn_lib of G’MIC.

Tonight, at least, I’m happy :slight_smile:

5 Likes

I would say small steps are a big deal when you’re building it from the ground up! I assume you’re learning a fair bit from it too…

1 Like

Seems like an OCR reader could be possible. Imagine loading a full image into the G’MIC plugin, then seeing the result in a text box.

1 Like

When I was in college, I figured out a clever way to create the arrays for a Hebbian net on my HP 28S calculator while auditing a neural networks course (my senior design project involved neural networks, so I figured I had better learn a bit about the subject). It was able to detect the numbers 1 through 5 pretty consistently, as I recall. lol

Your drawings reminded me of the primitive bitmaps that I fed into the Hebbian net, David. Of course, I never pursued either programming or neural networks in my short (10-year) professional career (I was an analog guy). Still, I enjoyed programming my 28S back then, and created all kinds of widgets, from perpetual calendars to prime number generators, as I recall. I had a lot of dead time, and a geek I was then; not so much now. lol

Thanks for the memory recall. :slight_smile:

1 Like

I do, yes. That’s actually why I didn’t want to rely on an existing library (despite the GPU optimizations I’m missing out on). It’s particularly true because the lib is written in my own language, using my own math evaluator :wink: A real bug hunt!

Sorry @Joern_E , I haven’t seen your question.
The answer is: yes. Definitely. If we are able to build a nice database, then the denoising process should be better than what I already did (with synthetic noise).

Thanks for working on this and sharing your journey with us! I have a PhD in computer vision so I’ll be happy to lend a theoretical hand if you like :slight_smile:

Couple of comments regarding the above:

  1. The results you get on MNIST are pretty good! Perhaps you could now try CIFAR-10 (or even CIFAR-100). It is still considered a toy dataset, but at least it has colour images, so it would get you a step ahead in designing the network.
  2. I wonder why you used 4 convolutional layers and 5 fully connected layers. Typically CNNs are designed the opposite way, i.e. with more convolutional layers (which are invariant to translation and scale shifts) and only one or two fully connected layers as the classifier.
  3. A very powerful architecture for denoising is the DAE (Denoising AutoEncoder). The idea is to learn a compressed latent representation that encodes all the semantics needed to reconstruct the corrupted input image.
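The DAE idea in point 3 can be sketched with a toy linear autoencoder (pure NumPy illustration; the data, sizes, and training loop are all invented for this sketch, and a real DAE would use convolutional layers and nonlinearities):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples lying on a 2-D subspace of R^8.
basis = rng.normal(size=(2, 8))
clean = rng.normal(size=(200, 2)) @ basis

# Linear encoder/decoder (8 -> 2 -> 8): the bottleneck forces the
# model to learn the structure of the data rather than copy the noise.
W_enc = rng.normal(scale=0.1, size=(8, 2))
W_dec = rng.normal(scale=0.1, size=(2, 8))

mse0 = float(((clean @ W_enc @ W_dec - clean) ** 2).mean())

lr = 0.01
for step in range(500):
    noisy = clean + rng.normal(scale=0.3, size=clean.shape)  # corrupt the input
    z = noisy @ W_enc   # encode the *noisy* sample
    recon = z @ W_dec   # decode back to input space
    err = recon - clean  # ...but compare against the *clean* target
    # Hand-written gradients of the squared error (constants absorbed into lr).
    g_dec = (z.T @ err) / len(clean)
    g_enc = (noisy.T @ (err @ W_dec.T)) / len(clean)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

final_mse = float(((clean @ W_enc @ W_dec - clean) ** 2).mean())
```

The key trick is visible in the loss line: the network sees the corrupted input but is penalized against the clean target, so the latent code must capture the underlying signal rather than the noise.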

Implementing an entire CNN framework in your library is on its own a behemoth task, so kudos to that!

That’s the plan yes, try on CIFAR-10, then CIFAR-100, then maybe ImageNet, depending on how it goes.

I just tried adding some conv layers, then some fully connected ones. I didn’t particularly try to optimize the network size; it was just a test to see if it works. According to my colleagues, the resulting network is a bit over-sized (300k parameters) for MNIST :slight_smile:

I’ll check that. I just found a bug this morning in the parallelization code, I really need to fix that before doing anything else.

It’s true that I’m learning a lot of things, and it’s sometimes depressing not to progress as fast as I would like, but I hang in there :slight_smile:

3 Likes

Currently trying to train a (slightly bigger) classifier on the CIFAR-10 dataset.
It will probably take some time to achieve, but at this point, classification results are not completely random :slight_smile:

EDIT: Failed! Got 100% classification accuracy on the training set and 10% on the validation set → overfitting. Will try again, with data augmentation.

4 Likes

Another test on CIFAR-10, this time with data augmentation (noise, rotation, flips).
I get 75% classification accuracy, with a 2M-parameter network (à la VGG).
This seems a pretty standard performance when no network regularization is used (which is the case here). This means I really have to implement dropout to get a (hopefully ~10%) boost :slight_smile:
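For the curious, this kind of augmentation pipeline can be sketched as follows (a NumPy toy: the transforms mentioned above are noise, rotation, and flips, but the parameters here are invented, and a real pipeline would use small continuous rotations rather than multiples of 90°):

```python
import numpy as np

def augment(img, rng):
    """Randomly flip, rotate, and add noise to a single-channel image."""
    if rng.random() < 0.5:
        img = img[:, ::-1]  # horizontal flip
    img = np.rot90(img, k=rng.integers(0, 4))  # random 90-degree rotation
    img = img + rng.normal(scale=0.05, size=img.shape)  # additive noise
    return img

rng = np.random.default_rng(42)
img = np.arange(16.0).reshape(4, 4)
aug = augment(img, rng)
```

Each training epoch then sees a slightly different version of every image, which makes memorizing the training set much harder.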

4 Likes

Excellent! You might want to try batch normalisation too, instead of dropout, though the network may not be deep enough to see the real benefits of BN.

ML thoughts: I’d like crop suggestions and auto-alignment. I was thinking a YOLO family model might be adapted to suggest crops that I could adjust before accepting/rejecting, and maybe an imagenet model could be adapted to spit out a rotation angle. But pixel-by-pixel color editing of 20+ MP images seems like it might need a GPU to be feasible for bulk use.

As an EOL developer, I’m willing to run lightly over the hairdos of giants, and vectors from VGG16 have worked well for training keras nets in making interesting pairs of photos. I can reduce each photo to 2x 2D vectors and get pair prediction accuracy over 90%.

To-be-https’d results from ~2015:

Woot, I made Autobiographer, since I expanded this while filling in my Profile. That calls for a link to show off my first production app, written in Pascal in the early-to-mid ’80s for the new interactive terminals that replaced IBM cards in my second semester of a community college class.

Some news about nn_lib:

I had some free time this week to work on the G’MIC machine learning library (nn_lib).
And I implemented quite a few things:

  • Dropout layer: This kind of layer is basically used to prevent overfitting when training a neural network. It disables random parts of the network (see a detailed explanation here: Dropout in Neural Networks. Dropout layers have been the go-to… | by Harsh Yadav | Towards Data Science).
    I hoped that using this kind of layer would significantly improve my classification results on the CIFAR-10 dataset, but it didn’t really. I have yet to figure out why. I am still at a maximum detection rate of 75% for the moment, which is not that great (but not totally rotten either). Maybe it is due to the network architecture I’m using, which is not well adapted? But I’d like to put that aside for the moment.

  • Normalization layer: This kind of layer is used to linearly normalize the data passing through the network. This mainly avoids vanishing or exploding gradients, and improves the generalization of the network. See details here: Batch normalization - Wikipedia
    I’ve indeed observed more stability when using such layers in the network. The weights of the conv or linear layers stay in a reasonable value range, as do the moments of the gradients (so, better for the optimizer). Really useful!

  • New combined layers: I’ve also implemented various “standard” neural blocks (combinations of “atomic” layers), which combine conv/linear layers (residual or not) with normalization layers. These combinations can be used to design deeper networks, and I confirm this adds value. Using those blocks, I’ve been able to build and train deeper and deeper networks, with higher numbers of parameters. I think my record for now is a trained 2M+ parameter network (U-Net architecture), which is still a pretty small network by today’s standards, but way better than what I was able to do before. Just for fun, this also allowed me to train a 50-layer classifier without waiting an infinite time :slight_smile:

  • Patch shuffling layers (patch upscale/downscale): Used to upscale or downscale image data in a network without losing information (using simple pixel shuffling). They seem to be effective in classifiers “à la VGG”, where image resolution is progressively reduced (replacing maxpool). I’ve also tried them in U-Net-like architectures, for data generation. More tests are needed to be sure how to use them properly.
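A rough idea of what these layers compute, sketched in NumPy (simplified stand-ins I wrote for illustration; the real nn_lib layers operate on full tensors and, in the case of normalization, carry learnable scale/shift parameters):

```python
import numpy as np

def dropout(x, p, rng, train=True):
    # "Inverted" dropout: zero each activation with probability p and
    # rescale the survivors so the expected activation is unchanged.
    if not train or p == 0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch dimension (learnable
    # scale/shift parameters of a real BN layer are omitted here).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def patch_downscale(img, s):
    # Pixel "unshuffle": fold each s x s block of an HxW image into s*s
    # channels, reducing resolution without discarding any values.
    h, w = img.shape
    return (img.reshape(h // s, s, w // s, s)
               .transpose(0, 2, 1, 3)
               .reshape(h // s, w // s, s * s))
```

Note that `patch_downscale` is exactly invertible (every input value survives as a channel), which is what distinguishes it from maxpool.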

As always, implementing this new stuff required some changes in the nn_lib architecture, so I had to break/rewrite a lot of things (for instance, the command denoise_cnn had to be patched to work with the new version of the library). I’ve also fixed some bugs in the process!

I’ve been able to test the validity of all these new modules on the MNIST classification example I’ve talked about before. It all seems to work very well (I now get a score of over 99% successful classification on MNIST, if I remember correctly, with dropout/normalization layers added).


Today, I decided to explore something a bit different: the CelebA dataset (CelebA Dataset),
which contains a lot of (200,000+) different portraits, each described with 40 different attributes:

5_o_Clock_Shadow Arched_Eyebrows Attractive Bags_Under_Eyes Bald Bangs Big_Lips Big_Nose Black_Hair Blond_Hair Blurry Brown_Hair Bushy_Eyebrows Chubby Double_Chin Eyeglasses Goatee Gray_Hair Heavy_Makeup High_Cheekbones Male Mouth_Slightly_Open Mustache Narrow_Eyes No_Beard Oval_Face Pale_Skin Pointy_Nose Receding_Hairline Rosy_Cheeks Sideburns Smiling Straight_Hair Wavy_Hair Wearing_Earrings Wearing_Hat Wearing_Lipstick Wearing_Necklace Wearing_Necktie Young

The interest of this dataset is that the images are larger (178x218 RGB) than in my previous experiments with MNIST, and that there are quite a lot of them (so, variety).
My first experiment has been to train a neural network to predict the gender of a portrait (male/female). It seems to work as expected, as you can see in the screenshot below (wrong predictions tinted in red). I’ve not yet computed the classification score (the network is still training), but what I observe is promising.

Once this network is trained, I’ll probably try to use it to modify an input image iteratively, to change the gender of the portrait. Could be fun :slight_smile:
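That "modify the input to flip the classifier's decision" idea amounts to gradient ascent on the input pixels. A toy sketch, with a fixed linear model standing in for the trained network (everything here is invented for illustration; with a real CNN you would backpropagate the class score all the way down to the input image):

```python
import numpy as np

# Stand-in "classifier": p(target | x) = sigmoid(w.x + b), with fixed
# random weights. The gradient of a linear model is available in
# closed form, which keeps the sketch short.
rng = np.random.default_rng(1)
w = rng.normal(size=16)
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=16)  # the "input image"
lr = 0.5
for _ in range(100):
    p = sigmoid(w @ x + b)   # current prediction for the target class
    grad = (1.0 - p) * w     # d log p / dx for a sigmoid-linear model
    x = x + lr * grad        # nudge the input toward the target class

p_final = sigmoid(w @ x + b)
```

In practice one also adds a regularization term keeping the modified image close to the original, so the result still looks like the same person.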

That’s it for now. Still a lot of work to do, but I’m slowly progressing in the right direction, hopefully. Stay tuned! :slight_smile:

9 Likes

Today’s news:

I thought it would be cool to finally have a face detector in G’MIC, so I:

  • Built a database of face/non-face images (20,000 images), using the ‘This Person Does Not Exist’ and ‘Lorem Picsum’ web services to retrieve images of the two types to discriminate.

  • Trained a binary classifier on patches of size 54x54, to tell if a patch contains a face.

This works quite well. Now, my mission is to build the smallest possible neural network that achieves an acceptable classification score, then integrate it into G’MIC as a new command, such as detect_faces or something like this.
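Turning a patch classifier into a detector usually means sliding it over the image; a minimal sketch of that scanning scheme (my assumption, not the actual nn_lib code; the patch size mirrors the 54x54 patches mentioned above, and the dummy classifier just flags bright patches):

```python
import numpy as np

def detect(image, classify, patch=54, stride=27):
    """Slide a patch classifier over an image; return (x, y, score) hits.

    `classify` is any function mapping a patch to a face probability.
    """
    h, w = image.shape[:2]
    hits = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            score = classify(image[y:y + patch, x:x + patch])
            if score > 0.5:
                hits.append((x, y, score))
    return hits

# Dummy classifier for illustration: "faces" are bright regions.
image = np.zeros((108, 108))
image[54:, 54:] = 1.0
hits = detect(image, lambda p: p.mean())
```

Real detectors also run this over several image scales and merge overlapping hits (non-maximum suppression), which is where most of the engineering effort goes.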

For now, my network has 300k parameters; I believe this could be reduced, maybe to 100k?

Here are the current classification results obtained on a sample of 16 images:
gmic_face_detection

The percentage you see is the network’s prediction, for each image, of the statement “this image contains a face”. It’s not even quantized, so sometimes you get predictions like 80% or 15%. But here, it seems the network was quite sure of what’s going on :slight_smile:

Looks quite good to me. I’ll see how much I can lower the number of parameters!

7 Likes

This seems a bit more solid… the previous one could have been doing something like “long hair detection”; it’s hard to know what such a network is really finding!

@David_Tschumperle I am curious what algorithms/modes are in use. I am sure you referenced them somewhere…

It’s basically backpropagation (Backpropagation - Wikipedia): you optimize a loss function with gradient descent (with momentum), which means backpropagating gradients through all the network layers.
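For reference, gradient descent with momentum can be sketched in a few lines (a generic NumPy illustration on a toy quadratic, not the actual nn_lib optimizer; the hyperparameters are invented):

```python
import numpy as np

def sgd_momentum(grad_fn, x0, lr=0.1, beta=0.9, steps=100):
    """Minimal gradient descent with momentum ("heavy ball").

    Each step accumulates a decaying average of past gradients, which
    damps oscillations and speeds progress along shallow valleys.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v + grad_fn(x)  # momentum buffer
        x = x - lr * v             # parameter update
    return x

# Minimize f(x, y) = x^2 + 10*y^2 from (3, 2); its gradient is (2x, 20y).
xmin = sgd_momentum(lambda x: np.array([2 * x[0], 20 * x[1]]), [3.0, 2.0])
```

In a neural network, `grad_fn` is exactly what backpropagation computes: the gradient of the loss with respect to every weight.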

Good to know. I like back propagation. Thanks!

Update: 8192 parameters, and I still get a very good classification result on my validation set (around 98%). :slight_smile:

1 Like