(… Not finished, to be continued …)
Hello Everyone,
After slightly more than one year working on the nn_lib (a.k.a. Neural Network Library), I’ve decided it was time to talk about its API a little more precisely.
This post is intended to introduce the basic features and the API currently implemented in nn_lib, and how it can be used in your own G’MIC scripts. It is subject to change, as the library may evolve in future releases (but hopefully not too much).
This introduction is going to be incomplete, because it’s a library with lots of little features that I couldn’t possibly detail. So feel free to ask questions below this post if you want more information on certain points.
What I assume here is that you are already familiar with neural networks and how they are trained (backpropagation algorithm, usual network layers, etc.).
1. Quick Introduction
1.1. What is nn_lib?
nn_lib is a subset of commands of the G’MIC standard library that allows you to build, train and evaluate neural networks. It has the following properties:
- It is implemented purely in the G’MIC script language. It doesn’t require any additional dependency on an external machine learning library. The code is of course open and free (part of the gmic_stdlib). It only uses long-existing features of G’MIC, so by itself it has a very small footprint (the current implementation is about 2,400 lines of code).
- It does not use GPUs. All computations are done on the CPU(s) (parallelized when multiple cores are available). It is therefore not as fast as other existing machine learning libraries that are able to use GPUs.
- As a consequence, this library is not designed to train very large networks containing billions of parameters (as is the case for DALL-E, GPT-3, etc.). But surprisingly, the library is still “fast” enough to train networks with a few tens or even hundreds of millions of parameters, which means that you can still do interesting things with it.
A command of the nn_lib always starts with the prefix nn_.
All existing nn_lib commands are visible here, on the reference webpage.
1.2. How is a neural network stored in G’MIC?
A neural network is basically composed of two parts:
- The network architecture, i.e. the kind of layers that compose the network and how these layers are organized together. In nn_lib, we define the network architecture by writing a sequence of commands (which also store this architecture information in global variables named $_nn_something).
- The network weights \theta, i.e. the floating-point parameters that will be learned. In the end, it is important to remember that a neural network is just a big parameterized function Y = f_\theta(X) that takes an image/vector input X and outputs another image/vector Y; f_\theta is simply an explicit parameterized function. In nn_lib, the network weights are stored as regular images in the image list. More precisely, each network layer that has trainable parameters has its own weight image in the list (see the short sketch just after this list).
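For instance, here is a minimal sketch that only uses commands described later in this post (the command name nn_weights_demo is arbitrary, chosen just for this illustration): defining even a tiny network makes the corresponding weight images appear in the image list.
nn_weights_demo :               # Hypothetical command name, for illustration only
  nn_init                       # Initialize a new (empty) network
  nn_input IN,2                 # Input: a 2D vector, named 'IN'
  nn_fcnnl L1,IN,64             # One fully-connected hidden layer (64 neurons), named 'L1'
  nn_fc OUT,L1,3                # Output layer: a 3D vector, named 'OUT'
  nn_print                      # Print the list of network modules
  echo "Images in the list: "$! # Each layer with trainable parameters has added its own weight image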
2. Tutorial: Learn a Complex Mapping (x,y) \rightarrow (R,G,B).
2.1. Define the network architecture
To illustrate how nn_lib can be used, I propose to write a simple script that defines and trains a neural network able to learn a mapping between 2D coordinates (x,y) and a color (R,G,B). Basically, what we want to build is a neural-net-based function that returns the (R,G,B) color of a point of a given image, when we specify its coordinates (x,y).
So, the network input is a 2D vector, and the output is a 3D vector. The architecture used here is very simple: it’s a sequence of so-called fully-connected hidden layers between the input and the output. This is how such a network can be defined in G’MIC:
nn_test_color:
# Input (square) color image that will be used as the reference image for the training
sp monalisa,256 r. 256,256,1,3,0,0,0.5,0.5
# Define neural network architecture.
nn_init # Init new network
nn_input IN,2 # Input of the network: a 2D vector (x,y), named 'IN'
nn_fcnnl L1,IN,64 # Add first hidden layer 'L1': fully-connected (64 neurons) with data normalization
N=6 # Number of additional hidden layers
repeat $N { # Add residual fully-connected layers
nn_resfcnl L{$>+2},L{$>+1}
}
nn_fc OUT,L{$N+1},3 # Last layer: Output of the network, i.e. a 3D vector (R,G,B)
At this point, this is the content of the image list when you run $ gmic nn_test_color:
You can see the color image, as well as the other images that actually define the different weights of each neural network layer.
2.2. Prepare network training
To be able to train the network, two things are still missing:
- A loss function: this is the function that will be used to determine the error made by the network. We feed the network with a set of coordinates (x_k,y_k), it returns a set of (R_k,G_k,B_k) colors, and the loss computes the error made by the network between the returned (R_k,G_k,B_k) and the colors (R'_k,G'_k,B'_k) we’d actually expect (taken from the reference image). Here, we can choose for instance L = \frac{1}{N} \sum_{k=1}^N \left[ (R_k - R'_k)^2 + (G_k - G'_k)^2 + (B_k - B'_k)^2 \right], which is known as the MSE loss (Mean-Squared Error). The loss is the function we want to minimize (ideally it would reach 0, but this never happens in practice). A tiny numeric sketch of this computation is given just after this list. With nn_lib, you define a loss, named LOSS, like this:
nn_loss_mse LOSS,OUT,GT
Here, GT is the ground-truth, i.e. the color (R'_k,G'_k,B'_k) you’d want as the network output.
- A trainer: this defines how the network will be trained, what loss it tries to minimize, what value is chosen for the learning rate, and what algorithm is used to do this (there are numerous variants of gradient descent defined in the literature…). For our tutorial, let us just write:
nn_trainer T,LOSS,1e-3,adam
Adam is a quite popular variant of gradient descent that uses moments to minimize the loss.
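To make this more concrete, here is a tiny numeric sketch of what the MSE measures (the values are made up, for a single sample):
# Squared error for one sample, with a made-up prediction (0.8,0.1,0.2) and ground-truth (1,0,0):
eval "Rk = 0.8; Gk = 0.1; Bk = 0.2; Rt = 1; Gt = 0; Bt = 0; print((Rk-Rt)^2 + (Gk-Gt)^2 + (Bk-Bt)^2)"
# -> prints 0.09, i.e. 0.04 + 0.01 + 0.04.
With a full mini-batch, the loss L is simply the average of such per-sample errors over the N samples.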
Let us check that everything has been understood by nn_lib, with nn_print:
nn_print outputs this:
* List of modules:
- Module: IN (type: input)
* Output: 2,1,1,1
- Module: L1_fc (type: fc)
* Output: 1,1,1,64
* Parameters: 192
- Module: L1_normalize (type: normalize)
* Output: 1,1,1,64
* Parameters: 4
- Module: L1 (type: nl)
* Output: 1,1,1,64
- Module: L2_clone0 (type: clone)
* Output: 1,1,1,64
- Module: L2_clone1 (type: clone)
* Output: 1,1,1,64
- Module: L2_fc (type: fc)
* Output: 1,1,1,64
* Parameters: 4160
- Module: L2_add (type: add)
* Output: 1,1,1,64
- Module: L2 (type: nl)
* Output: 1,1,1,64
- Module: L3_clone0 (type: clone)
* Output: 1,1,1,64
- Module: L3_clone1 (type: clone)
* Output: 1,1,1,64
- Module: L3_fc (type: fc)
* Output: 1,1,1,64
* Parameters: 4160
- Module: L3_add (type: add)
* Output: 1,1,1,64
- Module: L3 (type: nl)
* Output: 1,1,1,64
- Module: L4_clone0 (type: clone)
* Output: 1,1,1,64
- Module: L4_clone1 (type: clone)
* Output: 1,1,1,64
- Module: L4_fc (type: fc)
* Output: 1,1,1,64
* Parameters: 4160
- Module: L4_add (type: add)
* Output: 1,1,1,64
- Module: L4 (type: nl)
* Output: 1,1,1,64
- Module: L5_clone0 (type: clone)
* Output: 1,1,1,64
- Module: L5_clone1 (type: clone)
* Output: 1,1,1,64
- Module: L5_fc (type: fc)
* Output: 1,1,1,64
* Parameters: 4160
- Module: L5_add (type: add)
* Output: 1,1,1,64
- Module: L5 (type: nl)
* Output: 1,1,1,64
- Module: L6_clone0 (type: clone)
* Output: 1,1,1,64
- Module: L6_clone1 (type: clone)
* Output: 1,1,1,64
- Module: L6_fc (type: fc)
* Output: 1,1,1,64
* Parameters: 4160
- Module: L6_add (type: add)
* Output: 1,1,1,64
- Module: L6 (type: nl)
* Output: 1,1,1,64
- Module: L7_clone0 (type: clone)
* Output: 1,1,1,64
- Module: L7_clone1 (type: clone)
* Output: 1,1,1,64
- Module: L7_fc (type: fc)
* Output: 1,1,1,64
* Parameters: 4160
- Module: L7_add (type: add)
* Output: 1,1,1,64
- Module: L7 (type: nl)
* Output: 1,1,1,64
- Module: OUT_fc (type: fc)
* Output: 1,1,1,3
* Parameters: 195
- Module: OUT (type: rename)
* Output: 1,1,1,3
- Module: LOSS (type: loss_mse)
* Output: 1,1,1,1
- Module: T (type: trainer)
* Output: 0,0,0,0
* Parameters: 15
* Total: 39 modules, 25347 parameters.
nn_print prints the details of each network module, as well as the total number of parameters. Here, around 25k parameters, so it’s a quite small network.
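If you wonder where these per-module numbers come from: a fully-connected layer with n inputs and m outputs holds n \times m weights plus m biases, i.e. (n+1) \times m parameters. So L1_fc has (2+1) \times 64 = 192 parameters, each of the L2_fc … L7_fc layers has (64+1) \times 64 = 4160, and OUT_fc has (64+1) \times 3 = 195, which matches the counts printed above.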
2.3. Train network
A typical way of training a neural network consists of repeating the following steps again and again:
- Build a so-called mini-batch. This is a set of M samples (usually M is quite small, like 32, 64 or 128), taken from the learning dataset (in our case, M points of our color image, with their associated colors).
- Then, we feed the network with each input vector (x_k,y_k) from the mini-batch, and we get the corresponding estimated (R_k,G_k,B_k) colors. This phase is called the forward pass.
- The error with respect to the real image colors (R'_k,G'_k,B'_k) (ground-truth) is computed for each sample k of the mini-batch, and those errors are added together. This is how the loss function L is computed.
- The errors are back-propagated through the network, allowing us to compute how the layer weights must vary to minimize the error. This phase is called the backward pass.
- Finally, the network weights are updated, according to the variations computed at the previous step.
The main purpose of the nn_lib is to make steps 2, 3, 4 and 5 super easy to write. Let me illustrate how it can be done for our simple use case:
1. Mini-batch creation:
We store our mini-batch of M samples as a 1xMx1x5 image, where each row corresponds to one sample, i.e. a vector-valued point containing the [ x,y,R,G,B ] values of that sample:
# Build mini-batch (choose random pair of points/colors from the color image).
# 2D coordinates are assumed to be normalized in range [-1,1].
# RGB color is normalized in range [0,1].
1,32,1,5,"P = round(u([w#0,h#0] - 1)); [ lerp(-1,1,P[0]/w#0),lerp(-1,1,P[1]/h#0),I(#0,P)/255 ]"
Note that we normalize the ranges of both the (x,y) coordinates (to [-1,1]) and the (R,G,B) colors (to [0,1]). That’s because small ranges of values (typically within [-1,1]) make the network converge faster, without having to introduce explicit normalization layers.
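As a quick sanity check of this normalization (a standalone sketch, independent from the network code), the lerp(-1,1,t) call maps t \in [0,1] to the range [-1,1]:
# Map a few x coordinates of a 256-pixel wide image to [-1,1]:
eval "width = 256; print(lerp(-1,1,0/width)); print(lerp(-1,1,128/width)); print(lerp(-1,1,255/width))"
# -> prints -1 (left border), 0 (center) and ~0.992 (right border).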
2. Forward pass / Loss computation / Backward pass / Weight update:
When we defined the network architecture at the beginning with the nn_lib commands, some “magic” variables were set automatically, named $_nn_forward, $_nn_loss, $_nn_backward and $_nn_update. They contain all the necessary code that you have to put in a math expression to compute the forward pass, the loss, the backward pass and the weight update.
So what you need to write at the end is just:
# Apply one iteration of learning.
eval. *${-nn_lib}" # Include the 'nn_lib' commands
IN = [ i0,i1 ]; # Put 2D coordinates in 'IN'
GT = [ i2,i3,i4 ];"\ # Put RGB expected result in 'GT' (ground-truth)
${_nn_forward}\
${_nn_loss}\
${_nn_backward}\
${_nn_update}\
"end(set('loss',LOSS)); I"
Here, notice that all samples of the mini-batch are processed in parallel. This is how the parallelization occurs with nn_lib. Usually you have at least 16 samples in your mini-batch, so it’s always interesting to parallelize the math expression this way.
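For instance, if you want to experiment with a larger mini-batch, only the mini-batch creation line has to change (batch_size is just a hypothetical variable name here, nothing required by nn_lib):
batch_size=64   # Hypothetical variable: the number M of samples per mini-batch
1,$batch_size,1,5,"P = round(u([w#0,h#0] - 1)); [ lerp(-1,1,P[0]/w#0),lerp(-1,1,P[1]/h#0),I(#0,P)/255 ]"
The eval expression that follows processes all the rows of this image in parallel, so a bigger mini-batch simply means more samples processed per iteration.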
3. Final code:
All these parts put together, plus a few extra things (to be documented…), give the following:
#------------------------------------------------
# Network that learns a mapping (x,y) -> (R,G,B).
#------------------------------------------------
nn_test_color :
# Input (square) color image that will be used as the reference image for the training.
sp monalisa,256 r. 256,256,1,3,0,0,0.5,0.5
# Define neural network architecture.
nn_init # Init new network
nn_input IN,2 # Input of the network: a 2D vector (x,y), named 'IN'
nn_fcnnl L1,IN,64 # Add first hidden layer 'L1': fully-connected (64 neurons) with data normalization
N=6 # Number of additional hidden layers
repeat $N { # Add residual fully-connected layers
nn_resfcnl L{$>+2},L{$>+1}
}
nn_fc OUT,L{$N+1},3 # Last layer: Output of the network, i.e. a 3D vector (R,G,B)
nn_loss_mse LOSS,OUT,GT # Define the MSE loss
nn_trainer T,LOSS,1e-3,adam # Define the network trainer
nn_print
# Start training loop.
best_loss=inf
view_iter=-inf
repeat inf {
iter=$>
# Build mini-batch (choose random pair of points/colors from the color image).
# 2D coordinates are assumed to be normalized in range [-1,1].
# RGB color is normalized in range [0,1].
1,32,1,5,"P = round(u([w#0,h#0] - 1)); [ lerp(-1,1,P[0]/w#0),lerp(-1,1,P[1]/h#0),I(#0,P)/255 ]"
# Apply one iteration of learning.
eval. *${-nn_lib}"
IN = [ i0,i1 ]; # Put 2D coordinates in 'IN'
GT = [ i2,i3,i4 ];"\ # Put RGB expected result in 'GT' (ground-truth)
${_nn_forward}\
${_nn_loss}\
${_nn_backward}\
${_nn_update}\
"end(set('loss',LOSS)); I"
rm. # Remove mini-batch
echo[] LOSS=$loss
# Display full reconstruction from time to time (when loss somehow improves).
if $loss<$best_loss*1.25" && "$iter>$view_iter+100
+f[0] *${-nn_lib}" # Compute the (R,G,B) color estimated by the network for each (x,y) of the color image
IN = [ lerp(-1,1,x/w),lerp(-1,1,y/h) ]; # Normalized coordinates
"$_nn_forward" # Compute network result
cut(OUT*255,0,255)"
w. 600,600,0 rm.
view_iter=$iter
fi
best_loss:=min($best_loss,$loss)
}
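If you want to try it yourself and the command is not already in your user command file, one possible way to run it (assuming you saved the code above in a file named, say, nn_test_color.gmic; the filename is arbitrary) is:
$ gmic command nn_test_color.gmic nn_test_color
If you paste the command into your ~/.gmic user command file instead, $ gmic nn_test_color alone is enough, as used earlier in this post.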
I’ll let you study this code in detail if you are interested. Please ask questions if something is not clear to you.
As a result, you get something like this:
I’ll add some details to this post when I have more energy. But I wanted to share this first piece of code as soon as possible, so you can test it and play with it.