I have a bit of expertise in deep learning and the frameworks around it, so I can share some of it here. I confess I'm not very familiar with G'mic, but I am a developer of Kdenlive, and such features are also of interest to us (though if they don't make it into G'mic, we can consider other ways to bring them in).
So basically, as mentioned, there is usually a training phase before a network can be used (though not always: for example, there are very impressive artistic style transfer results that don't need one). What you have to realize here is that this training is done once and for all. You only need to compute the parameters of the network once, share them with your users, and they can use them for their task. Note also that these parameters can be computed in one framework and used in any other, so there is no need to implement training in G'mic.
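To make the "compute once, use anywhere" point concrete, here is a minimal sketch: trained parameters are just arrays of numbers, so any framework can export them and any other program can read them back. The file name and JSON layout here are purely illustrative (real exchange formats such as HDF5 or ONNX are more compact, but the principle is the same):

```python
import json

# Trained parameters are just numbers; any framework can export them
# and any other program (in any language) can read them back.
weights = {
    "layer1/W": [[0.2, -0.5], [0.7, 0.1]],  # toy 2x2 weight matrix
    "layer1/b": [0.0, 0.3],                 # toy bias vector
}

# Export once, after training...
with open("model.json", "w") as f:
    json.dump(weights, f)

# ...and load in a completely different program.
with open("model.json") as f:
    restored = json.load(f)
```

So the user-facing tool only ever needs to read such a file, never to produce it.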
To use the model, you indeed need to be able to make a forward pass through the network, that is, compute its output. This is an order of magnitude easier than training, so if the net is not too complicated, it can be manageable with whatever you already have implemented in terms of matrix multiplication.
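For a fully-connected net, a forward pass really is just alternating matrix products and element-wise non-linearities. A minimal sketch in plain Python, with made-up toy weights (a real net would load its parameters from a file and use optimized matrix routines):

```python
def matvec(W, x):
    # Plain matrix-vector product: one row of W per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    # Element-wise non-linearity.
    return [max(0.0, a) for a in v]

def forward(layers, x):
    # A forward pass: for each layer, multiply by the weight matrix,
    # add the bias, apply the non-linearity, and feed the result onward.
    for W, b in layers:
        x = relu([a + bi for a, bi in zip(matvec(W, x), b)])
    return x

# Two tiny layers with made-up weights.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1]),
    ([[2.0, 1.0]], [0.0]),
]
output = forward(layers, [1.0, 2.0])
```

Convolutional layers add bookkeeping but no fundamentally new operation, which is why inference is so much simpler than training (no gradients, no backpropagation).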
Otherwise, reinventing the wheel is probably a bad idea. The deep learning frameworks often expose a Python interface, but they all have a C++ backend that can be imported and tailored to fit your needs. There are also libraries made especially for fast inference under tight computational constraints (e.g. mobile phones), such as Caffe2.
Using GPUs can help a great deal with this kind of operation. May I ask if you already use them in your pipeline to speed up computation?