i just merged a branch that implements support for evaluating gmic's gaussian/poissonian denoising resnet as gpu shaders. i've been working on this together with @David_Tschumperle. really he's done all the work; i just ported it over to the gpu, which wouldn't have been easy without his support in debugging my broken implementation.
a few initial observations, the main one being that my implementation is stupid:
- it runs out of memory very quickly, since it does not attempt any tiling
- it fetches everything from global memory and is thus very slow (see the shared-memory sketch below)
- it's split into way too many small kernels (see the fusion sketch at the end)
and currently it evaluates a 1080p image in ~300ms on a low-end laptop GTX 1650.
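to make the global-memory point concrete, here's a rough cuda sketch (the real code is gpu shaders, and the names and tile size here are made up for illustration) of a single-channel 3x3 convolution that stages its input tile in shared memory. each pixel is then read from global memory once per block instead of nine times, which is the standard way to cut the traffic i'm complaining about:

```cuda
#define TILE 16 // hypothetical tile size, would need tuning per gpu
#define R 1     // radius of the 3x3 stencil

// single-channel 3x3 convolution, launched with one 16x16 thread
// block per tile. the tile plus a 1-pixel apron is staged in shared
// memory so the stencil never touches global memory directly.
__global__ void conv3x3_tiled(const float *in, float *out,
                              int w, int h, const float *k /* 9 weights */)
{
  __shared__ float tile[TILE + 2*R][TILE + 2*R];

  // cooperative load of tile + apron, clamping reads at the image border
  for(int dy = threadIdx.y; dy < TILE + 2*R; dy += TILE)
    for(int dx = threadIdx.x; dx < TILE + 2*R; dx += TILE)
    {
      const int gx = min(max((int)(blockIdx.x * TILE + dx) - R, 0), w - 1);
      const int gy = min(max((int)(blockIdx.y * TILE + dy) - R, 0), h - 1);
      tile[dy][dx] = in[gy * w + gx];
    }
  __syncthreads();

  const int x = blockIdx.x * TILE + threadIdx.x;
  const int y = blockIdx.y * TILE + threadIdx.y;
  if(x >= w || y >= h) return;

  // the 3x3 stencil now reads only from shared memory
  float sum = 0.0f;
  for(int j = 0; j < 3; j++)
    for(int i = 0; i < 3; i++)
      sum += k[3*j + i] * tile[threadIdx.y + j][threadIdx.x + i];
  out[y * w + x] = sum;
}
```

a real version would of course also need to handle multiple feature channels and keep the weights in constant or shared memory, but the access pattern is the interesting part.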
this is really just a starting point: next i need to do a faster implementation, and then likely tweak the network architecture for real-time performance and temporal stability (video).
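one obvious first step toward a faster implementation is to stop paying a kernel launch plus a full global-memory round trip for every elementwise op. a hypothetical sketch of fusing bias, activation and the skip connection into a single pass (again not the actual code; it assumes a planar channel layout, and whether the activation goes before or after the skip add depends on the block variant):

```cuda
// fuse bias + relu + residual add into one elementwise pass,
// replacing three kernel launches and three global round trips.
// assumes planar layout: channel c occupies feat[c*n_pix .. (c+1)*n_pix).
__global__ void fused_bias_relu_residual(
    float *feat,        // conv output, updated in place
    const float *skip,  // the block's skip connection input
    const float *bias,  // one bias per channel
    int n_pix, int n_chan)
{
  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if(idx >= n_pix * n_chan) return;
  const int c = idx / n_pix;
  const float v = fmaxf(feat[idx] + bias[c], 0.0f); // bias + relu
  feat[idx] = v + skip[idx];                        // residual add
}
```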