No, you really want to access the RAW data and apply this method before anything else to get the best out of it. It’s more low-level signal processing than a filter.
I’m currently working on a Cython (hybrid Python/C) implementation, and I have already cut the computing time by a quarter just by optimizing the calculation of the total variation regularization term. So I’m confident that with a tiled/multithreaded C FFT implementation, we could at least halve the overall execution time.
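For readers unfamiliar with the term, here is a minimal NumPy sketch of an isotropic total variation calculation on a 2D image. It only illustrates what the regularization term measures, and it is not the Cython code mentioned above; the function name `total_variation` and the forward-difference discretization are assumptions made for the example.

```python
import numpy as np


def total_variation(img):
    """Isotropic total variation of a 2D array (illustrative helper)."""
    img = np.asarray(img, dtype=np.float64)
    # Forward differences, cropped so both gradients share the same shape.
    dx = img[:-1, 1:] - img[:-1, :-1]  # horizontal neighbour differences
    dy = img[1:, :-1] - img[:-1, :-1]  # vertical neighbour differences
    # Sum of gradient magnitudes over all pixels.
    return float(np.hypot(dx, dy).sum())
```

Computing the differences with array slices instead of per-pixel Python loops is usually the first easy speed-up for this kind of term; a compiled version would go further by avoiding the temporary arrays.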
The 12 Mpx image took me an hour on a single process, because running it on 3 triggered a memory error. Playing with lower-level layers of code allows me to reduce both the multithreading overhead and the RAM footprint.
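One common way to bound the memory used per worker is to process the image in overlapping tiles rather than all at once. The sketch below only shows how such tiles could be enumerated; it is an assumption about how tiling might be done, not the actual implementation, and the names `tile_starts` and `iter_tiles` as well as the default tile size and overlap are hypothetical.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of overlapping tiles covering range(length)."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    step = tile - overlap
    # Stop once the previous tile already reaches the end of the axis.
    return list(range(0, max(length - overlap, 1), step))


def iter_tiles(height, width, tile=512, overlap=32):
    """Yield (row_slice, col_slice) pairs covering a height x width image."""
    for top in tile_starts(height, tile, overlap):
        for left in tile_starts(width, tile, overlap):
            yield (slice(top, min(top + tile, height)),
                   slice(left, min(left + tile, width)))
```

Each worker then only needs one tile (plus its FFT buffers) in memory at a time, and the overlap leaves room to blend or crop tile borders afterwards.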