@David_Tschumperle Thanks for adding the boundary argument. As for speed, it seems that convolve would benefit from a dynamic switch between the two methods. I haven’t done a rigorous test like that, but I did note that at 10x10 and higher, fft becomes faster than regular convolve on my 6-core machine.
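To make the "dynamic switch" idea concrete, here is a minimal Python sketch of the dispatch logic only. The function name, the return labels, and the area-based threshold are all hypothetical; the 10x10 crossover is just the informal observation from one machine and would need to be measured per system.

```python
# Hypothetical crossover point: 10x10 is an informal observation on one
# machine, not a universal constant. Measure it on the target system.
FFT_THRESHOLD_AREA = 10 * 10


def pick_convolution_method(kernel_width, kernel_height,
                            threshold_area=FFT_THRESHOLD_AREA):
    """Return "fft" when the kernel area reaches the threshold, else "direct"."""
    if kernel_width <= 0 or kernel_height <= 0:
        raise ValueError("kernel dimensions must be positive")
    if kernel_width * kernel_height >= threshold_area:
        return "fft"
    return "direct"
```

A caller would pick the method once per kernel and then run whichever convolution routine it maps to. Comparing on area rather than on a single side is a design guess, so that long, thin kernels are not sent to the fft path too early.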
For the earlier question, I have now realized that I can code in a dynamic size that depends on the angle. This will reduce computation time and, in theory, should be much faster than the pdn version.
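As a rough illustration of an angle-dependent size, the Python sketch below assumes the kernel only has to hold a centred line segment of a given length at a given angle. That shape, the function name, and the odd-size rounding are my assumptions for the example, not the actual filter.

```python
import math


def kernel_size_for_angle(length, angle_deg):
    """Smallest odd (width, height) box holding a centred segment.

    Assumes (hypothetically) the kernel is a straight segment of `length`
    pixels rotated by `angle_deg` degrees.
    """
    theta = math.radians(angle_deg)
    # Round before ceil so floating-point noise (e.g. cos(90 deg) ~ 6e-17)
    # does not add a spurious extra pixel.
    width = math.ceil(round(length * abs(math.cos(theta)), 9))
    height = math.ceil(round(length * abs(math.sin(theta)), 9))
    # Clamp to at least 1 and force odd sizes so the kernel has a centre pixel.
    width = max(1, width) | 1
    height = max(1, height) | 1
    return width, height
```

Under that assumption, a 21-pixel segment at 0 degrees needs only a 21x1 box instead of a fixed 21x21 one, which is where the saving in computation time would come from.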
EDIT: I mistook threads for cores.