fwiw this is currently the best i can do in vkdt:
i think quality can clearly be improved, but this one runs the denoising in 19.odd milliseconds on an entry-level gtx1650 max-q (laptop), on the full resolution image.
i know i felt really strongly about darktable's region of interest/scaled/cropped rendering pipeline even at the libre graphics meeting (when discussing librtprocess). but at these speeds i just always process the full buffer now, and display whatever lands on screen.
this does not resolve all the abovementioned issues wrt scaled display (the image still needs to be resampled which may show or hide noise), but at least somewhere in the background, there actually is the same image as would be generated during export. this makes coding modules that require a large filter footprint a lot easier, too.
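to illustrate the filter footprint point, here is a toy 1d sketch in python. it is not vkdt or darktable code: the box blur and all function names are made up for illustration. with a region of interest, a filter of radius r only matches the full-buffer result if the roi is first padded by r on each side and cropped again afterwards. processing the full buffer and cropping for display sidesteps that bookkeeping.

```python
# toy 1d example, hypothetical helpers only (not taken from vkdt or darktable)

def box_blur(buf, r):
    """box blur with radius r; the window is simply truncated at the buffer borders."""
    n = len(buf)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append(sum(buf[lo:hi]) / (hi - lo))
    return out

def blur_roi_naive(buf, x0, x1, r):
    # filter only the cropped pixels: taps outside the roi are missing,
    # so pixels near the roi border come out wrong
    return box_blur(buf[x0:x1], r)

def blur_roi_padded(buf, x0, x1, r):
    # pad the roi by the filter radius (clamped to the image),
    # filter, then cut the padding off again
    p0, p1 = max(0, x0 - r), min(len(buf), x1 + r)
    blurred = box_blur(buf[p0:p1], r)
    return blurred[x0 - p0 : x0 - p0 + (x1 - x0)]

def blur_full(buf, x0, x1, r):
    # process the whole buffer, then display whatever lands on screen
    return box_blur(buf, r)[x0:x1]

if __name__ == "__main__":
    image = [float(i * i % 7) for i in range(32)]
    x0, x1, r = 10, 20, 3
    print(blur_roi_padded(image, x0, x1, r) == blur_full(image, x0, x1, r))
    print(blur_roi_naive(image, x0, x1, r) == blur_full(image, x0, x1, r))
```

with several such modules chained, the padding of each stage has to be added up through the pipeline, which is the part that gets tedious for large footprints.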