some timings:
[perf] demosaic_down: 0.113 ms
[perf] demosaic_gauss: 0.129 ms
[perf] demosaic_splat: 0.954 ms
[perf] demosaic_fix: 1.585 ms
this last step (demosaic_fix) is similar to the median-filter colour smoothing iterations often run after demosaicing, so it’s optional.
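for reference, the classic post-demosaic colour smoothing i'm comparing it to can be sketched like this. to be clear, this is not what demosaic_fix does, just the textbook idea: median filter the colour differences R-G and B-G and add green back. the function names and the 3x3 window are assumptions made for the example.

```python
import numpy as np

def median3x3(img):
    """3x3 median of a 2D array, with edge replication at the borders."""
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    # stack the 9 shifted copies of the image and take the per-pixel median
    shifted = [p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(shifted), axis=0)

def smooth_colour(rgb, iterations=1):
    """median filter the R-G and B-G differences; green is left untouched."""
    rgb = np.array(rgb, dtype=np.float64)  # work on a copy
    for _ in range(iterations):
        g = rgb[..., 1]
        rgb[..., 0] = g + median3x3(rgb[..., 0] - g)
        rgb[..., 2] = g + median3x3(rgb[..., 2] - g)
    return rgb

# a flat grey patch with one red outlier, as a colour artifact might look
patch = np.full((5, 5, 3), 0.5)
patch[2, 2, 0] = 1.0
cleaned = smooth_colour(patch)
```

the outlier is pulled back to the value of its neighbours while green stays as it was, which is why such a pass suppresses isolated colour speckles without touching luminance detail much.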
Image Width : 6034
Image Height : 4028
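to put the numbers in perspective, here is the arithmetic as a small script. it assumes the [perf] timings above were measured on this 6034x4028 image; the variable names are made up for the example.

```python
# per-stage timings from the [perf] log above, in milliseconds
timings_ms = {
    "demosaic_down": 0.113,
    "demosaic_gauss": 0.129,
    "demosaic_splat": 0.954,
    "demosaic_fix": 1.585,
}
width, height = 6034, 4028
pixels = width * height

def gigapixels_per_second(num_pixels, milliseconds):
    """throughput for processing num_pixels in the given time."""
    return num_pixels / (milliseconds * 1e-3) / 1e9

total_ms = sum(timings_ms.values())
# the fix step is optional, so also look at the time without it
core_ms = total_ms - timings_ms["demosaic_fix"]

print(f"{pixels / 1e6:.1f} megapixels")
print(f"all stages:  {total_ms:.3f} ms, {gigapixels_per_second(pixels, total_ms):.2f} Gpix/s")
print(f"without fix: {core_ms:.3f} ms, {gigapixels_per_second(pixels, core_ms):.2f} Gpix/s")
```

that comes to about 2.8 ms for all four stages on a roughly 24 megapixel image, or about 1.2 ms if the optional fix step is skipped.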
the algorithm is similar to the gaussian splatting in the google super resolution paper, but i’m not very good at implementing stuff 1:1. i like it because it could be extended to image stacking/super resolution, and it also works for xtrans with very minor changes (i run the same code). but you know i’m no expert on demosaicing, and my eyes are too bad to tell pixel-level differences.
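to illustrate the general idea (not the actual shader code), here is a minimal numpy sketch of demosaicing by gaussian-weighted accumulation. it makes big simplifying assumptions: a fixed isotropic gaussian with no edge-adaptive kernel shape, a single frame, and a gather loop on the CPU. all names (`cfa_mask`, `splat_demosaic`, `BAYER_RGGB`) and the radius/sigma defaults are hypothetical.

```python
import numpy as np

# channel index per CFA site: 0 = red, 1 = green, 2 = blue
BAYER_RGGB = [[0, 1],
              [1, 2]]

def cfa_mask(pattern, height, width):
    """tile a small CFA pattern over the full image size."""
    pattern = np.asarray(pattern)
    ph, pw = pattern.shape
    reps = (-(-height // ph), -(-width // pw))  # ceiling division
    return np.tile(pattern, reps)[:height, :width]

def gaussian_kernel(radius, sigma):
    """unnormalised isotropic 2D gaussian of size (2*radius+1)^2."""
    ax = np.arange(-radius, radius + 1)
    g = np.exp(-0.5 * (ax / sigma) ** 2)
    return np.outer(g, g)

def splat_demosaic(raw, pattern, radius=2, sigma=1.0):
    """every sample contributes to nearby pixels with a gaussian weight;
    each output channel is the weighted sum divided by the sum of weights."""
    raw = np.asarray(raw, dtype=np.float64)
    h, w = raw.shape
    mask = cfa_mask(pattern, h, w)
    kernel = gaussian_kernel(radius, sigma)
    out = np.zeros((h, w, 3))
    for c in range(3):
        sel = (mask == c).astype(np.float64)
        vals = np.pad(raw * sel, radius)   # samples of this colour, zero elsewhere
        wts = np.pad(sel, radius)          # 1 where this colour was measured
        num = np.zeros((h, w))
        den = np.zeros((h, w))
        for dy in range(2 * radius + 1):
            for dx in range(2 * radius + 1):
                k = kernel[dy, dx]
                num += k * vals[dy:dy + h, dx:dx + w]
                den += k * wts[dy:dy + h, dx:dx + w]
        out[..., c] = num / np.maximum(den, 1e-12)
    return out

# synthetic flat colour patch: R = 1.0, G = 0.5, B = 0.25 at their CFA sites
lut = np.array([1.0, 0.5, 0.25])
raw = lut[cfa_mask(BAYER_RGGB, 8, 8)]
rgb = splat_demosaic(raw, BAYER_RGGB)
```

since the only thing that knows about the sensor layout is the pattern array, passing a 6x6 pattern instead of the 2x2 one is the only change needed for a different CFA in this sketch. the same accumulate-then-normalise structure is also what would make it natural to add samples from more frames for stacking.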
input size = output size, not sure what you mean there? the output just has 3 channels per pixel.
i’m not transferring any memory between CPU and GPU, that would be stupid (avoiding that is the whole point of doing a full GPU pipeline). the raw is uploaded once and then the pixels never leave the device (other than for display on the monitor or when you export to file).
fwiw i believe loading from the hard drive, decoding and transfer to the GPU are summarised here:
[rawspeed] load /home/jo/Pictures/RA_William_Clark-2.dng in 375ms
[perf] i-raw_main: 3.718 ms
in rawspeed’s defense, the RAFs from my fuji load in about 35ms (or 9ms once the file is in the disk cache).
i’m no GPU expert, and i didn’t spend much time on the demosaicing in particular. i’m sure more clever or capable people could get more out of it (both in terms of quality and speed).