I want to share some of my experiences optimizing code in RT (I can't speak for DT).
On my 8-core (no hyper-threading) 4 GHz AMD CPU the speed is clearly limited by memory bandwidth (at least for demosaic algorithms). Amaze demosaic takes about 600 ms for a D800 36 MP file on my machine (about 500 ms with the new version I'm currently working on). That's less than the time needed to decode the compressed raw file (decoding is usually single-threaded and takes about 850 ms for the above-mentioned D800 file on my machine). In RT, Amaze runs on the whole raw image.
LMMSE is multi-threaded in RT and partly vectorized too, and its speed is also limited by memory bandwidth when using many cores.
Sharpening speed is also mostly bound by memory bandwidth.
For algorithms like denoise it's a bit different, because the computational load is higher (exp functions, for example). But on my machine it's still limited by memory bandwidth, though not as much.
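To illustrate why such filters saturate memory bandwidth, here is a minimal sketch (my own toy example, not RT code, and the function name is made up): a pass that does only one multiply-add per pixel moves far more bytes than it computes, so past a few threads you're just queueing on the memory bus.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example, not RT code: one fused multiply-add per pixel.
// Arithmetic intensity is ~2 flops per 8 bytes moved (read + write),
// so on a typical desktop CPU this loop is bound by memory bandwidth,
// not by the ALUs -- extra cores stop helping once the bus is full.
void apply_gain(std::vector<float>& img, float gain, float offset) {
    // In RT-style code this loop would carry "#pragma omp parallel for"
    for (std::size_t i = 0; i < img.size(); ++i) {
        img[i] = img[i] * gain + offset;
    }
}
```

Compute-heavy filters like denoise add transcendentals (exp) on top of this pattern, which is why they are less bandwidth-bound, but on my machine the memory traffic still dominates.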
A special case is algorithms that convert the image to work in a different colour space (e.g. from Lab to CIECAM02). There the speed is often limited by the computational load (sin, cos and atan2 functions), and memory speed doesn't play a big role.
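For contrast, a sketch of the compute-bound case (again my own illustration, not RT code): converting Lab a/b channels to hue angle and chroma spends its cycles in atan2 and sqrt, so the loop scales well with core count and memory speed matters little.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a compute-bound per-pixel step: Lab a/b -> hue/chroma.
// The transcendental calls cost far more cycles than the memory traffic,
// so throughput here scales with cores and clock rather than bandwidth.
void lab_to_hue_chroma(const std::vector<float>& a,
                       const std::vector<float>& b,
                       std::vector<float>& hue,
                       std::vector<float>& chroma) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        hue[i]    = std::atan2(b[i], a[i]);            // expensive transcendental
        chroma[i] = std::sqrt(a[i] * a[i] + b[i] * b[i]);
    }
}
```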
Conclusion: if you want to buy a machine with many cores to work with RT, you should also invest in fast memory (better to get 16 GB of fast memory than 32 GB of slower memory).