I think that’s already set. (Not at the computer currently).
I was at 9 GB of RAM usage and during export it jumped to something over 12 GB (16 GB total).
With all the memory transfers involved, would a 96-core GPU really be faster than a four-core hyperthreaded CPU? Especially since the CPU can use vector instructions.
(That hyperthreaded 4-core explains the “4.248 secs (30.812 CPU)” you saw, btw: CPU time sums the time spent across all threads, so it can exceed wall-clock time by roughly the number of busy hardware threads.)
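As a small illustration of wall-clock vs. summed CPU time (this is my own sketch, not darktable code; the workload and names are made up), `time.process_time()` accumulates CPU time over all threads of the process, while `time.perf_counter()` measures elapsed wall time:

```python
import hashlib
import threading
import time

def burn():
    # Hash a large buffer repeatedly. hashlib releases the GIL for
    # large inputs, so these threads can run on separate cores.
    data = b"x" * (1 << 20)
    for _ in range(200):
        hashlib.sha256(data).digest()

wall_start = time.perf_counter()
cpu_start = time.process_time()

threads = [threading.Thread(target=burn) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

# On a multi-core machine, CPU time comes out several times larger
# than wall time, just like "4.248 secs (30.812 CPU)".
print(f"wall: {wall:.3f}s, CPU: {cpu:.3f}s")
```

On a 4-core/8-thread CPU you'd expect the CPU figure to be up to ~8x the wall figure for a fully parallel workload.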
The thing is, sometimes it is. Here’s an image that took 24.475 secs on GPU and 27.219 secs on CPU. Not a huge difference.
opencl_img_1.txt (7.8 KB)
And here’s another image, where it takes 6.602 secs on GPU and 12.964 secs on CPU.
opencl_img_2.txt (6.3 KB)
I suspect masked modules, Demosaic, and Denoise (profiled) are the most demanding modules in general, but I don’t know what causes those relative differences, where the GPU is sometimes N times faster and sometimes barely faster at all.
The absolute difference is caused by non-local-means Denoise adding a whopping 13 seconds on CPU vs. 10 seconds on GPU.