Btw: if memory serves, this could be done in roughly 50 ms (or less) on a 5-year-old 8-core AMD FX-8350 for a 36 MPixel file, provided the data is laid out to suit SIMD operations.