PhotoFlow optimizations and benchmarks

I am just starting to learn SIMD and vectorization. Could you explain what the optimal memory layout of the matrix and of the RGB values would be for this matrix(3,3) x vector(3) product?
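
To make the question concrete, here is a rough scalar sketch in C of the two pixel layouts I have in mind: interleaved (RGBRGB...) and planar (separate R, G and B arrays). The function names and the row-major matrix storage are assumptions made up for illustration, not PhotoFlow's actual code.

```c
#include <stddef.h>

/* Hypothetical helper (not from PhotoFlow): applies a row-major 3x3
   matrix m to n interleaved pixels stored as R0 G0 B0 R1 G1 B1 ... */
static void rgb_matrix_interleaved(const float m[9], const float *in,
                                   float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const float r = in[3 * i + 0];
        const float g = in[3 * i + 1];
        const float b = in[3 * i + 2];
        out[3 * i + 0] = m[0] * r + m[1] * g + m[2] * b;
        out[3 * i + 1] = m[3] * r + m[4] * g + m[5] * b;
        out[3 * i + 2] = m[6] * r + m[7] * g + m[8] * b;
    }
}

/* Hypothetical helper (not from PhotoFlow): same product, but each
   channel lives in its own contiguous array (planar layout). */
static void rgb_matrix_planar(const float m[9],
                              const float *r, const float *g, const float *b,
                              float *ro, float *go, float *bo, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ro[i] = m[0] * r[i] + m[1] * g[i] + m[2] * b[i];
        go[i] = m[3] * r[i] + m[4] * g[i] + m[5] * b[i];
        bo[i] = m[6] * r[i] + m[7] * g[i] + m[8] * b[i];
    }
}
```

In the planar version every iteration reads and writes consecutive floats of a single channel, while the interleaved version strides by three. Is that the difference that matters for SIMD, or is there a better arrangement?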

Thanks!