PhotoFlow optimizations and benchmarks

While trying to understand how a dot product should be implemented with SIMD instructions, I came across this Stack Overflow answer, which seems relevant: https://stackoverflow.com/a/17019970
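To make the question more concrete, here is a minimal sketch of the kind of SIMD dot product I have in mind. It is not taken from the linked answer or from PhotoFlow: `dot_product` and `DOT_USE_SSE` are hypothetical names chosen for illustration, and I am assuming single-precision inputs and plain SSE intrinsics, with a scalar fallback when SSE is not available.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   // SSE intrinsics (__m128, _mm_mul_ps, ...)
#define DOT_USE_SSE 1    // hypothetical macro, only used in this sketch
#endif

// Hypothetical helper: dot product of two float arrays of length n.
float dot_product(const float* a, const float* b, std::size_t n)
{
    float sum = 0.0f;
    std::size_t i = 0;

#ifdef DOT_USE_SSE
    // Accumulate four partial sums in parallel, one per SIMD lane.
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   // unaligned loads, so no
        __m128 vb = _mm_loadu_ps(b + i);   // alignment requirement on a, b
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    // Horizontal reduction: done once, outside the loop.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif

    // Scalar tail (or the whole array when SSE is not available).
    for (; i < n; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

The idea is to keep the multiply-add work vertical (lane by lane) inside the loop and to do the horizontal sum only once at the end. Note that the summation order differs from a plain scalar loop, so the floating-point result can differ slightly in the last bits.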

Also, there is a SIMD library, developed in the context of a CERN project, that looks interesting: edanor/umevector on GitHub, described as a "Vectorization EDSL library".

Knowing how CERN works, and how much effort they put into high-quality computing, I expect it to be well written…

What do you think?