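Andrea, as a general hint: don't use _mm_mul_ps, _mm_add_ps and similar intrinsics for plain arithmetic. Write a * b or a + b instead, even for SIMD vectors (as I did in my example above). GCC and Clang declare __m128 and the other SIMD types as vector types, so they translate these operators into the same SIMD instructions (MSVC does not support operators on __m128, though). It improves readability, and the compiler can optimize the result at least as well as the intrinsic version.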
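A minimal sketch of the difference, assuming GCC or Clang on x86 with SSE (the baseline on x86-64). The function names axpy_intrinsics, axpy_operators, lanes and check_equivalence are hypothetical and not taken from the example above. Both versions compute a * x + y on four floats at once:

```cpp
#include <xmmintrin.h>  // SSE: __m128 and the _mm_* intrinsics
#include <array>
#include <cassert>

// a * x + y on four floats at once, written with intrinsics.
__m128 axpy_intrinsics(__m128 a, __m128 x, __m128 y) {
    return _mm_add_ps(_mm_mul_ps(a, x), y);
}

// The same computation with plain operators. This compiles with GCC and
// Clang, which declare __m128 as a vector-extension type; MSVC rejects it.
__m128 axpy_operators(__m128 a, __m128 x, __m128 y) {
    return a * x + y;
}

// Copy the four lanes of a vector into an array so they can be inspected.
std::array<float, 4> lanes(__m128 v) {
    std::array<float, 4> out;
    _mm_storeu_ps(out.data(), v);  // unaligned store into the array
    return out;
}

// Check that both versions produce identical lanes on a sample input.
void check_equivalence() {
    __m128 a = _mm_set1_ps(2.0f);                     // {2, 2, 2, 2}
    __m128 x = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);   // lane 0 = 1.0f
    __m128 y = _mm_set1_ps(0.5f);                     // {0.5, 0.5, 0.5, 0.5}
    assert(lanes(axpy_intrinsics(a, x, y)) == lanes(axpy_operators(a, x, y)));
}
```

If you paste both functions into Compiler Explorer, GCC and Clang at -O2 typically emit the same mulps/addps sequence for each, so the operator version costs nothing and reads like scalar code.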