Of course: suppose you have a loop that operates on more than 4 separate channels, where each channel is its own array in memory (planar, not interleaved), as in an RGB-to-LAB conversion. Something like this:
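#include <stdlib.h>

float *r = malloc(100000 * sizeof(float));
float *g = malloc(100000 * sizeof(float));
float *b = malloc(100000 * sizeof(float));
/* ... fill r, g and b with some data ... */
float *Ll = malloc(100000 * sizeof(float));
float *La = malloc(100000 * sizeof(float));
float *Lb = malloc(100000 * sizeof(float));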
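for (int i = 0; i < 100000; ++i) {
    Ll[i] = ...;  /* some calculation of r[i], g[i] and b[i]: fine */
    La[i] = ...;  /* some calculation of r[i], g[i] and b[i]: fine, but evicts one of r[i], g[i], b[i], Ll[i] from L1 */
    /* and so on... */
}

Computing Ll[i] touches four arrays and is fine. Computing La[i] is still correct, but it touches a fifth array. When the L1 cache is only 4-way set associative, this throws one of r[i], g[i], b[i] or Ll[i] out of L1 on every iteration. In my case this happened only on Windows, because Windows allocates the large arrays at the same offset modulo 4096. On typical x86 CPUs each way of the L1 data cache is at most 4 KiB, so the set index comes entirely from the low 12 address bits. Arrays that share an offset modulo 4096 therefore put element i of every array into the same set, and a 4-way set can hold only four of those lines at once.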
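To make the eviction visible, here is a small C sketch that simulates a 4-way set-associative L1 with true LRU replacement and replays the access pattern of the loop above (read r, g, b, then write one output channel, three times per iteration). Everything in it is an assumption for illustration: the cache geometry (64-byte lines, 4 KiB per way, so 64 sets), the hypothetical base addresses standing in for what malloc returns, and the helper names cache_access and count_misses. Real CPUs use pseudo-LRU and hardware prefetchers, so treat this as a model, not a measurement.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMED L1 geometry (illustrative only): 64-byte lines, 4 ways,
   4 KiB per way -> 64 sets, 16 KiB total, true LRU replacement. */
#define LINE_SIZE 64u
#define WAYS 4u
#define WAY_SIZE 4096u
#define NUM_SETS (WAY_SIZE / LINE_SIZE)

#define N 1600      /* floats per channel array (kept small for the model) */
#define STREAMS 6   /* r, g, b, Ll, La, Lb */

typedef struct {
    uint64_t tag[NUM_SETS][WAYS];
    uint64_t age[NUM_SETS][WAYS];  /* timestamp of last use */
    int valid[NUM_SETS][WAYS];
    uint64_t clock;
} Cache;

/* Simulates one access; returns 1 on a miss, 0 on a hit. */
static int cache_access(Cache *c, uint64_t addr) {
    uint64_t line = addr / LINE_SIZE;
    /* The set index uses only address bits below 4096 (the page offset). */
    unsigned set = (unsigned)(line % NUM_SETS);
    unsigned victim = 0;
    c->clock++;
    for (unsigned w = 0; w < WAYS; ++w) {
        if (c->valid[set][w] && c->tag[set][w] == line) {
            c->age[set][w] = c->clock;
            return 0;
        }
    }
    /* Miss: take the first empty way, otherwise the least recently used way. */
    for (unsigned w = 0; w < WAYS; ++w) {
        if (!c->valid[set][w]) { victim = w; break; }
        if (c->age[set][w] < c->age[set][victim]) victim = w;
    }
    c->tag[set][victim] = line;
    c->age[set][victim] = c->clock;
    c->valid[set][victim] = 1;
    return 1;
}

/* Hypothetical helper: counts L1 misses for the RGB -> LAB loop.
   Array k starts at a made-up address k MiB after the first one
   (same offset modulo 4096), plus k * stagger extra bytes. */
long count_misses(uint64_t stagger) {
    static Cache c;
    uint64_t base[STREAMS];
    long misses = 0;
    memset(&c, 0, sizeof c);
    for (int k = 0; k < STREAMS; ++k)
        base[k] = 0x10000000u + (uint64_t)k * 0x100000u + (uint64_t)k * stagger;
    for (uint64_t i = 0; i < N; ++i) {
        uint64_t off = i * sizeof(float);
        for (int out = 3; out < STREAMS; ++out) {          /* Ll, La, Lb */
            for (int in = 0; in < 3; ++in)                 /* read r[i], g[i], b[i] */
                misses += cache_access(&c, base[in] + off);
            misses += cache_access(&c, base[out] + off);   /* write the output */
        }
    }
    return misses;
}
```

Under these assumptions, count_misses(0), where all six arrays share the same offset modulo 4096, reports 5100 misses, while count_misses(LINE_SIZE), where each array sits one cache line further along than the previous one, reports 600, which is just one compulsory miss per line. With LRU, r, g and b stay resident and the output channels keep evicting each other, which is the "throw one of them out of L1" effect described above.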