Compiling with optimization -O3

I am currently experimenting with compiler flags.
When I compile ART (as a Release) with flag -O2 everything is fine.

When I change the flag to -O3 (is it recommended?) I get a side effect on photos. The flags are as follows:

The side effect is a green-ish photo in editing mode like this

Is there any advantage to using the aggressive optimization flag -O3?

I compile with -O3 all the time; it’s not supposed to break anything. What OS, CPU and compiler version are you using? Also, does it happen with all files or only some? If only some, can you share one? Thanks!


Thanks for the quick reply. I am using an AMD Ryzen 3950X (see attachment: cpu.txt, 47.3 KB).
The green-ish tint appears on all raw DNG files, but only when I use the compiler flag -O3.
OS: Fedora release 35 (Rawhide)
Linux kirk 5.12.0-0.rc6.184.fc35.x86_64 #1 SMP Mon Apr 5 18:47:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

which compiler and version?

Good morning. I got an update of gcc this morning but the result is still the same.

gcc -v

Using built-in specs.
Target: x86_64-redhat-linux
Configured with: …/configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl= --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-11.0.1-20210405/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.0.1 20210405 (Red Hat 11.0.1-0) (GCC)

It seems similar to this but that was a problem with gcc 10 not gcc 11…

Edit: @01McAc just in case, could you try setting the flag -fno-tree-loop-vectorize when compiling?

@Thanatomanic: Will do, but could you help me with how to enable this flag? Should it replace -O3, i.e. CXX_FLAGS_RELEASE=-fno-tree-loop-vectorize?
I use

ccmake …

in the build directory to make things easier.

Just set CMAKE_CXX_FLAGS:STRING=-fno-tree-loop-vectorize in your CMakeCache.txt file (and leave -O3 there).

OK, I added the flag, compiled, and it works as expected. No more green-ish DNGs. Thank you very much.
ART/about/version reports the following build flags:

-std=c++11 -march=native -Werror=unused-label -fno-math-errno -Wall -Wuninitialized -Wno-deprecated-declarations -Wno-unused-result -fopenmp -Werror=unknown-pragmas -O3 -DNDEBUG -fno-tree-loop-vectorize -ftree-vectorize

Thanks. It seems there might still be some issue with recent gcc versions and their aggressive optimizations. I guess this affects RT too. @heckflosse should we generalise the check for “bad” gcc versions and add -fno-tree-loop-vectorize automatically? Do you want to follow up with the gcc devs? (IIRC you contacted them the other time, do I remember right?)

BTW, to also answer the performance question: yes, every optimisation helps. Also, if you are compiling only for yourself, make sure to use PROC_TARGET_NUMBER:STRING=2 to enable optimisations specific to your CPU.


@01McAc Could you build this small code snippet using gcc 11 and -O3 and post the output here?

#include <iostream>

// sRGB -> XYZ conversion matrix
constexpr double xyz_sRGB[3][3] = {
    {0.4360747,  0.3850649,  0.1430804},
    {0.2225045,  0.7168786,  0.0606169},
    {0.0139322,  0.0971045,  0.7141733}
};

int main() {
    double rgb_cam[3][3] = {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}, {7.0, 8.0, 9.0}};
    double xyz_cam[3][3] = {{0.0, 0.0, 0.0}, {0.0, 0.0, 0.0}, {0.0, 0.0, 0.0}};

    // 3x3 matrix product: xyz_cam = xyz_sRGB * rgb_cam
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            for (int k = 0; k < 3; k++) {
                xyz_cam[i][j] += xyz_sRGB[i][k] * rgb_cam[k][j];
            }
        }
    }

    // Print the result, one element per line
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            std::cout << xyz_cam[i][j] << std::endl;
        }
    }
    return 0;
}

You remember right. Here’s the link to the old ticket for gcc 10.1

Edit: I re-read your post. I guess there is a header file missing (?):

gcc test.c
test.c:1:10: fatal error: iostream: No such file or directory
    1 | #include <iostream>
      |          ^~~~~~~~~~
compilation terminated.

@01McAc I will answer later in the evening


g++ -o test -O3


Thanks for testing.
That looks correct, so it’s not the same bug as in gcc 10.1, damn…