FYI: I compiled G’MIC using make cli. Didn’t look which settings are default.
Yeah, sure. I also started programming at the beginning of the 90s. Those modern tools and syntax changes make the code more stringent and add new power (especially for an old language like C++), and I expect them to yield the same optimal output.
That wasn’t meant as a dig. I admire your work, and backwards compatibility (language-wise) sure is an obstacle.
I’m not seeing a problem with the coding style, either. I was more focused on the programming style (T* as argument) where you have to tell the user how many Ts you expect rather than telling the compiler.
OK, so to sum up:
The G’MIC Makefile has been using the optimization flags -O3 -mtune=generic, with which std::min/max() runs faster than cimg::min/max() (which are basic template implementations of the min/max functions). Now that I use -Ofast for the optimization flags, together with a simple template specialization of the cimg::min/max() functions, the processing time is comparable to what I get with std::min/max().
I’ll add the code for the 5x5 median filter too.
Thanks for your patch @heckflosse.
Actually I was just having fun with the pun. I didn’t refer to the 25-parameter variant on purpose.
In case you are interested: median.h includes code for 7x7 and 9x9 too.
Sorry for being over-reactive, but I’ve already faced a lot of situations where I’ve seen people (usually students from engineering schools) giving a lot of advice and recommendations about how things must be correctly done. Most of the time, it turns out they know nothing about programming. Over time, I’ve learned to be wary of allusions to the proper way to program.
Isn’t that a bit risky? It enables -ffast-math, which enables -ffinite-math-only, which can be problematic when handling NaNs or Infs …
Ah yes, maybe. I didn’t think about that issue.
I’ll revert to -O3, and force the use of std::min/max() if C++11 is enabled.
For some reason, the basic min/max functions (return a>=b?a:b) are less optimized than std::max() when using -O3 alone (and become the same with -Ofast).
If that matters: std::min/max() is C++98.
Okay, I did a small test:
#include <algorithm>
#include <iostream>

int main(int argc, char** argv)
{
    float a;
    float b;
    std::cin >> a;
    std::cin >> b;
    // Enable exactly one of the three options per build.
    const float c = std::max(a, b);    // option 1
    // const float c = a >= b ? a : b; // option 2
    // const float c = a < b ? b : a;  // option 3
    std::cout << c << std::endl;
    return 0;
}
Option 1:
0000000000400790 <main>:
400790: 48 83 ec 18 sub $0x18,%rsp
400794: bf 80 0d 60 00 mov $0x600d80,%edi
400799: 48 8d 74 24 08 lea 0x8(%rsp),%rsi
40079e: e8 dd ff ff ff callq 400780 <std::istream& std::istream::_M_extract<float>(float&)@plt>
4007a3: 48 8d 74 24 0c lea 0xc(%rsp),%rsi
4007a8: bf 80 0d 60 00 mov $0x600d80,%edi
4007ad: e8 ce ff ff ff callq 400780 <std::istream& std::istream::_M_extract<float>(float&)@plt>
4007b2: f3 0f 10 44 24 0c movss 0xc(%rsp),%xmm0
4007b8: bf c0 0e 60 00 mov $0x600ec0,%edi
4007bd: f3 0f 5f 44 24 08 maxss 0x8(%rsp),%xmm0
4007c3: f3 0f 5a c0 cvtss2sd %xmm0,%xmm0
4007c7: e8 94 ff ff ff callq 400760 <std::ostream& std::ostream::_M_insert<double>(double)@plt>
4007cc: 48 89 c7 mov %rax,%rdi
4007cf: e8 9c ff ff ff callq 400770 <std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)@plt>
4007d4: 31 c0 xor %eax,%eax
4007d6: 48 83 c4 18 add $0x18,%rsp
4007da: c3 retq
4007db: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
Option 2:
0000000000400790 <main>:
400790: 48 83 ec 18 sub $0x18,%rsp
400794: bf 80 0d 60 00 mov $0x600d80,%edi
400799: 48 8d 74 24 08 lea 0x8(%rsp),%rsi
40079e: e8 dd ff ff ff callq 400780 <std::istream& std::istream::_M_extract<float>(float&)@plt>
4007a3: 48 8d 74 24 0c lea 0xc(%rsp),%rsi
4007a8: bf 80 0d 60 00 mov $0x600d80,%edi
4007ad: e8 ce ff ff ff callq 400780 <std::istream& std::istream::_M_extract<float>(float&)@plt>
4007b2: f3 0f 10 54 24 0c movss 0xc(%rsp),%xmm2
4007b8: bf c0 0e 60 00 mov $0x600ec0,%edi
4007bd: 0f 28 ca movaps %xmm2,%xmm1
4007c0: f3 0f 10 44 24 08 movss 0x8(%rsp),%xmm0
4007c6: f3 0f c2 c8 02 cmpless %xmm0,%xmm1
4007cb: 0f 54 c1 andps %xmm1,%xmm0
4007ce: 0f 55 ca andnps %xmm2,%xmm1
4007d1: 0f 56 c1 orps %xmm1,%xmm0
4007d4: f3 0f 5a c0 cvtss2sd %xmm0,%xmm0
4007d8: e8 83 ff ff ff callq 400760 <std::ostream& std::ostream::_M_insert<double>(double)@plt>
4007dd: 48 89 c7 mov %rax,%rdi
4007e0: e8 8b ff ff ff callq 400770 <std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)@plt>
4007e5: 31 c0 xor %eax,%eax
4007e7: 48 83 c4 18 add $0x18,%rsp
4007eb: c3 retq
4007ec: 0f 1f 40 00 nopl 0x0(%rax)
Option 3:
0000000000400790 <main>:
400790: 48 83 ec 18 sub $0x18,%rsp
400794: bf 80 0d 60 00 mov $0x600d80,%edi
400799: 48 8d 74 24 08 lea 0x8(%rsp),%rsi
40079e: e8 dd ff ff ff callq 400780 <std::istream& std::istream::_M_extract<float>(float&)@plt>
4007a3: 48 8d 74 24 0c lea 0xc(%rsp),%rsi
4007a8: bf 80 0d 60 00 mov $0x600d80,%edi
4007ad: e8 ce ff ff ff callq 400780 <std::istream& std::istream::_M_extract<float>(float&)@plt>
4007b2: f3 0f 10 44 24 0c movss 0xc(%rsp),%xmm0
4007b8: bf c0 0e 60 00 mov $0x600ec0,%edi
4007bd: f3 0f 5f 44 24 08 maxss 0x8(%rsp),%xmm0
4007c3: f3 0f 5a c0 cvtss2sd %xmm0,%xmm0
4007c7: e8 94 ff ff ff callq 400760 <std::ostream& std::ostream::_M_insert<double>(double)@plt>
4007cc: 48 89 c7 mov %rax,%rdi
4007cf: e8 9c ff ff ff callq 400770 <std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)@plt>
4007d4: 31 c0 xor %eax,%eax
4007d6: 48 83 c4 18 add $0x18,%rsp
4007da: c3 retq
4007db: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
So, what we can see here is that options 1 and 3 are equal while option 2 is inferior. One possible explanation is that most if not all STL algorithms require operator<() for comparison and the GCC folks optimized for this (std::max() is implemented that way, too). Or maybe there is something about IEEE 754 and SSE2 that distinguishes < from >= for some reason.
HTH
Flössie
I’ve just replaced all my previous uses of cimg::min/max() by std::min/max() wherever possible in the code of G’MIC/CImg. I don’t expect a huge speed gain though, but it’s still nice to know.
Me neither.
I meant, except for the median filtering, where min/max are heavily used.
OK, so I’ve just released some “pre-release” binaries for version 1.7.6 of G’MIC, with some new optimizations Ingo and I have implemented. Feel free to try them and tell me what you think. They are available here: Index of /files/prerelease
guys, you are awesome
OK I can hardly believe this is correct, win7x64 gimp plugin with -median 7 (on an old core2 2.4ghz):
1.7.5: ~23s
1.7.6 pre: ~5s
With an optimised version of dehaze it only chopped off around 1s to become ~16s, but that’s expected.
Very impressed with the “regularization” of retinex too!