Is my computer really this slow?

Jerome_Boulanger · September 12, 2016, 11:58am

Hi again,

Just wanted to add that the implementation is only loosely based on the article: I didn’t use the soft matting step and replaced it with a simple median filter.
S. Lee, S. Yun, J.-H. Nam, C. S. Won, and S.-W. Jung, “A review on dark channel prior based image dehazing algorithms,” EURASIP Journal on Image and Video Processing, vol. 2016, no. 1, Dec. 2016.

Other, more recent approaches are also available, and they rely on assumptions other than the dark channel prior (DCP).

Jerome

heckflosse · September 12, 2016, 12:03pm

Ok, here’s a first quick and dirty patch. I just copied over some code from RT for the median of 9 values and used it in blur_median.
I benchmarked it using gmic image.jpg -tic -median 3 -toc -q where image.jpg is a 36 MP file.
Processing time on a 4 core machine (median of 7 runs)

before patch: 894 ms
after patch: 237 ms

Edit: Here is a patch which also includes median of 25 values.

I benchmarked it using gmic image.jpg -tic -median 5 -toc -q where image.jpg is a 36 MP file.
Processing time on a 4 core machine (median of 7 runs)

before patch: 7479 ms
after patch: 1330 ms
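
For readers who have not seen the trick: a median of 9 values can be computed with nothing but min/max operations, which is why the speed of min() and max() matters so much further down in this thread. The following is only a minimal sketch of that idea. It is not the code copied from RT, and the names median3_sketch and median9_sketch are made up for illustration.

```cpp
#include <algorithm>

// Hypothetical helper: median of three values using only min/max.
template <typename T>
inline T median3_sketch(T a, T b, T c)
{
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Hypothetical helper: median of nine values, e.g. a 3x3 window.
// Each group of three is reduced to its smallest, middle and largest value.
// The median of all nine is then the median of:
//   the largest of the three smallest values,
//   the middle of the three middle values,
//   the smallest of the three largest values.
template <typename T>
inline T median9_sketch(const T (&v)[9])
{
    const T lo0 = std::min(std::min(v[0], v[1]), v[2]);
    const T lo1 = std::min(std::min(v[3], v[4]), v[5]);
    const T lo2 = std::min(std::min(v[6], v[7]), v[8]);

    const T mid0 = median3_sketch(v[0], v[1], v[2]);
    const T mid1 = median3_sketch(v[3], v[4], v[5]);
    const T mid2 = median3_sketch(v[6], v[7], v[8]);

    const T hi0 = std::max(std::max(v[0], v[1]), v[2]);
    const T hi1 = std::max(std::max(v[3], v[4]), v[5]);
    const T hi2 = std::max(std::max(v[6], v[7]), v[8]);

    const T max_of_lows = std::max(std::max(lo0, lo1), lo2);
    const T mid_of_mids = median3_sketch(mid0, mid1, mid2);
    const T min_of_highs = std::min(std::min(hi0, hi1), hi2);

    return median3_sketch(max_of_lows, mid_of_mids, min_of_highs);
}
```

There are no data-dependent branches in this sketch, so how well the compiler translates std::min() and std::max() largely decides how fast it runs.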

floessie · September 12, 2016, 12:46pm

Just a note: Though the code is C++11, it can be easily rewritten in C++98 with a slightly different interface. There is no C++11 magic about it.

David_Tschumperle · September 12, 2016, 1:03pm

Ok, I’m currently looking at your patch, which makes things faster indeed.
The surprise is: the cause of the speed gain is not the algorithm itself, but mainly the use of std::min() and std::max() instead of my ‘own’ min() and max() functions. It looks like the compiler uses hard-coded functions for computing the min() and max() of two float values. If I use my own min/max functions in your fastmedian() function, I get results very similar to those of my previous code. So, I’m currently patching my min()/max() functions to make them use std::min/max() when possible. By the way, I’m not sure how I can enable this for C++98 users.
I’ll let you know when this is ready.

heckflosse · September 12, 2016, 1:19pm

I guess the compiler can’t vectorize your ‘own’ min and max functions, but it can vectorize std::min() and std::max(), at least for float values.

floessie · September 12, 2016, 1:22pm

Instead of passing a std::array<>, pass the parameters by value or have templated functions like fastmedian9(T*) where the argument is a pointer to nine T’s (reminds me of the 90s, the programming style as well).

David_Tschumperle · September 12, 2016, 1:35pm

That is what I’ve done.
Anyway, it seems the optimization flags are not optimal. I compile G’MIC with -O3 -mtune=generic, and in this case, my min/max() functions are slower. If I use -Ofast, they become equivalent to std::min/max() (I’ve looked at the assembly code generated, to compare the two versions).

David_Tschumperle · September 12, 2016, 1:46pm

I don’t see any problem with that coding style. Most of the best coders started coding in the 90s.
There are so many people advocating fancy and “modern” syntax who do not realize that the assembly code generated by the compiler is the same in the end (or sometimes even worse).
No need to be pedantic with good old programmers.

garagecoder · September 12, 2016, 1:53pm

I read it as a nostalgic comment rather than a criticism.
Anyway, I’m excited about any speedup, regardless of how it’s done.

heckflosse · September 12, 2016, 1:55pm

FYI: I compiled G’MIC using make cli. I didn’t check which settings are the default.

floessie · September 12, 2016, 1:57pm

Yeah, sure. I also started programming at the beginning of the 90s. Those modern tools and syntax changes make the code more stringent and add new power (especially for an old language like C++), and I expect them to yield the same optimal output.

That wasn’t meant as a side blow. I admire your work, and backwards compatibility (language wise) sure is an obstacle.

I’m not seeing a problem with the coding style, either. I was more focused on the programming style (T* as argument), where you have to tell the user how many Ts you expect rather than telling the compiler.
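
To make the “telling the compiler” point concrete, here is a small sketch that also compiles as C++98. The functions max9_ptr and max9_ref are hypothetical and only illustrate the two interfaces; they are not taken from the patch or from G’MIC.

```cpp
#include <algorithm>

// Pointer interface: the caller has to know (from the documentation) that
// p must point to at least nine values. The compiler cannot check it.
template <typename T>
inline T max9_ptr(const T* p)
{
    T m = p[0];
    for (int i = 1; i < 9; ++i) m = std::max(m, p[i]);
    return m;
}

// Reference-to-array interface: the size is part of the parameter type,
// so passing an array with a different number of elements fails to compile.
template <typename T>
inline T max9_ref(const T (&v)[9])
{
    T m = v[0];
    for (int i = 1; i < 9; ++i) m = std::max(m, v[i]);
    return m;
}
```

With float w[9], both max9_ptr(w) and max9_ref(w) are accepted. With float w[8], only the pointer version still compiles, and it then reads past the end of the array.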

David_Tschumperle · September 12, 2016, 1:59pm

OK, so to sum up:
The G’MIC Makefile has been using the optimization flags -O3 -mtune=generic, with which std::min/max() execute faster than cimg::min/max() (which are basic template implementations of the min/max functions). Now that I use -Ofast for the optimization flags, together with a simple template specialization of the cimg::min/max() functions, the processing time is comparable to that obtained with std::min/max().
I’ll also add the code for the 5x5 median filter.
Thanks for your patch @heckflosse.
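
For illustration, a template specialization of this kind might look roughly like the sketch below. The namespace sketch and the exact signatures are assumptions made for this example; the actual change in CImg is not shown in this thread and may well differ.

```cpp
#include <algorithm>

namespace sketch { // hypothetical namespace, deliberately not named cimg

// Generic fallback, written like the "basic template implementation".
template <typename T>
inline T max(const T& a, const T& b) { return a >= b ? a : b; }

template <typename T>
inline T min(const T& a, const T& b) { return a <= b ? a : b; }

// Specializations for float simply forward to the standard library.
template <>
inline float max<float>(const float& a, const float& b) { return std::max(a, b); }

template <>
inline float min<float>(const float& a, const float& b) { return std::min(a, b); }

} // namespace sketch
```

Callers keep writing sketch::max(a, b) and sketch::min(a, b); only the float case is rerouted, so other types are unaffected.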

floessie · September 12, 2016, 1:59pm

Actually I was just making fun with the pun. I didn’t refer to the 25 parameters variant on purpose.

heckflosse · September 12, 2016, 2:14pm

In case you are interested: median.h includes code for 7x7 and 9x9 too.

David_Tschumperle · September 12, 2016, 2:40pm

Sorry for overreacting, but I’ve already faced a lot of situations where I’ve seen people (usually students from engineering schools) giving a lot of advice and recommendations about how things must be done correctly. Most of the time, it turns out they know nothing about programming. Over time, I’ve learned to be wary of allusions to the proper way to program.

heckflosse · September 12, 2016, 3:30pm

Isn’t -Ofast a bit risky? It enables -ffast-math, which in turn enables -ffinite-math-only, and that can be problematic when handling NaNs or Infs …

David_Tschumperle · September 12, 2016, 4:12pm

Ah yes, maybe. I didn’t think about that issue.
I’ll revert back to -O3, and force the use of std::min/max() in case C++11 is enabled.
For some reason, the basic min/max functions (return a>=b?a:b) are less optimized than std::max() when using -O3 alone (and become the same with -Ofast).

floessie · September 12, 2016, 5:51pm

If that matters: std::min/max() is C++98.

floessie · September 13, 2016, 7:48am

Okay, I did a small test:

#include <iostream>
#include <algorithm>

int main(int argc, char** argv)
{
    float a;
    float b;

    std::cin >> a;
    std::cin >> b;

    // Enable exactly one of the three options per build.
    const float c = std::max(a, b); // option 1
    // const float c = a >= b ? a : b; // option 2
    // const float c = a < b ? b : a; // option 3

    std::cout << c << std::endl;

    return 0;
}

Option 1:

0000000000400790 <main>:
  400790:       48 83 ec 18             sub    $0x18,%rsp
  400794:       bf 80 0d 60 00          mov    $0x600d80,%edi
  400799:       48 8d 74 24 08          lea    0x8(%rsp),%rsi
  40079e:       e8 dd ff ff ff          callq  400780 <std::istream& std::istream::_M_extract<float>(float&)@plt>
  4007a3:       48 8d 74 24 0c          lea    0xc(%rsp),%rsi
  4007a8:       bf 80 0d 60 00          mov    $0x600d80,%edi
  4007ad:       e8 ce ff ff ff          callq  400780 <std::istream& std::istream::_M_extract<float>(float&)@plt>
  4007b2:       f3 0f 10 44 24 0c       movss  0xc(%rsp),%xmm0
  4007b8:       bf c0 0e 60 00          mov    $0x600ec0,%edi
  4007bd:       f3 0f 5f 44 24 08       maxss  0x8(%rsp),%xmm0
  4007c3:       f3 0f 5a c0             cvtss2sd %xmm0,%xmm0
  4007c7:       e8 94 ff ff ff          callq  400760 <std::ostream& std::ostream::_M_insert<double>(double)@plt>
  4007cc:       48 89 c7                mov    %rax,%rdi
  4007cf:       e8 9c ff ff ff          callq  400770 <std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)@plt>
  4007d4:       31 c0                   xor    %eax,%eax
  4007d6:       48 83 c4 18             add    $0x18,%rsp
  4007da:       c3                      retq   
  4007db:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

Option 2:

0000000000400790 <main>:
  400790:       48 83 ec 18             sub    $0x18,%rsp
  400794:       bf 80 0d 60 00          mov    $0x600d80,%edi
  400799:       48 8d 74 24 08          lea    0x8(%rsp),%rsi
  40079e:       e8 dd ff ff ff          callq  400780 <std::istream& std::istream::_M_extract<float>(float&)@plt>
  4007a3:       48 8d 74 24 0c          lea    0xc(%rsp),%rsi
  4007a8:       bf 80 0d 60 00          mov    $0x600d80,%edi
  4007ad:       e8 ce ff ff ff          callq  400780 <std::istream& std::istream::_M_extract<float>(float&)@plt>
  4007b2:       f3 0f 10 54 24 0c       movss  0xc(%rsp),%xmm2
  4007b8:       bf c0 0e 60 00          mov    $0x600ec0,%edi
  4007bd:       0f 28 ca                movaps %xmm2,%xmm1
  4007c0:       f3 0f 10 44 24 08       movss  0x8(%rsp),%xmm0
  4007c6:       f3 0f c2 c8 02          cmpless %xmm0,%xmm1
  4007cb:       0f 54 c1                andps  %xmm1,%xmm0
  4007ce:       0f 55 ca                andnps %xmm2,%xmm1
  4007d1:       0f 56 c1                orps   %xmm1,%xmm0
  4007d4:       f3 0f 5a c0             cvtss2sd %xmm0,%xmm0
  4007d8:       e8 83 ff ff ff          callq  400760 <std::ostream& std::ostream::_M_insert<double>(double)@plt>
  4007dd:       48 89 c7                mov    %rax,%rdi
  4007e0:       e8 8b ff ff ff          callq  400770 <std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)@plt>
  4007e5:       31 c0                   xor    %eax,%eax
  4007e7:       48 83 c4 18             add    $0x18,%rsp
  4007eb:       c3                      retq   
  4007ec:       0f 1f 40 00             nopl   0x0(%rax)

Option 3:

0000000000400790 <main>:
  400790:       48 83 ec 18             sub    $0x18,%rsp
  400794:       bf 80 0d 60 00          mov    $0x600d80,%edi
  400799:       48 8d 74 24 08          lea    0x8(%rsp),%rsi
  40079e:       e8 dd ff ff ff          callq  400780 <std::istream& std::istream::_M_extract<float>(float&)@plt>
  4007a3:       48 8d 74 24 0c          lea    0xc(%rsp),%rsi
  4007a8:       bf 80 0d 60 00          mov    $0x600d80,%edi
  4007ad:       e8 ce ff ff ff          callq  400780 <std::istream& std::istream::_M_extract<float>(float&)@plt>
  4007b2:       f3 0f 10 44 24 0c       movss  0xc(%rsp),%xmm0
  4007b8:       bf c0 0e 60 00          mov    $0x600ec0,%edi
  4007bd:       f3 0f 5f 44 24 08       maxss  0x8(%rsp),%xmm0
  4007c3:       f3 0f 5a c0             cvtss2sd %xmm0,%xmm0
  4007c7:       e8 94 ff ff ff          callq  400760 <std::ostream& std::ostream::_M_insert<double>(double)@plt>
  4007cc:       48 89 c7                mov    %rax,%rdi
  4007cf:       e8 9c ff ff ff          callq  400770 <std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)@plt>
  4007d4:       31 c0                   xor    %eax,%eax
  4007d6:       48 83 c4 18             add    $0x18,%rsp
  4007da:       c3                      retq   
  4007db:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

So, what we can see here is that options 1 and 3 produce identical code while option 2 is inferior. One possible explanation is that most if not all STL algorithms require operator<() for comparison and the GCC folks optimized for this (std::max() is implemented that way, too). Or maybe there is something about IEEE 754 and SSE2 that distinguishes < from >= for some reason.

HTH
Flössie
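
One plausible reading of the IEEE 754 remark, offered as an assumption rather than a confirmed explanation of what GCC does: the two hand-written forms are not interchangeable once a NaN is involved, because every comparison with a NaN is false. A compiler that has to preserve NaN behaviour may therefore be unable to treat them the same way, which would also fit the observation that the difference disappears with -Ofast (where -ffinite-math-only lets it assume there are no NaNs). The function names below are hypothetical.

```cpp
#include <limits>

// Option 2 from the test above.
inline float max_ge(float a, float b) { return a >= b ? a : b; }

// Option 3 from the test above (same form as std::max).
inline float max_lt(float a, float b) { return a < b ? b : a; }

// Small helper so the sketch does not depend on <cmath>:
// NaN is the only float value that compares unequal to itself.
inline bool is_nan_sketch(float x) { return x != x; }

// Convenience accessor for a quiet NaN.
inline float quiet_nan_sketch() { return std::numeric_limits<float>::quiet_NaN(); }
```

With a NaN as the first argument, max_ge() returns the second argument while max_lt() returns the NaN; with a NaN as the second argument, it is the other way round. For ordinary values both forms agree.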

David_Tschumperle · September 13, 2016, 8:35am

I’ve just replaced all my previous uses of cimg::min/max() with std::min/max() wherever possible in the code of G’MIC/CImg. I don’t expect a huge speed gain, but it’s still nice to know.