Performance issues on Windows

David_Tschumperle · January 2, 2019, 10:32am

Hi there,

Some people recently reported that G’MIC runs much slower on Windows than on Linux (sometimes by a factor of 10), with speed tests done on the same machine.
This is something I’ve also noticed from time to time, when testing the cli version on my Windows (Virtual Machine). I thought this had to do with the fact it was run from a VM, but apparently this is not the main reason.

So, it would be nice to have, first, an explanation of this performance drop and, second, maybe a way to fix it to make G’MIC faster for all Windows users.

I can say I’m definitely not an expert in the mysteries of Windows, so any help is welcome!

What I’ve done so far :

I’ve compiled the G’MIC cli tool with code profiling options (using g++ with options -pg -g).
Launched one of the demos that seems to run slowly (gmic -x_landscape). It runs at approx. 19fps on Windows, compared to 50fps on Linux.
At exit, the profiling information is stored in the file gmon.out, and it can be displayed using gprof gmic.exe, which gives something like:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 36.00      1.17     1.17                             _mcount_private
 17.54      1.74     0.57                             __tcf_39
  7.08      1.97     0.23                             __fentry__
  5.54      2.15     0.18     2603     0.07     0.07  cimg_library::CImg<float>::get_resize(int, int, int, int, int, unsigned int, float, float, float, float) const
  3.38      2.26     0.11      722     0.15     0.57  gmic::add_commands(_iobuf*, char const*, unsigned int*, unsigned int*)
  3.08      2.36     0.10 16471201     0.00     0.00  cimg_library::CImg<float>::_cimg_math_parser::mp_i(cimg_library::CImg<float>::_cimg_math_parser&)
  3.08      2.46     0.10      456     0.22     0.22  cimg_library::CImg<float>& cimg_library::CImg<float>::draw_image<float, float>(int, int, int, int, cimg_library::CImg<float> const&, cimg_library::CImg<float> const&, float, float)
  2.77      2.55     0.09 12555760     0.00     0.00  cimg_library::CImg<float>::_cimg_math_parser::mp_if(cimg_library::CImg<float>::_cimg_math_parser&)
  2.77      2.64     0.09      238     0.38     0.38  cimg_library::CImgDisplay& cimg_library::CImgDisplay::render<float>(cimg_library::CImg<float> const&)
  2.46      2.72     0.08 12494981     0.00     0.00  cimg_library::CImg<float>::_cimg_math_parser::mp_jxyzc(cimg_library::CImg<float>::_cimg_math_parser&)
  1.23      2.76     0.04   868844     0.00     0.00  cimg_library::CImg<float>& cimg_library::CImg<float>::draw_line<float>(int, int, int, int, float const*, float, unsigned int, bool)
  0.92      2.79     0.03 12494981     0.00     0.00  cimg_library::CImg<float>::atXYZC(int, int, int, int, float const&) const
  0.92      2.82     0.03     1428     0.02     0.02  cimg_library::CImg<float> cimg_library::CImg<float>::get_warp<float>(cimg_library::CImg<float> const&, unsigned int, unsigned int, unsigned int) const
  0.92      2.85     0.03      458     0.07     2.24  gmic& gmic::_run<float>(cimg_library::CImgList<char> const&, unsigned int&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, unsigned int const*, bool*, char const*, cimg_library::CImg<unsigned int> const*)

---8<----

		     Call graph (explanation follows)


granularity: each sample hit covers 4 byte(s) for 0.31% of 3.25 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]     36.0    1.17    0.00                 _mcount_private [1]
-----------------------------------------------
[2]     33.3    0.03    1.05       2+45270   <cycle 2 as a whole> [2]
                0.03    0.99     458+1338        gmic& gmic::_run<float>(cimg_library::CImgList<char> const&, unsigned int&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, unsigned int const*, bool*, char const*, cimg_library::CImg<unsigned int> const*) <cycle 2> [4]
                0.00    0.06   44814+11248       cimg_library::CImg<char> gmic::substitute_item<float>(char const*, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, unsigned int const*, cimg_library::CImg<unsigned int> const*, bool) <cycle 2> [23]
-----------------------------------------------
                0.00    0.54       1/2           void gmic::_gmic<float>(char const*, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, char const*, bool, float*, bool*) [clone .constprop.1137] [6]
                0.00    0.54       1/2           gmic& gmic::run<float>(char const*, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, float*, bool*) [8]
[3]     33.3    0.00    1.08       2         gmic& gmic::_run<float>(cimg_library::CImgList<char> const&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, float*, bool*) [3]
                0.03    1.05       2/2           gmic& gmic::_run<float>(cimg_library::CImgList<char> const&, unsigned int&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, unsigned int const*, bool*, char const*, cimg_library::CImg<unsigned int> const*) <cycle 2> [4]
                0.00    0.00       6/32543       cimg_library::CImg<unsigned int>::assign(unsigned int, unsigned int, unsigned int, unsigned int) [434]
                0.00    0.00       4/190630      cimg_library::CImg<char>::assign(unsigned int, unsigned int, unsigned int, unsigned int) [415]
                0.00    0.00       2/15013       cimg_library::CImg<unsigned int>::fill(unsigned int const&) [445]
                0.00    0.00       2/2           gmic::abort_ptr(bool*) [573]

(...)

If I do the same experiment on Linux, I don’t get the first three lines, which apparently take more than 50% of the global execution time!

 36.00      1.17     1.17                             _mcount_private
 17.54      1.74     0.57                             __tcf_39
  7.08      1.97     0.23                             __fentry__

but the rest is similar.

I didn’t find much information about these three lines, but I guess this is not good!
Does anyone have a clue about that? It would explain the performance issues on Windows, but what is it exactly, and is there a way to fix it?

EDIT: _mcount is apparently a function called for doing the code profiling, so it is OK to have it in the logs.

David_Tschumperle · January 2, 2019, 11:45am

Some more news:

Apparently, the __tcf_xx entries are also generated by the code profiler. It’s a bit strange that they take so much time anyway.
I’ve tried to disable OpenMP on Windows, and it makes a big difference. Disabling it actually doubles the speed of the x_landscape demo: it now runs at almost 50fps without OpenMP activated.
It has been noticed that the recent Stylize filter is really slower on Windows than on Linux, and this filter uses a lot of OpenMP directives, which may explain the difference. That’s a bit disappointing, because OpenMP is intended to make things faster by using multiple CPU cores. Looks like it sucks a little bit on Windows.

garagecoder · January 2, 2019, 12:05pm

Maybe there’s more to it than just OpenMP being particularly bad on Windows; that doesn’t seem right somehow. I wonder if @Carmelo_DrRaw has experience with this? @heckflosse does optimisations in code, not sure if that extends to compiling though.

David_Tschumperle · January 2, 2019, 12:13pm

I’ve just compiled the G’MIC plug-in for GIMP, on Windows, without OpenMP support.
The Stylize filter is indeed 5 to 6 times faster now.
I’ve also used another code profiler (Very Sleepy CS) with the cli version of G’MIC, which also showed that a lot of time was spent in the thread creation and synchronization functions of the Windows API (called because of the OpenMP directives).

Maybe I should try to tune the way OpenMP behaves in the G’MIC/CImg code, I know there are some ways to control that, but I never took the time to look in detail.
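
For anyone who wants to check this kind of overhead outside of G’MIC, here is a minimal C++ sketch (not taken from the G’MIC/CImg sources). It times the same tiny reduction with and without an OpenMP parallel region. The function names, buffer size and repeat count are illustrative assumptions; if OpenMP is not enabled at compile time the pragma is simply ignored and both timings should be close.

```cpp
#include <chrono>
#include <vector>

// Sums a buffer inside an OpenMP parallel region.
// Without OpenMP support the pragma is ignored and this is a plain loop.
double sum_parallel(const std::vector<float>& v) {
    double total = 0.0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) total += v[i];
    return total;
}

// Same sum, always single-threaded.
double sum_serial(const std::vector<float>& v) {
    double total = 0.0;
    for (float x : v) total += x;
    return total;
}

// Calls f(v) `repeats` times and returns the elapsed time in milliseconds.
// Results are accumulated into `sink` so the calls cannot be optimized away.
template <typename F>
double time_ms(F f, const std::vector<float>& v, int repeats, double& sink) {
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) sink += f(v);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling `time_ms(sum_parallel, tiny, 20000, sink)` and `time_ms(sum_serial, tiny, 20000, sink)` on a deliberately small buffer (say 256 floats) gives a rough idea of how much each parallel region costs on a given platform and compiler.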

heckflosse · January 2, 2019, 12:13pm

@David_Tschumperle Can you give a pointer to the code?

David_Tschumperle · January 2, 2019, 12:16pm

The Stylize filter is mainly based on a patch-matching algorithm defined in the CImg Library (method CImg<T>::matchpatch(), l.38036 of file CImg.h)

garagecoder · January 2, 2019, 12:20pm

Current version is here I guess?

Edit: oops, didn’t see you made it a link anyway, sorry!

David_Tschumperle · January 2, 2019, 12:21pm

Yes that is the same, the two repos (Framagit and github) are synchronized.

David_Tschumperle · January 2, 2019, 12:44pm

A first guess:

It seems that the thread creation and synchronization functions are a bit slower on Windows than on Linux.
So, my plan is to enable multi-core processing (with OpenMP) only when the images to process are large enough. I was of course already doing this on Linux, but I’ll raise the threshold a bit more for Windows.
We’ll see what happens then.
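
The idea of a size threshold can be sketched with the OpenMP `if` clause. This is only an illustration, not the actual CImg code: the constant `kMinPixelsForThreads`, its values and the function `apply_gain` are hypothetical, and real thresholds would have to be tuned by measurement.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cut-off (in pixels) below which the loop stays single-threaded.
// The values are placeholders; the Windows one is larger only to illustrate
// the "raise the threshold for Windows" idea.
#ifdef _WIN32
constexpr std::size_t kMinPixelsForThreads = 512 * 512;
#else
constexpr std::size_t kMinPixelsForThreads = 256 * 256;
#endif

// Multiplies every pixel by `gain`. Threads are only used when the buffer is
// large enough to amortize the cost of starting the parallel region.
void apply_gain(std::vector<float>& pixels, float gain) {
    const long n = static_cast<long>(pixels.size());
    const bool use_threads = pixels.size() >= kMinPixelsForThreads;
    #pragma omp parallel for if(use_threads)
    for (long i = 0; i < n; ++i) pixels[i] *= gain;
}
```

With the `if` clause, small images go through exactly the same loop body but without the thread management cost, so there is no need to maintain two versions of the code.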

Reptorian · January 3, 2019, 3:44am

Just letting you know that the PDN G’MIC maintainer has disabled OpenMP for some filters, and some of them run much faster that way.

David_Tschumperle · January 3, 2019, 8:29am

An update: with the help of @heckflosse, I’ve been able to make the random number generator used in G’MIC re-entrant, which removes some expensive mutex locks/unlocks.
This really improved the overall speed of some filters and commands, both on Linux and Windows.

For instance, the Stylize filter now runs almost 2.5x faster (on Linux) compared to the previous version. I guess the speed boost will be even more impressive on Windows (will be able to test only tomorrow).
More generally, all algorithms based on random numbers should get a speed boost !
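
To illustrate what "re-entrant" means here, below is a minimal C++ sketch of a generator whose whole state is owned by the caller, so no mutex is needed. It is not the actual G’MIC/CImg implementation: the function names, the choice of a simple 64-bit linear congruential generator and the per-row seeding are all assumptions made for the example, and the seeding scheme is not statistically vetted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Advances a 64-bit linear congruential generator whose entire state lives in
// `state`. There is no global variable and no lock: each caller owns its state.
inline std::uint32_t next_u32(std::uint64_t& state) {
    state = state * 6364136223846793005ULL + 1442695040888963407ULL;
    return static_cast<std::uint32_t>(state >> 32);
}

// Uniform value in [0, 1).
inline double uniform01(std::uint64_t& state) {
    return next_u32(state) / 4294967296.0;
}

// Fills a width x height buffer with noise. Each row derives its own state
// from `seed`, so rows can run on different threads without sharing anything.
std::vector<double> make_noise(int width, int height, std::uint64_t seed) {
    std::vector<double> out(static_cast<std::size_t>(width) * height);
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        std::uint64_t state =
            seed + 0x9E3779B97F4A7C15ULL * static_cast<std::uint64_t>(y + 1);
        for (int x = 0; x < width; ++x)
            out[static_cast<std::size_t>(y) * width + x] = uniform01(state);
    }
    return out;
}
```

Because the state is local to each row, the result is the same whatever the number of threads, which is a nice side effect compared to a shared generator protected by a lock.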

Yalba · January 3, 2019, 11:12am

Very good news, thanks for improving this, and thanks to @heckflosse for the help too.

I wish there was a native .sln to try compiling it with VS2017 natively on Windows and see if there are improvements.

I tried the cmake file once and it complained about some missing libraries, like Zlib and Png, even though I have those from the vcpkg repo and they should be detected. I’m not sure where it was searching. The other missing ones can maybe be found in the Microsoft vcpkg repo too, but I stopped at this point.

I’m not that great at this stuff sadly, i know how to compile simple things and stuff like speed related flags that usually improve a bit the performances but i’m not a coder tho, started to mess with cmake not long ago but can not really modify the cmake file too much with my limited knowledge…

Well, maybe one day someone with good knowledge of this will make something available so it will be possible to compile it on windows directly, maybe there is some gains to get this way, I have to say that you made me curious about this after this topic,

anyways, thanks for the efforts trying to improve the performances, always a great thing, especially with some long processing filters

PDN_GMIC · January 5, 2019, 5:11am

Did you use the vcpkg cmake toolchain file when compiling?

After the ‘vcpkg.exe integrate install’ command completes it specifies that you must set the ‘-DCMAKE_TOOLCHAIN_FILE=<VCPkg_Root>/scripts/buildsystems/vcpkg.cmake’ define in cmake in order for cmake to find the compiled packages.

I managed to get both the Paint.NET G’MIC-Qt plugin and the G’MIC cli example compiling with VS2017.

Interestingly it appears the Paint.NET G’MIC plugin and its dependencies are ~20Mb smaller when compiled with VS2017, although I do not know what compiler options the MinGW versions were built with.
Also when running the G’MIC v2.4.3 landscape demo there does not appear to be any FPS difference with OpenMP enabled or disabled.

As you have probably guessed from my username I am the Paint.NET G’MIC plugin maintainer.

Yalba · January 5, 2019, 8:32am

Hello,

That’s it, i was missing this part in the command, noob error… but like i said i’m not much experienced with this, slowly learning.

Now it detects Zlib and png, which were already installed. I downloaded a few others that were needed, and now only PkgConfig and X11 are reported missing. I’m trying to figure out this last one: after a quick Google search it seems FLTK from the vcpkg repo is what I need, but even with it installed cmake still complains that it is missing.

I suspect the names of the libs coming from vcpkg are conflicting somewhere, might have to install the full package not using vcpkg to see if it goes better, or i am looking at the wrong thing maybe, not sure yet.

Still early in the morning and a few things to test and research about all this.

About OpenMP, I had already noticed there were not many gains on a couple of things I compiled in VS2017, and I read once that OpenMP was sometimes problematic with it. I’ll have to try with the Intel compiler I integrated in VS2017; I’ve read a few times that it can handle this better than the native VS2017 compiler. Curious about testing this…

Don’t want to go off-topic too much so i will just say a big thank you, hopefully i will manage to get this compiled after a few struggles figuring the blocking points i encounter for now.

Will keep me busy during the weekend i guess

Thanks a lot for the help and hint

PDN_GMIC · January 5, 2019, 8:59am

The FFTW3 library is problematic: vcpkg expects it to be requested with ‘FFTW3 CONFIG REQUIRED’ instead of ‘FFTW3 REQUIRED’, but even after that change cmake could not find it.
For G’MIC-Qt I had to use the ‘FFTW3_DIR=’ define to tell it the package location. Unfortunately, that does not work for the G’MIC cli application.

When building the G’MIC cli application I turned off all of the packages it could not find.
Note that you will have to change the ‘cimg_display=0’ preprocessor definition to ‘cimg_display=2’, this tells CImg to use the Windows display APIs.
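
As background, CImg selects its display backend at compile time from the `cimg_display` macro (0 for no display, 1 for X11, 2 for the Windows GDI). The sketch below mimics that kind of compile-time switch in isolation. It does not include CImg.h: the macro `demo_display` and the function `display_backend_name` are hypothetical stand-ins used only to show the mechanism.

```cpp
// Stand-in for cimg_display. A build system would normally pass it on the
// command line, e.g. -Ddemo_display=2 (GCC/Clang) or /Ddemo_display=2 (MSVC).
#ifndef demo_display
#  ifdef _WIN32
#    define demo_display 2
#  else
#    define demo_display 0
#  endif
#endif

// Reports which backend the macro selected at compile time.
inline const char* display_backend_name() {
#if demo_display == 0
    return "none";
#elif demo_display == 1
    return "X11";
#elif demo_display == 2
    return "Windows GDI";
#else
    return "unknown";
#endif
}
```

With the real library, the equivalent is to define `cimg_display` (on the compiler command line or before including CImg.h) so that the matching display code is compiled in.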

I tested the Stylize filter after compiling the Paint.NET plugin with G’MIC version 2.4.3 in Visual Studio and the rendering time dropped from 32 seconds to 2 seconds for the same image.

Yalba · January 5, 2019, 9:35am

I had FFTW3 installed and had this indeed :

The package fftw3:x64-windows provides CMake targets:

find_package(FFTW3 CONFIG REQUIRED)
target_link_libraries(main PRIVATE FFTW3::fftw3)
find_package(FFTW3f CONFIG REQUIRED)
target_link_libraries(main PRIVATE FFTW3::fftw3f)
find_package(FFTW3l CONFIG REQUIRED)
target_link_libraries(main PRIVATE FFTW3::fftw3l)

I will try with ‘cimg_display=2’ and turn OFF the things it cannot find. With “find_package”, only PkgConfig and X11 are missing; the rest is detected fine. I also installed all the others that are looked up with “pkg_check_modules”:

pkg_check_modules(FFTW3 fftw3>=3.0)
pkg_check_modules(OPENCV opencv)
pkg_check_modules(GRAPHICSMAGICK GraphicsMagick++)
pkg_check_modules(OPENEXR OpenEXR)

But pretty sure they will not be detected correctly, will need to modify the cmake file (maybe i don’t know yet) or set the location in the command line most probably, still have to check a bit google about how to do this correctly ^^

Still have not tried anything with the Qt version for Gimp, but that’s the final goal if i manage to get this built, from 32 seconds to 2 seconds is just insane in terms of gain for performances o_O.

Like i said it will keep me a bit busy this weekend, but all good.

Thanks again for taking the time to help a nooby with this, really appreciate.

PDN_GMIC · January 5, 2019, 9:58am

The G’MIC build options start at line 79 in CMakeLists.txt, I have them set as follows.

Build Options

option(BUILD_LIB "Build the GMIC shared library" OFF)
option(BUILD_LIB_STATIC "Build the GMIC static library" OFF)
option(BUILD_CLI "Build the CLI interface" ON)
option(BUILD_PLUGIN "Build the GIMP plug-in" OFF)
option(BUILD_MAN "Build the manpage" OFF)
option(BUILD_BASH_COMPLETION "Build Bash completion" OFF)
option(CUSTOM_CFLAGS "Override default compiler optimization flags" OFF)
option(ENABLE_X "Add support for X11" OFF)
option(ENABLE_FFMPEG "Add support for FFMpeg" OFF)
option(ENABLE_FFTW "Add support for FFTW" ON)
option(ENABLE_GRAPHICSMAGICK "Add support for GrahicsMagick" OFF)
option(ENABLE_JPEG "Add support for handling images in Jpeg format" OFF)
option(ENABLE_OPENCV "Add support for OpenCV" OFF)
option(ENABLE_OPENEXR "Add support for handling images in EXR format" OFF)
option(ENABLE_OPENMP "Add support for parallel processing" OFF)
option(ENABLE_PNG "Add support for handling images in PNG format" ON)
option(ENABLE_TIFF "Add support for handling images in Tiff format" OFF)
option(ENABLE_ZLIB "Add support for data compression via Zlib" ON)
option(ENABLE_DYNAMIC_LINKING "Dynamically link the binaries to the GMIC shared library" OFF)

G’MIC-Qt has a few compatibility issues with the Visual Studio compiler.
__PRETTY_FUNCTION__ is not defined and the leading underscore in cimg_library::cimg::_rand() causes the compiler to not find the method.
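
For the `__PRETTY_FUNCTION__` part, a common workaround is a small wrapper macro that falls back to `__FUNCSIG__` under the Visual Studio compiler. The sketch below shows the idea; the macro name `DEMO_FUNCTION_NAME` and the helper `where_am_i` are hypothetical, and this is not necessarily how G’MIC-Qt handles it.

```cpp
#include <string>

// Picks the most descriptive "current function" identifier available.
#if defined(_MSC_VER) && !defined(__clang__)
#  define DEMO_FUNCTION_NAME __FUNCSIG__          // MSVC-specific
#elif defined(__GNUC__) || defined(__clang__)
#  define DEMO_FUNCTION_NAME __PRETTY_FUNCTION__  // GCC / Clang extension
#else
#  define DEMO_FUNCTION_NAME __func__             // standard C++11 fallback
#endif

// Returns the name of this function as reported by the compiler.
inline std::string where_am_i() {
    return DEMO_FUNCTION_NAME;
}
```

The exact string differs between compilers (full signature with GCC, Clang and MSVC, bare name with the standard fallback), so it is only suitable for logging and debugging output.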

Good luck getting the GIMP plugin to build.

David_Tschumperle · January 5, 2019, 11:03am

@PDN_GMIC, would it be possible for you to test with the latest development version from the git repository?

https://gmic.eu/download.shtml#source

I’ve made some major modifications over the last two days, and the Stylize filter should run even faster now (with OpenMP enabled this time).

Yalba · January 5, 2019, 12:49pm

I managed to get FFTW3 partially detected somehow, replacing “pkg_check_modules(FFTW3 fftw3>=3.0)” by “find_package(FFTW3 REQUIRED)”

But then for FFTW3 there is a problem similar to this one. Moving the files as hinted there pushes things further, but it ends up with the same error as in the link at the end. It might be fixed one of these days… Or I’ll try the proposed fixes, which apparently are not merged yet.

OpenCV however using this method seems to be detected, but i have no more time right now to see if it will work at the compilation step or not, something i’ll try in the next days.

Still a lot to dig for all this but it offers the occasion to learn a bit more.

PS : sorry for hijacking this thread, not sure if i should start a fresh one about compiling gmic with vs2017, might help some others too, idk.

Thank you

PDN_GMIC · January 5, 2019, 10:35pm

G’MIC 2.4.3 and 2.4.4 both render a 394x266 pixel image in around 3 seconds with OpenMP enabled.