Hi there,
Some people recently reported that G’MIC runs way slower on Windows than on Linux (sometimes by a factor x10), with speed tests done on the same machine.
This is something I’ve also noticed from time to time, when testing the cli version on my Windows (Virtual Machine). I thought this had to do with the fact it was run from a VM, but apparently this is not the main reason.
So, it would be nice to have first, the explanation of this performance drop, and second maybe a way to fix it to make G’MIC faster for all the Windows users
I can say I’m definitely not an expert about the mysteries of Windows, so any help is welcome !
What I’ve done so far :
- I’ve compiled the G’MIC cli tool with code profiling options (using
g++
with options-pg -g
). - Launched one of the demo that seems to run slowly (
gmic -x_landscape
). It runs at approx. 19fps on Windows, compared to 50fps on Linux. - At exit, the profiling informations are stored into file
gmon.out
, and those informations can be displayed usinggprof gmic.exe
, which gives something like:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
36.00 1.17 1.17 _mcount_private
17.54 1.74 0.57 __tcf_39
7.08 1.97 0.23 __fentry__
5.54 2.15 0.18 2603 0.07 0.07 cimg_library::CImg<float>::get_resize(int, int, int, int, int, unsigned int, float, float, float, float) const
3.38 2.26 0.11 722 0.15 0.57 gmic::add_commands(_iobuf*, char const*, unsigned int*, unsigned int*)
3.08 2.36 0.10 16471201 0.00 0.00 cimg_library::CImg<float>::_cimg_math_parser::mp_i(cimg_library::CImg<float>::_cimg_math_parser&)
3.08 2.46 0.10 456 0.22 0.22 cimg_library::CImg<float>& cimg_library::CImg<float>::draw_image<float, float>(int, int, int, int, cimg_library::CImg<float> const&, cimg_library::CImg<float> const&, float, float)
2.77 2.55 0.09 12555760 0.00 0.00 cimg_library::CImg<float>::_cimg_math_parser::mp_if(cimg_library::CImg<float>::_cimg_math_parser&)
2.77 2.64 0.09 238 0.38 0.38 cimg_library::CImgDisplay& cimg_library::CImgDisplay::render<float>(cimg_library::CImg<float> const&)
2.46 2.72 0.08 12494981 0.00 0.00 cimg_library::CImg<float>::_cimg_math_parser::mp_jxyzc(cimg_library::CImg<float>::_cimg_math_parser&)
1.23 2.76 0.04 868844 0.00 0.00 cimg_library::CImg<float>& cimg_library::CImg<float>::draw_line<float>(int, int, int, int, float const*, float, unsigned int, bool)
0.92 2.79 0.03 12494981 0.00 0.00 cimg_library::CImg<float>::atXYZC(int, int, int, int, float const&) const
0.92 2.82 0.03 1428 0.02 0.02 cimg_library::CImg<float> cimg_library::CImg<float>::get_warp<float>(cimg_library::CImg<float> const&, unsigned int, unsigned int, unsigned int) const
0.92 2.85 0.03 458 0.07 2.24 gmic& gmic::_run<float>(cimg_library::CImgList<char> const&, unsigned int&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, unsigned int const*, bool*, char const*, cimg_library::CImg<unsigned int> const*)
---8<----
Call graph (explanation follows)
granularity: each sample hit covers 4 byte(s) for 0.31% of 3.25 seconds
index % time self children called name
<spontaneous>
[1] 36.0 1.17 0.00 _mcount_private [1]
-----------------------------------------------
[2] 33.3 0.03 1.05 2+45270 <cycle 2 as a whole> [2]
0.03 0.99 458+1338 gmic& gmic::_run<float>(cimg_library::CImgList<char> const&, unsigned int&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, unsigned int const*, bool*, char const*, cimg_library::CImg<unsigned int> const*) <cycle 2> [4]
0.00 0.06 44814+11248 cimg_library::CImg<char> gmic::substitute_item<float>(char const*, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, unsigned int const*, cimg_library::CImg<unsigned int> const*, bool) <cycle 2> [23]
-----------------------------------------------
0.00 0.54 1/2 void gmic::_gmic<float>(char const*, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, char const*, bool, float*, bool*) [clone .constprop.1137] [6]
0.00 0.54 1/2 gmic& gmic::run<float>(char const*, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, float*, bool*) [8]
[3] 33.3 0.00 1.08 2 gmic& gmic::_run<float>(cimg_library::CImgList<char> const&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, float*, bool*) [3]
0.03 1.05 2/2 gmic& gmic::_run<float>(cimg_library::CImgList<char> const&, unsigned int&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, cimg_library::CImgList<float>&, cimg_library::CImgList<char>&, unsigned int const*, bool*, char const*, cimg_library::CImg<unsigned int> const*) <cycle 2> [4]
0.00 0.00 6/32543 cimg_library::CImg<unsigned int>::assign(unsigned int, unsigned int, unsigned int, unsigned int) [434]
0.00 0.00 4/190630 cimg_library::CImg<char>::assign(unsigned int, unsigned int, unsigned int, unsigned int) [415]
0.00 0.00 2/15013 cimg_library::CImg<unsigned int>::fill(unsigned int const&) [445]
0.00 0.00 2/2 gmic::abort_ptr(bool*) [573]
(...)
If I do the same experiment on Linux, I don’t get the three first lines, that apparently takes 50% of the global execution time !
36.00 1.17 1.17 _mcount_private
17.54 1.74 0.57 __tcf_39
7.08 1.97 0.23 __fentry__
but the rest is similar.
I didn’t find much information about these three lines, but I guess this is not good!
Anyone has clues about that ? It would explain the performance issues on Windows, but what is it exactly, and is there a way to fix it ?
EDIT: _mcount
is apparently a function called for doing the code profiling, so it is OK to have it in the logs.