RT5 and speed query

(Andrew) #1

I hesitate to raise this…
In summary, RT5 seems rather slower than at least one previous version.
The background to this is that I’ve built a new PC, installed Ubuntu 16.10, then RT as per the RT download page. This doesn’t give version nos so I did the normal thing of getting the development version. This turned out to be old, 4.2 something, so uninstalled it, deleted .cache and .config stuff, and installed the stable version which is RT5.
As it’s a new PC I had been making some timings, so I had a 20 Mpx raw file plus a PP3 with wavelets enabled, and I timed how long it took to make a JPEG. My old PC (Win7, an old Localab build) took 14s, whereas the new one with V4.2 took about 5.5s, so I was quite happy. But repeating this now that RT5 is installed, something has gone bad: it is taking 19s. I don’t know much about Ubuntu, but I’ve checked the timings and tried to eliminate certain things. E.g. I saved to JPEG more than once so that any disk caching could take place, since I wanted CPU results, not disk (though the new PC is on SSD anyway). The system monitor app showed all the cores being used in both 4.2 and 5. I have loads of memory (32 GB). RT5 is spending a good chunk of the 19s in highlight reconstruction at 55% complete. It’s not unusual for this step to take a while, I guess.
I did a quick search for “RT5 slow” before writing this but nothing relevant came up.
Has anyone any thoughts please?
Generally the PC seems quite snappy though I did post some details in the AppImage thread yesterday about GIMP Levels adjustments seeming sluggish to me.
Here is the PP3 -
test PP3.pp3 (9.7 KB)
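
The timing method described in this post (repeat the same save several times so that disk caching stops affecting the result) could be sketched in Python roughly as below. This is only an illustration: `time_runs` and `export_jpeg` are hypothetical names that do not come from RawTherapee, and the stand-in workload would have to be replaced by whatever actually triggers the export.

```python
import time
from statistics import median


def time_runs(task, repeats=3, warmup=1):
    """Call task() warmup + repeats times and return the wall-clock
    seconds of the runs that come after the warm-up runs."""
    timings = []
    for i in range(warmup + repeats):
        start = time.perf_counter()
        task()
        elapsed = time.perf_counter() - start
        if i >= warmup:  # discard warm-up runs (cold disk cache etc.)
            timings.append(elapsed)
    return timings


def export_jpeg():
    # Hypothetical stand-in for "save the raw file as JPEG".
    # Here it is just some CPU work so that the sketch runs on its own.
    sum(x * x for x in range(200_000))


timings = time_runs(export_jpeg, repeats=3, warmup=1)
print("kept runs:", len(timings))
print("median seconds:", median(timings))
```

Reporting the median of the kept runs rather than a single run makes one-off hiccups less likely to distort the comparison between versions.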

(Ingo Weyrich) #2

Do you use a debug version?

(Andrew) #3

hi again!, here’s what’s in About -

Branch: gtk3
Version: 5.0
Changeset: 7fe7c4f60f85b47bc1d24b2cfece43444028e3a6
Compiler: x86_64-linux-gnu-gcc 6.2.0
Processor: x86_64
System: Linux
Bit depth: 64 bits
Gtkmm: V3.20.1
Build type: Release
Build flags: -std=c++11 -std=gnu++11 -Werror=unused-label -fopenmp -Werror=unknown-pragmas
Link flags: -Wl,-Bsymbolic-functions -Wl,-z,relro
OpenMP support: ON
MMAP support: ON

Would it say Build type: Debug if it was that? Otherwise I don’t know how to tell.

(Ingo Weyrich) #4

Thanks for the About. That’s clearly a release build of rt5, not a debug build.
Hmm, but why is it slower?

Can you try without highlight reconstruction please?

I tested your pp3 on my machine (Win7/64, 32 GB, 8 cores at 4 GHz) using a D800 file (36 MP), and one of the most time-consuming parts was ‘Edges only’ in Sharpening.

(Andrew) #5

This is really weird. Loaded up the photo and checked it was still taking 19s, as before. Then unchecked highlight rec. Saved again - this time 18s. The demosaicing took ages, not like previously. It’s as if it was trying to stay at 19s…! Repeated to be sure, but 18s again. Bizarre!

I think there’s also a bug to do with disk partitions. My photos are in a folder on a second partition, not the system. The RT file browser loses the location following a re-boot. It doesn’t lose it if it’s on the system partition.

(Ingo Weyrich) #6

What’s the cpu of your new machine?

(Andrew) #7

It’s an i7 6700K. With 32Gb of DDR4 3000 memory. No video card, using the built in graphics.

Was just experimenting, knocked out parts of the processing, got down to 11s, but on putting sharpening back in, it went back to 18s.

(Ingo Weyrich) #8

Can you try to turn off Edges only in Sharpening? That’s one of the rare parts I did not optimize yet :wink:

(Andrew) #9

Aha, that makes the difference between 19s and 13s.

(Ingo Weyrich) #10

But ‘Edges only’ is the same code as in rt 4.2

(Ingo Weyrich) #11

Hmm, looking at my About:

Build type: Release
Build flags:  -std=gnu++11 -march=native -Werror=unused-label -fopenmp -Werror=unknown-pragmas -O3 -DNDEBUG

-O3 is enabled. I wonder why it is not enabled in your build. But that’s a question for your package maintainer…
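
The comparison made in this post can be sketched in a few lines of Python. The two flag strings are copied from the About output in posts #3 and #11; `optimization_level` is a hypothetical helper written for this illustration. It assumes the Build flags line really lists everything that was passed to the compiler, which the thread does not confirm. GCC uses the last `-O` option on the command line and defaults to `-O0` when none is given.

```python
def optimization_level(build_flags):
    """Return the effective -O flag in a GCC-style flag string,
    or None if no -O flag is present (GCC then defaults to -O0)."""
    level = None
    for flag in build_flags.split():
        if flag.startswith("-O"):
            level = flag  # GCC honours the last -O option it sees
    return level


# Build flags from post #3 (the Ubuntu package)
ubuntu_flags = ("-std=c++11 -std=gnu++11 -Werror=unused-label "
                "-fopenmp -Werror=unknown-pragmas")
# Build flags from post #11 (Ingo's own build)
ingo_flags = ("-std=gnu++11 -march=native -Werror=unused-label "
              "-fopenmp -Werror=unknown-pragmas -O3 -DNDEBUG")

# Flags that appear in Ingo's build but not in the Ubuntu package
missing = [f for f in ingo_flags.split() if f not in ubuntu_flags.split()]

print("Ubuntu package:", optimization_level(ubuntu_flags))
print("Ingo's build:  ", optimization_level(ingo_flags))
print("Missing from the Ubuntu package:", missing)
```

Besides `-O3`, the listing shows that `-march=native` and `-DNDEBUG` are also absent from the packaged build's flag line.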

(Andrew) #12

I’m not following… what is -03?

(Andrew) #13

Just a thought, was 4.2 built using GTK2 (5.0 being GTK3…) ?

(Ingo Weyrich) #14

-O3 is a compiler optimization level.

Yes, but that does not make a difference for processing speed. With gtk3 the startup of rt may be a bit slower, but not the processing of images.

(Andrew) #15

So it sounds like the next step is to ask Dariusz if he can re-build with “03” set, would you say?

(Ingo Weyrich) #16

At least we should ask him about the build flags he uses to build rt…

(Andrew) #17

Ok, I’ll post something. A big thank you, Ingo, for the help tonight. Now I must go to bed!

(Alberto) #18

I’m not a RawTherapee developer, but I agree that you should investigate why there is no -O3 in your build flags. I just did a quick test with your pp3 on my machine, here are some results:

Build type: Release
Build flags:  -std=gnu++11 -march=native -Werror=unused-label -fopenmp -Werror=unknown-pragmas

Execution time with the above:

real 54.21
user 205.26
sys 0.80

Build type: Release
Build flags:  -std=gnu++11 -march=native -Werror=unused-label -fopenmp -Werror=unknown-pragmas -O3 -DNDEBUG

Execution time with the above:

real 11.82
user 42.21
sys 1.22

FWIW, I also use -ffast-math to get a bit more performance (especially on older hardware):

Build type: Release
Build flags: -Wno-deprecated -std=gnu++11 -march=native -Werror=unused-label -fopenmp -Werror=unknown-pragmas -O3 -DNDEBUG -ffast-math

execution time:

real 10.85
user 38.20
sys 1.27

not a big deal on my newest (work) laptop, but still nice. On my home machine the difference is more noticeable.
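
For readers unfamiliar with the three figures: `real` is the elapsed wall-clock time, `user` is the CPU time spent in user mode summed over all threads, and `sys` is the CPU time spent in the kernel. The short Python sketch below only does arithmetic on the numbers quoted in this post; the `Timing` class and its `busy_cores` method are hypothetical names introduced for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    real: float  # wall-clock seconds
    user: float  # user-mode CPU seconds, summed over all threads
    sys: float   # kernel-mode CPU seconds

    def busy_cores(self):
        """Average number of cores kept busy during the run."""
        return (self.user + self.sys) / self.real


# Figures quoted in this post
no_opt = Timing(real=54.21, user=205.26, sys=0.80)        # no -O flag
o3 = Timing(real=11.82, user=42.21, sys=1.22)             # -O3 -DNDEBUG
o3_fast_math = Timing(real=10.85, user=38.20, sys=1.27)   # plus -ffast-math

speedup_o3 = no_opt.real / o3.real
speedup_fast_math = o3.real / o3_fast_math.real

print(f"-O3 vs no -O flag:    {speedup_o3:.2f}x faster")
print(f"-ffast-math vs -O3:   {speedup_fast_math:.2f}x faster")
print(f"cores busy (no -O):   {no_opt.busy_cores():.1f}")
print(f"cores busy (-O3):     {o3.busy_cores():.1f}")
```

`user` being several times larger than `real` in every run indicates that the work was spread over multiple cores in each build, so the gap between the builds comes from the generated code rather than from lost parallelism.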

(Andrew) #19

Thanks @agriggio, this is great news. I don’t know exactly what “real” and “user” are, but it’s clearly about 5x faster on your system with 03 set! Last night when I was baffled at 03, I thought the quotes / graphic bits in posts were just images, I didn’t realise the scroll bar worked! - so I can now see the 03 amongst the flags…(doh)

@Dariusz_Duma, Request for Help Please !! Is there any chance you might be able to re-build the download for Ubuntu and RT5 with 03 included? please please. This would deliver a huge performance increase for many or all such users, surely? Perhaps there are other settings that might benefit from reviewing. Hoping you can help.

(Flössie) #20

Using -ffast-math is dangerous. Ingo also mentioned that once, if I recall correctly, but I can’t find the link I have in mind.
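
One general reason for this kind of warning is that `-ffast-math` allows GCC to regroup floating-point arithmetic as if it were associative and to assume that NaN and infinity never occur. The Python sketch below does not involve GCC at all; it regroups a sum by hand to show how such a change can alter a result, and it is not necessarily the specific problem the posters had in mind.

```python
import math

big, small = 1e20, 1.0

# Evaluated in the order written: big cancels first, small survives.
as_written = (big + -big) + small

# Regrouped, as a compiler may do when it treats addition as associative:
# small is absorbed into -big because 1e20 has no precision left for it.
regrouped = big + (-big + small)

print("as written:", as_written)
print("regrouped: ", regrouped)

# Code built on the assumption that NaN never occurs may also drop
# checks like this one, although the value can arise at run time.
undefined = float("inf") - float("inf")
print("inf - inf is NaN:", math.isnan(undefined))
```

Both sums are mathematically the same expression, yet they evaluate to different values, which is why results can shift once the compiler is free to reorder operations.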