Question about darktable performance/speed

Hello,

I have been using DT for several months under Manjaro Linux (git version or official 2.0.6 version) on an i7-4720HQ/16 GB RAM laptop, and I have the impression that the latest versions are slower, especially with the profiled denoise module (Olympus E-M5 files).

It’s easy to see when you pan around a zoomed photo with the profiled denoise module active: the photo takes longer than before to be computed and displayed correctly.

It’s the same when exporting a JPEG. DT has become slower than RawTherapee (it used to be the other way round).

Am I the only one with this feeling?


I don’t have the feeling that it is getting slower, but there is a lot of development work going on in the background, so maybe there are issues or performance drops in the git version?

You can start darktable from the console like:

darktable -d perf

to get some useful performance figures. Profiled denoise has always been slow with ‘non-local means’ for me. With OpenCL it is more than 5 times faster in my case (CPU i7-2600, GPU AMD R9 270X).

Edit: Do you use OpenCL? There seems to be a problem with the latest NVIDIA drivers.
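The -d perf output gets long quickly. A minimal sketch for reading it, assuming the per-module line format that appears in the logs later in this thread ([dev_pixelpipe] took X secs (Y CPU) processed ‘module’ on CPU, blended on CPU [full]); the function name perf_modules is hypothetical, not part of darktable. It converts the decimal commas a French locale prints, keeps only [full] pipeline lines, and lists modules slowest first. The CPU/wall column roughly indicates how many threads were busy.

```sh
# Hypothetical helper: list per-module timings from `darktable -d perf` output
# on stdin, slowest first. Columns: wall secs, CPU secs, CPU/wall, module name.
perf_modules() {
  LC_ALL=C awk '
    BEGIN { q = "\047" }                        # ASCII single quote
    /^\[dev_pixelpipe\] took .* processed .*\[full\]$/ {
      wall = $3; cpu = $5                       # e.g. "0,606" and "(4,253"
      gsub(/,/, ".", wall)                      # French locale prints decimal commas
      gsub(/[(]/, "", cpu); gsub(/,/, ".", cpu)
      wall += 0; cpu += 0                       # force numeric values
      name = $0
      sub(/^.* processed /, "", name)
      sub(/ on [A-Z]+, blended on .*$/, "", name)
      sub("^(" q "|‘)", "", name)               # opening quote, ASCII or typographic
      sub("[ ]*(" q "|’)$", "", name)           # closing quote, with optional stray space
      ratio = (wall > 0) ? cpu / wall : 0
      printf "%8.3f %8.3f %5.1fx  %s\n", wall, cpu, ratio, name
    }
  ' | LC_ALL=C sort -rn
}
```

Usage: save the console output (for example darktable -d perf 2>&1 | tee perf.log), quit darktable, then run perf_modules < perf.log.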

Thanks for the tip “-d perf”.
And you’re right, profiled denoise with ‘non-local means’ is much slower than with ‘wavelets’.
Same settings except ‘non-local means’ vs ‘wavelets’, same operation (zoom from the full image to a 100% view):
non-local means: 2.2 s
wavelets: 0.210 s…

I can’t use OpenCL with my hardware: my only GPU is the Intel HD Graphics integrated in the CPU. Tested with darktable -d opencl:

[opencl_init] opencl library ‘libOpenCL’ found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 1 device
[opencl_init] discarding CPU device 0 `Intel(R) Core™ i7-4712MQ CPU @ 2.30GHz’.
[opencl_init] no suitable devices found.
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.
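The decisive line in such a log is the one starting with FINALLY, and the “discarding” lines explain why a device was skipped (here the only device found was the CPU). A minimal sketch of a hypothetical helper that reduces a saved -d opencl log to one word; note that the wording “opencl is AVAILABLE” for the positive case is an assumption, since only the NOT AVAILABLE line appears in this thread.

```sh
# Hypothetical helper: summarise `darktable -d opencl` output read from stdin.
opencl_status() {
  dt_log=$(cat)
  # Show why devices were skipped (e.g. "discarding CPU device 0 ...") on stderr.
  printf '%s\n' "$dt_log" | grep 'discarding' >&2 || true
  case $dt_log in
    *"opencl is NOT AVAILABLE"*) echo unavailable ;;
    *"opencl is AVAILABLE"*)     echo available ;;   # assumed wording, not seen in this thread
    *)                           echo unknown ;;
  esac
}
```

Usage: darktable -d opencl > opencl.log 2>&1, quit darktable, then opencl_status < opencl.log.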

Digging a little deeper into my performance problem:

  • the official version (2.0.7) on my Manjaro Linux is OK
  • my own build from git is really slow as soon as I use profiled denoise (it’s almost unusable with ‘non-local means’).

I don’t know whether the problem is in the git version or in the compile process on my computer (a version problem with some required dependency?).

Intel OpenCL is not supported, so maybe it is related to that. I don’t think there is a problem with git master, because my build from yesterday runs fine. IIRC some fixes have been made to increase stability. Maybe they have an impact on performance?

I don’t think it’s OpenCL, because on my computer the ‘official Manjaro darktable 2.0.7’ runs at normal speed (at least much faster than the compiled git version).

Perhaps a problem with compile options, the gcc version or something like that.

I will try compiling 2.0.7 from source to compare.

How do you compile? And what is the output of darktable --version of both the official and the self compiled binaries?

Yesterday I tried my own compilation of 2.0.7 and… performance is OK (similar to the official Manjaro 2.0.7).
I will try to compile 2.2.0rc0 as soon as possible.
To be continued…

Hello again.
So, I have two slightly different versions:
Official Manjaro: darktable --version
this is darktable 2.2.0rc0+71~gd483d9b-dirty
copyright (c) 2009-2016 johannes hanika
darktable-dev@lists.darktable.org

compile options:
bit depth is 64 bit
normal build
OpenMP support enabled
OpenCL support enabled
Lua support enabled, API version 4.0.0-dev
Colord support enabled
gPhoto2 support enabled
GraphicsMagick support enabled
OpenEXR support enabled

My own compiled version (from the “Release darktable 2.2.0 rc0” page of darktable-org/darktable on GitHub):
this is darktable 2.2.0~rc0
copyright (c) 2009-2016 johannes hanika
darktable-dev@lists.darktable.org
(followed by the same compile options)

Then the same operations on the same picture:
darktable 2.2.0rc0+71~gd483d9b-dirty (official)
[dev_pixelpipe] took 0,001 secs (0,003 CPU) initing base buffer [full]
[dev_pixelpipe] took 0,000 secs (0,000 CPU) processed ‘point noir/blanc raw’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,001 secs (0,003 CPU) processed ‘balance des blancs’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,001 secs (0,003 CPU) processed ‘reconstruire hautes lumières’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,014 secs (0,080 CPU) processed ‘dématriçage’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,232 secs (1,647 CPU) processed ‘réduction du bruit (profil)’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,606 secs (4,253 CPU) processed ‘réduction du bruit (profil) 1’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,221 secs (1,617 CPU) processed ‘réduction du bruit (profil) 2’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,002 secs (0,013 CPU) processed ‘exposition’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,004 secs (0,017 CPU) processed ‘courbe de base’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,003 secs (0,013 CPU) processed ‘profil de couleur d’entrée’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,006 secs (0,030 CPU) processed ‘profil de couleur de sortie’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,001 secs (0,010 CPU) processed ‘gamma’ on CPU, blended on CPU [full]
[dev_process_image] pixel pipeline processing took 1,092 secs (7,690 CPU)

darktable 2.2.0~rc0 (my compiled version)
[dev] took 0,000 secs (0,000 CPU) to load the image.
[dev_pixelpipe] took 0,001 secs (0,000 CPU) initing base buffer [full]
[dev_pixelpipe] took 0,000 secs (0,003 CPU) processed ‘point noir/blanc raw’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,001 secs (0,003 CPU) processed ‘balance des blancs’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,001 secs (0,007 CPU) processed ‘reconstruire hautes lumières’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,010 secs (0,050 CPU) processed ‘dématriçage’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,210 secs (1,193 CPU) processed ‘réduction du bruit (profil)’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 2,498 secs (17,860 CPU) processed ‘réduction du bruit (profil) 1’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,190 secs (1,277 CPU) processed ‘réduction du bruit (profil) 2’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,002 secs (0,017 CPU) processed ‘exposition’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,003 secs (0,013 CPU) processed ‘courbe de base’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,002 secs (0,010 CPU) processed ‘profil de couleur d’entrée’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,007 secs (0,033 CPU) processed ‘profil de couleur de sortie’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,002 secs (0,010 CPU) processed ‘gamma’ on CPU, blended on CPU [full]
[dev_process_image] pixel pipeline processing took 2,926 secs (20,477 CPU)

Almost the same times, except for ‘réduction du bruit (profil) 1’, the profiled denoise instance using non-local means (0.606 s vs 2.498 s).
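Spotting the outlier by eye works here, but a small script makes the comparison mechanical. A minimal sketch, assuming both logs were saved to files and use the line format shown above; perf_compare is a hypothetical name. It prints the old wall time, the new wall time and the new/old ratio for each module present in both logs, keeping only [full] pipeline lines (if the first log holds several full-pipeline runs, the last one per module is used).

```sh
# Hypothetical helper: compare two saved `darktable -d perf` logs module by module.
# usage: perf_compare OLD_LOG NEW_LOG
# Columns: old wall secs, new wall secs, new/old ratio, module name.
perf_compare() {
  LC_ALL=C awk '
    BEGIN { q = "\047" }                        # ASCII single quote
    !/^\[dev_pixelpipe\] took .* processed .*\[full\]$/ { next }
    {
      wall = $3; gsub(/,/, ".", wall); wall += 0   # decimal comma to number
      name = $0
      sub(/^.* processed /, "", name)
      sub(/ on [A-Z]+, blended on .*$/, "", name)
      sub("^(" q "|‘)", "", name)
      sub("[ ]*(" q "|’)$", "", name)
    }
    FILENAME == ARGV[1] { old[name] = wall; next }   # remember the old timings
    (name in old) && old[name] > 0 {
      printf "%8.3f %8.3f %6.2fx  %s\n", old[name], wall, wall / old[name], name
    }
  ' "$1" "$2"
}
```

With the two logs above, the ‘réduction du bruit (profil) 1’ line would show a ratio of about 4.12x while the other modules stay close to 1x.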

And finally, my build from git:
this is darktable 2.2.0rc0+87~g5c91e91
copyright (c) 2009-2016 johannes hanika
darktable-dev@lists.darktable.org

compile options:
bit depth is 64 bit
normal build
OpenMP support enabled
OpenCL support enabled
Lua support enabled, API version 4.0.0-dev
Colord support enabled
gPhoto2 support enabled
GraphicsMagick support enabled
OpenEXR support enabled

[dev_pixelpipe] took 0,001 secs (0,003 CPU) initing base buffer [full]
[dev_pixelpipe] took 0,001 secs (0,003 CPU) processed ‘point noir/blanc raw’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,001 secs (0,003 CPU) processed ‘balance des blancs’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,001 secs (0,003 CPU) processed ‘reconstruire hautes lumières’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,022 secs (0,093 CPU) processed ‘dématriçage’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,190 secs (1,300 CPU) processed ‘réduction du bruit (profil)’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 2,193 secs (17,013 CPU) processed ‘réduction du bruit (profil) 1’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,190 secs (1,303 CPU) processed ‘réduction du bruit (profil) 2’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,002 secs (0,013 CPU) processed ‘exposition’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,002 secs (0,017 CPU) processed ‘courbe de base’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,003 secs (0,013 CPU) processed ‘profil de couleur d’entrée’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,006 secs (0,033 CPU) processed ‘profil de couleur de sortie’ on CPU, blended on CPU [full]
[dev_pixelpipe] took 0,002 secs (0,013 CPU) processed ‘gamma’ on CPU, blended on CPU [full]
[dev_process_image] pixel pipeline processing took 2,613 secs (19,813 CPU)

Any idea ?

Can anyone point me to this code please ;)


It would be in the kernel’s cache already. :)


Did you use “non-local means” or “wavelets” as the denoise mode in profiled denoise?

Thanks for your interest.

Three instances of profiled denoise (‘réduction du bruit (profil)’ in French):
réduction du bruit (profil) : wavelets
réduction du bruit (profil) 1 : non-local means
réduction du bruit (profil) 2 : wavelets
The procedure is strictly the same for the three darktable versions tested:

  • launch darktable -d perf
  • open the same photo, with the same history, from the lighttable by double-clicking it
  • in the darkroom, double-click the photo to get a 1:1 view.

According to the “-d perf” results, all operations take almost the same time except profiled denoise.
My two self-compiled versions are slightly faster for wavelets denoise and much slower for non-local means denoise.

A flaw in my compile process? A compile option?

What is strange… As a timeline, it looks like this:
11/6/2016: darktable 2.2.0~rc0: compiled by me, slow in non-local means denoise
11/7/2016: darktable 2.2.0rc0+71~gd483d9b-dirty: compiled by Manjaro, OK
11/15/2016: darktable 2.2.0rc0+87~g5c91e91: compiled by me, slow in non-local means denoise
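The version strings in this timeline look like git describe output. Assuming darktable builds them that way (tag, “+” commits since the tag, “~g” abbreviated commit hash, “-dirty” when the source tree had uncommitted changes), a hypothetical helper can split them for comparison. Note that nothing in the version string itself records the build type.

```sh
# Hypothetical helper: split a darktable version string such as
# 2.2.0rc0+71~gd483d9b-dirty, assuming it follows git describe:
# <tag>+<commits since tag>~g<abbreviated hash>[-dirty]
describe_version() {
  v=$1
  dirty=no
  case $v in
    *-dirty) dirty=yes; v=${v%-dirty} ;;
  esac
  case $v in
    *+*"~g"*)
      base=${v%%+*}               # text before the first +
      rest=${v#*+}                # e.g. 71~gd483d9b
      commits=${rest%%"~g"*}
      hash=${rest#*"~g"}
      ;;
    *)                            # a plain tagged version such as 2.2.0~rc0
      base=$v; commits=0; hash=none
      ;;
  esac
  printf 'base=%s commits=%s hash=%s dirty=%s\n' "$base" "$commits" "$hash" "$dirty"
}
```

Usage: describe_version "$(darktable --version | head -n 1 | awk '{print $4}')", since the first line reads “this is darktable <version>”.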

Hence my question about compile options or something like that…

Well, did you tell us how you compile? I asked that early in the thread but can’t remember seeing an answer… ;)


So, depending on my mood :) and following darktable’s site, I use two ways:
cd $HOME/darktable
./build.sh -j 8 (on my system the build script doesn’t detect the correct number of CPUs)
cd build
sudo make install

or the ‘manual’ way:

mkdir $HOME/darktable/build
cd $HOME/darktable/build
cmake -DC__MAKE__BUILD__TYPE=Release …
cd build
sudo make install

I’m not really comfortable with the compile process because I mainly use scripting languages (PHP, JS).

Those look good to me. Please rebuild from latest git master sources and run darktable --version again.

Edit: Are those double _ in the cmake call? It should only be single ones: cmake -DCMAKE_BUILD_TYPE=Release ..

OK.
I’ll do that this evening.
The double _ instead of single ones are typing mistakes (the forum’s formatting uses _ for italic text :)).

Finally, I’ve got it! :)

In fact, the darktable 2.2.0rc0+71~gd483d9b-dirty that runs fine is not the official Manjaro version but one of my own builds, compiled the manual way and installed in /usr/local/bin, which comes before /usr/bin (where the Manjaro package installs) in my PATH. So I was mistaken: typing “darktable” on the command line runs the /usr/local/bin version (I use ./darktable in /opt/darktable/bin to test my other compiled version).
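A shell runs the first matching executable it finds in PATH order, which is how /usr/local/bin/darktable shadowed /usr/bin/darktable here. A minimal POSIX sketch of a hypothetical helper that lists every match in that order; in bash, type -a darktable gives similar information.

```sh
# Hypothetical helper: list every executable called "$1" in PATH order.
# The first line printed is the one a bare command name runs.
list_in_path() {
  name=$1
  old_ifs=$IFS
  IFS=:
  set -f                          # no filename globbing while splitting PATH
  for dir in $PATH; do
    [ -n "$dir" ] || dir=.        # an empty PATH entry means the current directory
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
  set +f
  IFS=$old_ifs
}
```

On the setup described above, list_in_path darktable would presumably print /usr/local/bin/darktable before /usr/bin/darktable.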

The difference between ‘perf OK’ and ‘perf not OK’ comes from the build type (CMAKE_BUILD_TYPE) chosen at compile time:
With the build.sh script, the default build type is RelWithDebInfo and the install path is /opt/darktable.
With the manual way, the build type is Release (set with the cmake command) and the install path is /usr/local.

Using build.sh --build-type Release gives the same performance as the manual procedure.
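To check which build type a build directory was configured with, one can read CMAKE_BUILD_TYPE from the CMakeCache.txt that CMake writes into it (entries look like CMAKE_BUILD_TYPE:STRING=Release). The helper name is hypothetical. For reference, CMake’s stock GCC flags are -O3 -DNDEBUG for Release and -O2 -g -DNDEBUG for RelWithDebInfo; darktable’s own CMake files may add to or override these, so the optimisation level, and not only the debug info, could play a part in the difference.

```sh
# Hypothetical helper: print the build type recorded in a CMake build directory.
cmake_build_type() {
  cache="$1/CMakeCache.txt"
  if [ ! -r "$cache" ]; then
    echo "no readable CMakeCache.txt in $1" >&2
    return 1
  fi
  # Cache entries have the form NAME:TYPE=VALUE, e.g. CMAKE_BUILD_TYPE:STRING=Release
  sed -n 's/^CMAKE_BUILD_TYPE:[A-Z]*=//p' "$cache"
}
```

Usage: cmake_build_type $HOME/darktable/build, after either build.sh or the manual cmake call.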

So the RelWithDebInfo build seems to cost a lot of time in non-local means profiled denoise.

Sorry for the mess.