Performance issues

hard to diagnose from here. also i’m not super deep in dt’s code any more nowadays. it’s clearly wasting a cycle or two between runs of opencl kernels, but the numbers you pointed to seem pathological. did you also compare --disable-opencl runs to rule out that it may be a driver issue of some sort? other than that, is this a debug build? we used to have a mode that would compile in all sorts of expensive paranoid checks that would slow down the pipeline quite significantly. if i was to debug this further, i’d probably run it through perf and see where all the time goes.

I changed the opencl device priority myself in order to try to utilize the NVIDIA card more. You are right, I have integrated graphics.

So the bottom line is: no change to config file.

I use the master (cdaaee). @obe, what version and OS do you use? Did you build it yourself? If yes, how?
@hanatos: it was a CPU code path (tone equalizer has no OpenCL implementation). OpenCL paths are also much slower.

hm, i’ve seen the tone equaliser. but there were others that seemed to indicate GPU usage. this would still mean that there’d be some copying over etc and that the driver gets to do things that may be stupid. i’d still run the --disable-opencl version, just to be sure we’re not chasing ghosts here.
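For anyone wanting to repeat that comparison from the command line, here is a minimal sketch that only builds the two commands. It assumes `darktable-cli` is on the PATH and that core options are passed after `--core`; the file names are hypothetical placeholders.

```python
import shlex

def build_export_cmd(raw, xmp, out, use_opencl=True):
    """Return an argv list for a darktable-cli export with perf logging."""
    core = ["-d", "perf"]                 # print per-module timings
    if not use_opencl:
        core.append("--disable-opencl")   # force the CPU code path
    return ["darktable-cli", raw, xmp, out, "--core", *core]

# test.raw, test.raw.xmp and out.jpg are placeholder names (assumptions).
for use_opencl in (True, False):
    print(shlex.join(build_export_cmd("test.raw", "test.raw.xmp", "out.jpg", use_opencl)))
```

Running both printed commands on the same image and comparing the `pixel pipeline processing took` lines gives the with/without OpenCL numbers.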

Turning opencl off resulted in this:

123,542625 [dev_process_export] pixel pipeline processing took 106,910 secs (395,641 CPU)

I use Windows 10, 64-bit, and darktable 3.2.1.

Yes, a few seconds to export an image are typical (if very complex edits are involved, especially without a discrete graphics card, even half a minute would not be surprising).

Hi @kofa and @hanatos

I uninstalled Darktable, removed all Darktable files, installed Darktable from www.darktable.org and ran Darktable totally default exporting the same image. Here is the result:

95,316423 [dev_process_export] pixel pipeline processing took 80,341 secs (209,875 CPU)

105,465572 [opencl_summary_statistics] device ‘GeForce GTX 850M’ (0): 596 out of 597 events were successful and 1 events lost

105,472570 [opencl_summary_statistics] device ‘Intel(R) HD Graphics 4400’ (1): NOT utilized

No change. Performance is as bad as always.

As shown in my earlier post opencl speeds up the export a lot.

What should I do now? I will be happy to upload logfiles or try whatever you guys suggest, because I think that other Windows users may suffer from the same problem(s).

You most likely have underlying problems not related to DT itself. Since you run Windows, I’d suggest the following at this point.

  1. What is the exact laptop model? If possible, consider disabling the Intel integrated graphics in BIOS/UEFI. I’ve always done this with laptops in the past; it gets rid of annoying issues, unless you have a specific need for it (which I doubt). Windows does a lot of weird stuff with the iGPU.
    Edit: You can also disable it in Device Manager, but I would strongly advise the BIOS route… Much cleaner. (My money is on this being your issue.)
    On some laptops, unfortunately, it’s not possible, because the laptop display has to use the iGPU.
    Edit2: Just to explain, Windows tends to switch automatically between the cards, using the integrated one for things like browsing and watching video and the “proper” card for graphics-intensive stuff, to save power. In my experience this does not work very smoothly, no matter what you set within Windows or the driver.

  2. Update the Nvidia driver:
    Unfortunately no Studio driver for your card, but:
    GeForce Game Ready Driver | 457.09 | Windows 10 64-bit | NVIDIA

  3. Check for underlying problems:

  • Go to Device Manager and check for any ! or yellow warnings on any of your devices.
  • Go to Event Viewer > Windows Logs > System. Click “Filter Current Log” and choose errors only. Later expand to warnings if needed.
  • Also check Task Manager for anything running that hogs CPU, memory or disk that you didn’t expect.
  4. Check BIOS for any integrated vs. dedicated GPU settings. But as before, I’d disable the CPU integrated graphics.

  5. Go to Start > Run, type “winver” and report which version of Windows 10 you have. Consider wiping/reinstalling Windows 10, or at the very least run the Windows 10 update tool to get up to at least version 1909.

  6. General cleanup

  • Run a proper disk cleanup (right-click the drive, Properties, etc.). Also choose “Clean up system files”. Clean out everything it finds. You can also manually wipe “C:\Windows\SoftwareDistribution”.
  • Check what is listed to auto-start in Task Manager and disable what you do not absolutely need. Also uninstall applications you no longer need or use.

Another possible (less preferred) option may be to force the Nvidia driver to always use the Nvidia card with DT. I’d say only use this option if your laptop doesn’t support disabling the Intel HD card.

In the Nvidia “Manage 3D settings” tab, go to “program settings” and click “add” to add the darktable executable.

You can also try to force the display to use the Nvidia card under display settings. It’s something like “Try to connect anyway on:…”

But if you give me the laptop model, I can check more specifically what the options are.


Running darktable without OpenCL did not help, see Performance issues - #46 by obe - so I’d be surprised if NVidia driver updates etc. would help. It’s worth a try, along with the other suggestions, of course.


Yeah that’s a good point, likely some underlying problems are going on. But if he can, I’d definitely disable that Intel IGPU anyway.

Hi @Tore_Valberg and @kofa

Thanks for your efforts and help.

OpenCL did help:
With OpenCL: pipeline processing took 85,214 secs (223,359 CPU)
Without OpenCL: pipeline processing took 106,910 secs (395,641 CPU)
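As a rough way to put a number on that, the two wall-clock times can be turned into a speedup factor. This is only a sketch using the figures quoted above; the helper name `secs` is made up for illustration, and it handles the decimal comma these logs use.

```python
def secs(text):
    """Parse a duration that may use a decimal comma, e.g. '85,214'."""
    return float(text.replace(",", "."))

with_opencl = secs("85,214")       # wall-clock seconds with OpenCL
without_opencl = secs("106,910")   # wall-clock seconds without OpenCL

speedup = without_opencl / with_opencl
saved = without_opencl - with_opencl
print(f"OpenCL speedup: {speedup:.2f}x, {saved:.1f} s saved")
```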

My laptop is: HP ENVY 15-k061no Notebook PC (ENERGY STAR) Serial number 5CD4397414

Windows 10 has been regularly updated all along. I just noticed today that I could update to Windows 10 version 20H2. Is that the version you are talking about? Winver tells me: build 19041.572.

My PC is rather clean, just the basics (and a few programs installed, of course).

Your first post with suggestions seems rather complicated and “frightening” to me but I will see what I can do………

Everything else seems to run very well. Possible underlying problems should not show only in Darktable, should they?

I’m sorry, I’m unable to help. I’m neither a darktable developer, nor a Windows user.
If others using Windows don’t have such awful performance, then I’m inclined to agree that it’s something to do with your machine / its Windows installation.

I was of the same thinking, but then I ran his image on my computer and…

0.652404 [dev] took 0.190 secs (0.202 CPU) to load the image.
0.667288 [export] creating pixelpipe took 0.011 secs (0.055 CPU)
0.693795 [dev_pixelpipe] took 0.026 secs (0.054 CPU) initing base buffer [export]
0.713126 [dev_pixelpipe] took 0.019 secs (0.073 CPU) processed `raw black/white point' on CPU, blended on CPU [export]
0.730959 [dev_pixelpipe] took 0.018 secs (0.070 CPU) processed `white balance' on CPU, blended on CPU [export]
0.791050 [dev_pixelpipe] took 0.060 secs (0.446 CPU) processed `highlight reconstruction' on CPU, blended on CPU [export]
1.332629 [dev_pixelpipe] took 0.541 secs (3.466 CPU) processed `demosaic' on CPU, blended on CPU [export]
3.193797 [dev_pixelpipe] took 1.861 secs (12.567 CPU) processed `lens correction' on CPU, blended on CPU [export]
3.252554 [dev_pixelpipe] took 0.059 secs (0.398 CPU) processed `exposure' on CPU, blended on CPU [export]
9.034613 [dev_pixelpipe] took 5.782 secs (40.261 CPU) processed `tone equalizer' on CPU, blended on CPU [export]
9.094097 [dev_pixelpipe] took 0.059 secs (0.458 CPU) processed `input color profile' on CPU, blended on CPU [export]
32.446871 [dev_pixelpipe] took 23.353 secs (175.435 CPU) processed `denoise (non-local means)' on CPU, blended on CPU [export]
38.913880 [dev_pixelpipe] took 6.466 secs (42.651 CPU) processed `contrast equalizer' on CPU, blended on CPU [export]
40.045200 [dev_pixelpipe] took 1.131 secs (7.446 CPU) processed `local contrast' on CPU, blended on CPU [export]
41.577268 [dev_pixelpipe] took 1.532 secs (11.920 CPU) processed `output color profile' on CPU, blended on CPU [export]
41.652031 [dev_pixelpipe] took 0.075 secs (0.594 CPU) processed `display encoding' on CPU, blended on CPU [export]
41.652139 [dev_process_export] pixel pipeline processing took 40.985 secs (295.866 CPU)

This is without OpenCL because I have the GPU disabled at the moment, but still I think it points to something else going on here:

guille2306 (Intel i7-4720HQ @ 3.600GHz with 16 GB RAM): 9.034613 [dev_pixelpipe] took 5.782 secs (40.261 CPU) processed `tone equalizer’ on CPU, blended on CPU [export]

This is on Linux and I would say more or less in line with the expected difference between the i7-4510U and the i7-4720HQ with multi-threading enabled. I still don’t understand why such a simple edit is so slow, but that’s two completely different systems stumbling on the same image.

Maybe the outliers here are @kofa’s results? @kofa: I see you’re on a development version of darktable. Can you downgrade and test in 3.2.1?
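To compare logs like the one above between machines, a short script can rank modules by wall-clock time. This is a sketch that assumes the line format shown in this thread (including decimal commas and the mixed quote characters); the names `LINE`, `num` and `module_times` are made up for illustration.

```python
import re

# Matches lines such as:
#   9.034613 [dev_pixelpipe] took 5.782 secs (40.261 CPU) processed `tone equalizer' on CPU, ...
LINE = re.compile(
    r"\[dev_pixelpipe\] took ([\d.,]+) secs \(([\d.,]+) CPU\) "
    r"processed [`'‘]([^'’]+)['’] on (CPU|GPU)"
)

def num(text):
    """Accept both '5.782' and '5,782'."""
    return float(text.replace(",", "."))

def module_times(log_text):
    """Return (wall, cpu, module, device) tuples, slowest module first."""
    rows = []
    for m in LINE.finditer(log_text):
        wall, cpu, name, device = m.groups()
        rows.append((num(wall), num(cpu), name, device))
    return sorted(rows, reverse=True)

sample = """\
3.193797 [dev_pixelpipe] took 1.861 secs (12.567 CPU) processed `lens correction' on CPU, blended on CPU [export]
9.034613 [dev_pixelpipe] took 5.782 secs (40.261 CPU) processed `tone equalizer' on CPU, blended on CPU [export]
32.446871 [dev_pixelpipe] took 23.353 secs (175.435 CPU) processed `denoise (non-local means)' on CPU, blended on CPU [export]
"""
for wall, cpu, name, device in module_times(sample):
    print(f"{wall:8.3f} s  {name} ({device})")
```

Feeding it the full log text instead of `sample` lists every module, which makes it easy to see whether the same modules dominate on both machines.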

It’s not really that scary, and the HP laptop BIOS is rather “dumbed down”, so it won’t let you disable both graphics cards etc. It looks like your laptop doesn’t have the option to disable the iGPU, but I’d still boot into the BIOS and check.

Performance problems can be rather elusive. People get used to how their computer behaves, and things like browsing, email, video etc. will probably work just fine as they are not resource intensive. But when running a heavy game, a heavy graphics task or benchmarks, you will notice it.

Check your Event Viewer, Device Manager, Task Manager etc. Also look for any forms of power saving that can be pathological on laptops. You may be facing CPU clockdowns etc. I’ve seen that before.
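One crude way to look for clockdowns without extra tools is to time the same fixed workload over and over and see whether it gets slower as the machine heats up. The sketch below uses only the standard library; the workload, round counts and function names are arbitrary choices for illustration, and a single-threaded loop loads the CPU far less than darktable does, so treat the result as a hint only.

```python
import time

def burn(n=200_000):
    """Fixed CPU-bound workload; the result itself is irrelevant."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total

def sample_timings(rounds=20, n=200_000):
    """Time the same workload repeatedly and return seconds per round."""
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        burn(n)
        timings.append(time.perf_counter() - start)
    return timings

def slowdown(timings):
    """Mean of the last quarter of the rounds divided by the mean of the first quarter."""
    k = max(1, len(timings) // 4)
    first = sum(timings[:k]) / k
    last = sum(timings[-k:]) / k
    return last / first

if __name__ == "__main__":
    t = sample_timings()
    print(f"first {t[0]:.4f} s, last {t[-1]:.4f} s, slowdown {slowdown(t):.2f}x")
```

With the defaults it finishes in under a second; raising `rounds` and `n` so that it runs for a few minutes gives the CPU time to warm up. A slowdown that keeps climbing well above 1 over such a run would be consistent with throttling.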

I also personally never go more than a couple of years without a complete wipe/format and windows reinstall. Windows gets clogged up over time.

Yes, by golly, you’re right. I should have seen your denoise numbers.

Ran it on my work laptop now, also Win10. The denoise module seems to be the culprit, on my laptop at least.

67.277793 [dev_process_export] pixel pipeline processing took 31.775 secs (221.422 CPU)

Disabled tone EQ
175.243276 [dev_process_export] pixel pipeline processing took 29.271 secs (201.422 CPU)

Disabled Contr EQ
257.730763 [dev_process_export] pixel pipeline processing took 24.653 secs (167.328 CPU)

Disabled Denoise
312.363786 [dev_process_export] pixel pipeline processing took 5.653 secs (30.609 CPU)

Re-enabled denoise
516.443039 [dev_pixelpipe] took 27.292 secs (199.094 CPU) processed `denoise (profiled)’ on CPU, blended on CPU [export]
520.230905 [dev_process_export] pixel pipeline processing took 31.841 secs (227.516 CPU)
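Worked out from the last two log lines above, the share of the export spent in denoise looks like this. It is just the arithmetic on the quoted numbers; `share` is a made-up helper name.

```python
def share(module_secs, pipeline_secs):
    """Fraction of the pipeline's wall-clock time spent in one module."""
    return module_secs / pipeline_secs

denoise = 27.292    # 'denoise (profiled)' line of the last run
pipeline = 31.841   # 'pixel pipeline processing' line of the same run

print(f"denoise share: {share(denoise, pipeline):.1%}")
print(f"everything else: {pipeline - denoise:.3f} s")
```

So denoise accounts for roughly six sevenths of that run, and the remaining 4.5 s or so is in the same ballpark as the 5.653 s export with denoise disabled.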


This is truly bizarre. I use Windows myself on a few-years-old Dell laptop with an Nvidia 1060, and also on a MacBook Pro with a Radeon card (just to be clear, I’m running Windows on this MacBook too). Both have no problem at all processing images with DT. No problem whatsoever, using tone equalizer or not.

Sorry, I don’t quite understand the numbers posted here, but it seems like you are getting a few hours rather than a few seconds of processing time? There must be something really odd in Windows itself.

When running this test, does your CPU usage go up for the entire process? What about memory usage? I suspect something on your Windows is holding DT back from using the CPU. Can you check the CPU temperature too? Usually a laptop slows down when the CPU is too hot.

Also, do you have problems with other applications that also use heavy resources?

To compare, I took a 90 MB, 48 MP shot and tested:

No Denoise:
1005.994083 [dev_process_export] pixel pipeline processing took 4.869 secs (33.891 CPU)

Profiled denoise for this image
1163.342722 [dev_process_export] pixel pipeline processing took 65.728 secs (480.953 CPU)

Changed to Wavelets Auto
1339.393436 [dev_process_export] pixel pipeline processing took 23.275 secs (144.469 CPU)

Copy over the Denoise from

So Wavelets better but

Yikes!

Never noticed this before. I’d be curious to see a comparison with an older DT. Or is this just expected from denoise (profiled)?

Booting up my main PC with Linux now.

Sorry for the spam, running high on caffeine today.

I just ran it now on my main PC with Manjaro & DT 3.2.1
34,848216 [dev_process_export] pixel pipeline processing took 3,783 secs (18,349 CPU)

EDIT: now with OpenCL disabled, I get this on Linux with a Ryzen 3900X
16,779625 [dev_process_export] pixel pipeline processing took 12,195 secs (262,122 CPU)

I suspect that’s comparable to obe’s CPU. So it’s probably just that denoise is much slower on the CPU?

Fast in GPU, very slow on CPU :frowning:

But now looking at obe’s original log:
58,275698 [dev_pixelpipe] took 18,082 secs (2,172 CPU) processed `denoise (non-local means)’ on GPU, blended on GPU [export]

Also, tone EQ is slow for him:
39,631271 [dev_pixelpipe] took 21,260 secs (74,266 CPU) processed `tone equalizer’ on CPU, blended on CPU [export]

While I see:
6,981280 [dev_pixelpipe] took 0,698 secs (12,117 CPU) processed `tone equalizer’ on CPU, blended on CPU [export]

Are we just looking at simple hardware performance, and CPU vs GPU on denoise?

The only things that are off for me now are the massive difference in denoise, CPU vs GPU, and the tone EQ time for obe. (Back to checking your PC for CPU throttling or bottlenecks?)
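One rough check on the “simple hardware performance” idea is to divide the CPU seconds by the wall-clock seconds in each log line, which gives approximately how many threads were busy. A sketch using the three lines quoted above; the labels and helper names are my own, and the interpretation assumes the CPU figure is summed over all threads.

```python
def num(text):
    """Parse a number written with a decimal comma."""
    return float(text.replace(",", "."))

def busy_threads(wall, cpu):
    """CPU seconds per wall-clock second, roughly the number of busy threads."""
    return num(cpu) / num(wall)

samples = {
    "obe, tone equalizer on CPU": ("21,260", "74,266"),
    "my Linux run, tone equalizer on CPU": ("0,698", "12,117"),
    "obe, denoise (non-local means) on GPU": ("18,082", "2,172"),
}
for label, (wall, cpu) in samples.items():
    print(f"{label}: {busy_threads(wall, cpu):.1f}")
```

A value close to the number of hardware threads suggests the CPU was saturated rather than held back by something else, while a value well below 1 on a GPU line means the CPU was mostly waiting on the card.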

Ok, i need to get back to work now :slight_smile: