OpenCL works fine, but CPU load is still heavy, e.g. for denoise (profiled)


#1

Hey guys,

I searched here and on Google, but somehow my question remains :slight_smile:

My darktable-cltest output shows that OpenCL actually works fine, including “denoiseprofile_*”.

However, I was compiling Thunderbird in parallel just now, so it hurts even more that dt still pushes denoise (profiled) tasks to my CPU:

_9367,591148 [pixelpipe_process] [full] using device -1_
_9367,591058 [pixelpipe_process] [preview] using device 0_
_9367,592081 [dev_pixelpipe] took 0,001 secs (0,001 CPU) initing base buffer [preview]_
_9367,593932 [dev_pixelpipe] took 0,002 secs (0,002 CPU) processed `Raw-Schwarz-/Weißpunkt' on GPU, blended on GPU [preview]_
_9367,598326 [dev_pixelpipe] took 0,004 secs (0,004 CPU) processed `Weißabgleich' on GPU, blended on GPU [preview]_
_9367,607697 [dev_pixelpipe] took 0,003 secs (0,005 CPU) processed `Spitzlicht-Rekonstruktion' on GPU, blended on GPU [preview]_
_9367,765836 [dev_pixelpipe] took 0,158 secs (0,580 CPU) processed `Entrastern' on CPU, blended on CPU [preview]_
_9367,779915 [dev_pixelpipe] took 0,014 secs (0,023 CPU) processed `Entrauschen (Profil)' on GPU, blended on GPU [preview]_
_9367,791673 [dev_pixelpipe] took 0,012 secs (0,014 CPU) processed `Entrauschen (Profil)' on GPU, blended on GPU [preview]_
_9367,792957 [dev_pixelpipe] took 0,001 secs (0,001 CPU) processed `Zuschneiden und drehen' on GPU, blended on GPU [preview]_
_9367,794534 [dev_pixelpipe] took 0,002 secs (0,002 CPU) processed `Basiskurve' on GPU, blended on GPU [preview]_
_9367,820533 [dev_pixelpipe] took 0,026 secs (0,072 CPU) processed `Eingabefarbprofil' on GPU, blended on GPU [preview]_
_9367,843278 [dev_pixelpipe] took 0,023 secs (0,088 CPU) processed `Schatten und Spitzlichter' on GPU, blended on GPU [preview]_
_9367,880803 [dev_pixelpipe] took 0,037 secs (0,113 CPU) processed `Equalizer' on GPU, blended on GPU [preview]_
_9367,925060 [dev_pixelpipe] took 0,044 secs (0,114 CPU) processed `Werte' on GPU, collected histogram on GPU, blended on GPU [preview]_
_9367,933788 [dev_pixelpipe] took 0,009 secs (0,011 CPU) processed `schärfen' on GPU, blended on GPU [preview]_
_9367,936180 [dev_pixelpipe] took 0,002 secs (0,004 CPU) processed `Ausgabefarbprofil' on GPU, blended on GPU [preview]_
_9367,945832 [dev_pixelpipe] took 0,010 secs (0,019 CPU) processed `Gamma' on CPU, blended on CPU [preview]_
_9368,013507 [opencl_profiling] profiling device 0 ('GeForce GTX 1060 6GB'):_
_9368,026736 [opencl_profiling] spent  0,0026 seconds in [Write Image (from host to device)]_

My CPU is a not-so-strong i7-5820K (6 cores with HT).

my opencl-settings are:

_#opencl_omit_whitebalance=false_
_opencl=TRUE_
_opencl_async_pixelpipe=true_
_opencl_avoid_atomics=false_
_opencl_checksum=2160716749_
_opencl_device_priority=*/*/*/*_
_opencl_disable_drivers_blacklist=false_
_opencl_enable_markesteijn=true_
_opencl_library=_
_opencl_mandatory_timeout=200_
_opencl_memory_headroom=300_
_opencl_memory_requirement=768_
_opencl_micro_nap=100_
_opencl_number_event_handles=300_
_opencl_scheduling_profile=default_
_opencl_size_roundup=16_
_opencl_synch_cache=false_
_opencl_use_cpu_devices=false_
_opencl_use_pinned_memory=false_

Am I misunderstanding something?
Will those modules always run on the CPU, no matter how slow it is and how powerful the GPU might be?

Thanks in advance!

Cheers
Axel


#2

Moinchen, Alex!

I am on thin ice here, but some thoughts…

a) Which operating system are you on, Win, 'ux, …?
b) Your GFX is one notch better than mine (I have just a 1050).
c) Do you use the Nvidia drivers or the free ones?

Here are my (darktable 2.7.0) opencl-settings, which differ somewhat from yours:

opencl=true
opencl_async_pixelpipe=false
opencl_avoid_atomics=false
opencl_checksum=4057252122
opencl_device_priority=*/!0,*/*/*
opencl_disable_drivers_blacklist=false
opencl_library=
opencl_mandatory_timeout=200
opencl_memory_headroom=300
opencl_memory_requirement=768
opencl_micro_nap=1000
opencl_number_event_handles=25
opencl_scheduling_profile=default
opencl_size_roundup=16
opencl_synch_cache=false
opencl_use_cpu_devices=false
opencl_use_pinned_memory=false

Best regards
Claes in Lund, Sweden


#3

Dear Claes,

I am on Gentoo Linux with the nvidia drivers, as this is the only way to use OpenCL on Linux.

Actually, my device priority got messed up while copying. It is:
opencl_device_priority=*/*/*/*
(so I allow the preview to run on the GPU, which does not really seem to happen either)
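To compare the two priority strings in this thread, here is a minimal sketch that splits them into their fields. It assumes the field order image/preview/export/thumbnail as I recall it from the darktable 2.x manual; `PIPE_TYPES` and `parse_device_priority` are names made up for this example, not part of darktable.

```python
# Assumption: the four slash-separated fields are, in order,
# image (darkroom centre view), preview, export and thumbnail.
PIPE_TYPES = ("image", "preview", "export", "thumbnail")


def parse_device_priority(value):
    """Split an opencl_device_priority string into one entry list per pipe type."""
    fields = value.strip().split("/")
    if len(fields) != len(PIPE_TYPES):
        raise ValueError(f"expected {len(PIPE_TYPES)} fields, got {len(fields)}")
    return {
        pipe: [entry.strip() for entry in field.split(",") if entry.strip()]
        for pipe, field in zip(PIPE_TYPES, fields)
    }


if __name__ == "__main__":
    # The two values quoted in this thread.
    for setting in ("*/*/*/*", "*/!0,*/*/*"):
        print(setting)
        for pipe, entries in parse_device_priority(setting).items():
            # "!N" excludes device N for that pipe, "*" is a wildcard.
            print(f"  {pipe:<9} -> {entries}")
```

With `*/!0,*/*/*` the preview field contains `!0`, which keeps the preview pipe off device 0; with `*/*/*/*` every pipe may use any device.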

The other differences between your settings and mine come from https://www.darktable.org/usermanual/en/darktable_and_opencl_optimization.html

Oh, and my dt is 2.6.0 (I usually run the GitHub version, but right after the major release I want to use the official branch for a while).

Cheers Axel :wink:


#4

I don’t speak German so your logs took me a while to read but I think that Denoise (profiled) is done on GPU. You probably mean Demosaic (Entrastern)?


(Christian Kanzian) #5

@mimoklepes is right: only demosaicing runs on the CPU. All other modules used the GPU. The timings look good from my point of view. Turn off OpenCL and you will see how slow it gets …
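For longer logs, a small script makes it easier to see which modules ended up on the CPU. This is only a sketch: it assumes the `[dev_pixelpipe] took …` line format shown in the first post (including the decimal comma of the German locale), and `parse_perf_lines` and `wall_time_by_device` are names invented for this example.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   ... took 0,158 secs (0,580 CPU) processed `Entrastern' on CPU, blended on CPU [preview]
LINE = re.compile(
    r"took (?P<wall>\d+[.,]\d+) secs \((?P<cpu>\d+[.,]\d+) CPU\) "
    r"processed [`'](?P<module>.+?)' on (?P<device>CPU|GPU)"
    r".*\[(?P<pipe>\w+)\]"
)


def to_seconds(text):
    """Accept both '0.158' and the locale-dependent '0,158'."""
    return float(text.replace(",", "."))


def parse_perf_lines(lines):
    """Return one record per module timing line; other lines are skipped."""
    records = []
    for line in lines:
        match = LINE.search(line)
        if match:
            records.append({
                "module": match["module"],
                "device": match["device"],
                "pipe": match["pipe"],
                "wall": to_seconds(match["wall"]),
                "cpu": to_seconds(match["cpu"]),
            })
    return records


def wall_time_by_device(records):
    """Sum the wall-clock seconds spent per device (CPU or GPU)."""
    totals = defaultdict(float)
    for record in records:
        totals[record["device"]] += record["wall"]
    return dict(totals)


# Three lines copied from the log in the first post.
SAMPLE = [
    "9367,765836 [dev_pixelpipe] took 0,158 secs (0,580 CPU) processed `Entrastern' on CPU, blended on CPU [preview]",
    "9367,779915 [dev_pixelpipe] took 0,014 secs (0,023 CPU) processed `Entrauschen (Profil)' on GPU, blended on GPU [preview]",
    "9367,945832 [dev_pixelpipe] took 0,010 secs (0,019 CPU) processed `Gamma' on CPU, blended on CPU [preview]",
]

records = parse_perf_lines(SAMPLE)
cpu_modules = [r["module"] for r in records if r["device"] == "CPU"]
print("on CPU:", cpu_modules)
print("wall time per device:", wall_time_by_device(records))
```

Feeding it the whole log instead of `SAMPLE` should list every module that did not run on the GPU.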


#6

This depends on the selected demosaic method, I believe AMAZE does not have an opencl implementation (yet). Not sure if there’s a technical reason behind that or whether it might be added in future.


#7

Guys,

First of all, I am happy about the feedback. I so often get a swift reply here, and that is really appreciated.

Sorry for the German logs; I should have spent some time switching the language (honestly, off the top of my head, I don't know how ;-> )

You are right, I often run demosaic (Entrastern) with AMaZE (it improves noise), and at the same time I have a very intensive style for denoising (one is called denoise-extreme and launches 6 instances), in which I mask the bright parts and the full blacks in several instances so that fine details are preserved (or rather could be; it got worse in 2.6). I just want that to run on my GPU :slight_smile:

The log sample above is not a good one for showing bad timings; there was a moment when I waited several seconds. Still, we can see that dt often calls the CPU and then says blended on GPU (whatever that means, and where do the colours suddenly come from?)

What actually happens (lighttable and darkroom) while scrolling through pictures:

  • nvidia-settings shows 0% GPU load (and so the temperature and the frequency are low as well)
  • at the same time all 12 “cores” (6 cores with HT) are fully loaded, and in the next moment it runs like a charm (meaning the GPU is fully loaded and the CPU is “sleeping”).

Let me observe whether that always happens when AMaZE is on (in that case the second GPU I am planning to buy will not help).
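One way to make that observation less of a guess is to log the GPU load while scrolling. The sketch below assumes the proprietary driver's `nvidia-smi` tool is installed and in the PATH; `parse_gpu_csv` and `query_gpus` are names made up for this example.

```python
import subprocess

# nvidia-smi prints one CSV line per GPU, e.g. "0, 3, 45".
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Turn the CSV output into one dict per GPU.

    Assumes all three values are plain integers; a card that reports
    something like "[N/A]" would raise ValueError here.
    """
    readings = []
    for line in text.strip().splitlines():
        index, util, temp = (int(part) for part in line.split(","))
        readings.append({"gpu": index, "util_percent": util, "temp_c": temp})
    return readings


def query_gpus():
    """Run nvidia-smi once and return the parsed readings."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` once a second (for example in a loop with `time.sleep(1)`) while scrolling through the lighttable, with `darktable -d perf` running next to it, would show whether the idle-GPU moments line up with the modules reported as running on the CPU.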

Cheers
Axel


#8

Hey everybody,

At the end of the day, after reading the fine manual (I should have done that earlier), I figured out that dt even supports two totally different GPUs if wanted. That motivated me and, long story short, as I am so satisfied with the first one, I bought a second Gigabyte GTX 1060 G1 6GB.

And guess what, my issue above disappeared. To me it looks like dt makes the following decision: “The GPU has a task; no matter its load, we give the next one to the CPU.”
…That is just my guess based on my observations and might be wrong. However, if there is some truth in it, I do not consider this the best strategy yet, but I am not a coder, so my words should not bother anyone who knows the background…
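The guess can be checked against the `[pixelpipe_process] … using device N` lines of a `-d perf` log like the one in the first post. A minimal sketch follows; it assumes that device -1 means the pipe ran on the CPU, which is my reading of the log and not something I looked up in the source, and `device_usage` is a name made up for this example.

```python
import re
from collections import Counter

# Matches lines such as:
#   9367,591148 [pixelpipe_process] [full] using device -1
DEVICE_LINE = re.compile(
    r"\[pixelpipe_process\] \[(?P<pipe>\w+)\] using device (?P<dev>-?\d+)"
)


def device_usage(lines):
    """Count how often each pipe type ran on the CPU or on a given GPU."""
    usage = Counter()
    for line in lines:
        match = DEVICE_LINE.search(line)
        if match:
            dev = int(match["dev"])
            # Assumption: a negative device id means no OpenCL device, i.e. CPU.
            target = "CPU" if dev < 0 else f"GPU {dev}"
            usage[(match["pipe"], target)] += 1
    return usage


# The two lines from the first post.
SAMPLE = [
    "9367,591148 [pixelpipe_process] [full] using device -1",
    "9367,591058 [pixelpipe_process] [preview] using device 0",
]

for (pipe, target), count in sorted(device_usage(SAMPLE).items()):
    print(f"{pipe:<8} {target:<6} {count}")
```

In the sample the preview pipe holds device 0 while the full pipe starts at almost the same moment on device -1. If a longer log shows the same pattern, that would fit the guess, and it would also explain why a second GPU helps.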

Anyway, if the budget is there, it is crazy to see how much a 2nd GPU can speed up dt, even though both are far from heavily loaded. I thought I could be smart with the priorities in darktablerc, but I found out it is best to just set opencl_device_priority=*/*/*/* and let dt do the trick (maybe even the default would do).

Cheers
Axel


#9

Hi AlexG,

I may be slow (because it is still The First Thursday in March here),
but can you have two GFX to one monitor? Or what is your setup?

Have fun!
Claes in Lund, Sweden


(Ingo Weyrich) #10

You don’t need a monitor to do processing on GPU


#11

Hmmmm. Interesting!


#12

Dear Claes,

Please note that Alex and Axel are two totally different names :wink:

No, no, I had just one GTX 1060 G1 with the monitor(s) connected to it. Now the second GTX 1060 is in, and all of its outputs are unused. I address that GPU only through OpenCL.

I hope my screenshot helps…


#13

Dear Cleas,

I hope I did not step on any toes either. I only mentioned the name because it was not the first time you mixed up mine. I hope my smiley clearly showed that I'm perfectly fine, and I hope so are you :wave::smiley:


#14

Now that you have spelled his name wrong once (yes, only once), you are sort of even. :stuck_out_tongue:


#15

OUCH, my apologies (and yet again there were some typos here). I blame my current flu for that :blush:

I buy all of us a bear or more preferably a wine
:wink: :joy::joy::joy:


(Martin Scharnke) #16

I think that :bear: might be cuddly, but maybe :beers: are better.


#17

:joy::joy::joy::joy::joy::joy::joy::joy:

Good thing I am on sick leave. If I wrote customer replies like this…

Now I have official proof that I like wine, not beer :wink: