Question about GPU usage

Hi,

Given the extreme limitations of my hardware, I’m borrowing my wife’s Windows notebook in an attempt to get more reasonable times during export with diffuse or sharpen.

The thing is: after tweaking her notebook to force darktable to use the GPU, I noticed that the GPU isn’t fully used, especially with diffuse or sharpen (DoS).

Here’s what happens:

C:\Users\gustavo>type dtfull.log
16,877835 [dev] took 0,114 secs (0,094 CPU) to load the image.
17,143132 [export] creating pixelpipe took 0,236 secs (0,734 CPU)
17,148176 [dev_pixelpipe] took 0,004 secs (0,000 CPU) initing base buffer [export]
17,175179 [dev_pixelpipe] took 0,027 secs (0,016 CPU) processed `raw black/white point' on GPU, blended on GPU [export]
17,189186 [dev_pixelpipe] took 0,014 secs (0,000 CPU) processed `white balance' on GPU, blended on GPU [export]
17,201476 [dev_pixelpipe] took 0,012 secs (0,000 CPU) processed `highlight reconstruction' on GPU, blended on GPU [export]
17,723681 [dev_pixelpipe] took 0,522 secs (0,047 CPU) processed `demosaic' on GPU, blended on GPU [export]
27,602989 [dev_pixelpipe] took 9,879 secs (1,141 CPU) processed `denoise (profiled)' on GPU, blended on GPU [export]
28,116891 [dev_pixelpipe] took 0,514 secs (0,438 CPU) processed `lens correction' on GPU, blended on GPU [export]
29,761048 [dev_pixelpipe] took 1,644 secs (10,516 CPU) processed `chromatic aberrations' on CPU, blended on CPU [export]
29,956514 [dev_pixelpipe] took 0,195 secs (0,047 CPU) processed `rotate and perspective' on GPU, blended on GPU [export]
30,003329 [dev_pixelpipe] took 0,047 secs (0,000 CPU) processed `exposure' on GPU, blended on GPU [export]
30,048260 [dev_pixelpipe] took 0,045 secs (0,000 CPU) processed `crop' on GPU, blended on GPU [export]
31,800975 [dev_pixelpipe] took 1,753 secs (0,484 CPU) processed `diffuse or sharpen sharpen' on GPU with tiling, blended on CPU [export]
31,962955 [dev_pixelpipe] took 0,162 secs (0,016 CPU) processed `input color profile' on GPU, blended on GPU [export]
32,088090 [dev_pixelpipe] took 0,125 secs (0,016 CPU) processed `color calibration' on GPU, blended on GPU [export]
1896,441951 [dev_pixelpipe] took 1864,354 secs (13830,500 CPU) processed `diffuse or sharpen local contrast' on CPU with tiling, blended on CPU [export]
3856,060757 [dev_pixelpipe] took 1959,619 secs (13683,047 CPU) processed `diffuse or sharpen dehaze' on CPU with tiling, blended on CPU [export]
3857,079530 [dev_pixelpipe] took 1,017 secs (0,016 CPU) processed `sharpen' on GPU, blended on GPU [export] 
3857,226343 [dev_pixelpipe] took 0,146 secs (0,016 CPU) processed `color balance rgb' on GPU, blended on GPU [export]
3857,301580 [dev_pixelpipe] took 0,075 secs (0,000 CPU) processed `filmic rgb' on GPU, blended on GPU [export]
3857,447083 [dev_pixelpipe] took 0,144 secs (0,016 CPU) processed `output color profile' on GPU, blended on GPU [export]
3857,590605 [dev_pixelpipe] took 0,143 secs (0,375 CPU) processed `display encoding' on CPU, blended on CPU [export]
3857,592660 [dev_process_export] pixel pipeline processing took 3840,449 secs (27526,688 CPU)
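
For anyone who wants to compare such logs quickly, here is a minimal Python sketch that totals the wall-clock time per device and lists the slowest modules. It only assumes the `[dev_pixelpipe] took ... processed ... on CPU|GPU` line shape seen in the log above (including the decimal comma); the function names are made up for this example and are not part of darktable.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 27,602989 [dev_pixelpipe] took 9,879 secs (1,141 CPU) processed `denoise (profiled)' on GPU, blended on GPU [export]
LINE_RE = re.compile(
    r"\[dev_pixelpipe\] took (?P<wall>[\d.,]+) secs \((?P<cpu>[\d.,]+) CPU\) "
    r"processed `(?P<module>.+?)' on (?P<device>CPU|GPU)"
)

def to_seconds(text):
    """Convert '9,879' or '9.879' to a float (this log uses a decimal comma)."""
    return float(text.replace(",", "."))

def summarize(log_text):
    """Return (wall time per device, list of (wall, module, device) slowest first)."""
    totals = defaultdict(float)
    modules = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not per-module timings
        wall = to_seconds(match["wall"])
        totals[match["device"]] += wall
        modules.append((wall, match["module"], match["device"]))
    modules.sort(reverse=True)
    return dict(totals), modules

# Three lines copied from the log above.
sample = """\
27,602989 [dev_pixelpipe] took 9,879 secs (1,141 CPU) processed `denoise (profiled)' on GPU, blended on GPU [export]
31,800975 [dev_pixelpipe] took 1,753 secs (0,484 CPU) processed `diffuse or sharpen sharpen' on GPU with tiling, blended on CPU [export]
1896,441951 [dev_pixelpipe] took 1864,354 secs (13830,500 CPU) processed `diffuse or sharpen local contrast' on CPU with tiling, blended on CPU [export]
"""

totals, modules = summarize(sample)
for wall, module, device in modules:
    print(f"{wall:10.3f} s  {device}  {module}")
print(totals)
```

Feeding it the whole dtfull.log instead of the three sample lines should make it obvious that the two DoS instances that ran on the CPU account for almost all of the export time.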

With times like this, my old i3 beats her notebook, with around 500 secs DoS :stuck_out_tongue_winking_eye:

Why could the first DoS instance use the GPU and get a reasonable time, while the later ones couldn’t?

Attached, darktable-cltest:

cltest.txt (50.7 KB)

Perhaps the other two instances ran into a GPU error and reverted to CPU. There have been discussions recently about cases where people didn’t have the GPU memory headroom (the amount the GPU needs to do normal GUI stuff) set correctly in darktablerc.

See for example: OpenCL analysis... Darktable... much faster with Opencl disabled...something wrong??

Thanks.

@kofa lists a set of parameters which seem to play an important role in performance if set appropriately.

Are these parameters session or compile parameters?

All of these parameters are set in the darktablerc configuration file, usually found in ~/.config/darktable.


OK, now the GPU comes to life:

458,599928 [default_process_tiling_cl_ptp] tile (11, 3) with 2093 x 2201 at origin [2112, 585]
467,891187 [dev_pixelpipe] took 415,735 secs (40,922 CPU) processed `diffuse or sharpen local contrast' on GPU with tiling, blended on CPU [export]
467,892942 [default_process_tiling_cl_ptp] use tiling on module 'diffuse' for image with full size 4205 x 2786
467,892959 [default_process_tiling_cl_ptp] (22 x 15) tiles with max dimensions 2240 x 2243 and overlap 1024
467,892971 [default_process_tiling_cl_ptp] tile (0, 0) with 2240 x 2243 at origin [0, 0]
476,644803 [default_process_tiling_cl_ptp] tile (0, 1) with 2240 x 2243 at origin [0, 195]
4

But its times are on par with my old i3 with 12 GB of RAM.

I suspect that it has to do with tiling. If I understood it correctly, the fewer tiles, the better, right? And the number of tiles depends on the available memory, I believe…
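
To put numbers on the tiling suspicion, here is a small Python sketch. It assumes that each tile only contributes its interior, that is, the tile size minus the overlap on both sides. That rule is inferred from the log lines above (tile origins advance by 192 and 195 pixels for tiles of 2240 x 2243 with overlap 1024); it is not taken from darktable's source, and `tile_grid` is a name made up for this example.

```python
import math

def tile_grid(width, height, tile_w, tile_h, overlap):
    """Estimate the number of tiles per axis, assuming each tile advances by
    its size minus the overlap on both sides (inferred from the log, see above)."""
    step_x = tile_w - 2 * overlap
    step_y = tile_h - 2 * overlap
    if step_x <= 0 or step_y <= 0:
        raise ValueError("overlap leaves no usable tile area")
    return math.ceil(width / step_x), math.ceil(height / step_y)

# Values from the log: image 4205 x 2786, tiles up to 2240 x 2243, overlap 1024.
tiles_x, tiles_y = tile_grid(4205, 2786, 2240, 2243, 1024)
print(tiles_x, tiles_y, tiles_x * tiles_y)  # 22 15 330

# Hypothetical larger tiles (4096 x 4096), as if more GPU memory were available.
big_x, big_y = tile_grid(4205, 2786, 4096, 4096, 1024)
print(big_x, big_y, big_x * big_y)  # 3 2 6
```

The first call reproduces the (22 x 15) grid reported in the log. With an overlap of 1024 on each side, a 2240-pixel tile advances by only 192 pixels, so nearly all of each tile is recomputed overlap. That would explain why slightly larger tiles (more usable GPU memory) cut the tile count so sharply.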

For the record, I managed to improve this GPU performance up to an acceptable threshold.

Exporting an image with three DoS instances (AA sharpen, add local contrast, and dehaze) now takes around 5 minutes, against 18 minutes on my GPU-less i3 notebook.

These are the darktablerc settings I tweaked for the GeForce MX110 (2 GB):

opencl=TRUE
opencl_async_pixelpipe=true
opencl_avoid_atomics=false
opencl_checksum=3202946953
opencl_device_priority=*/!0,*/*/*/!0,*
opencl_disable_drivers_blacklist=false
opencl_library=
opencl_mandatory_timeout=200
opencl_memory_headroom=400
opencl_memory_requirement=768
opencl_micro_nap=1000
opencl_number_event_handles=100
opencl_scheduling_profile=default
opencl_size_roundup=16
opencl_synch_cache=active module
opencl_use_cpu_devices=false
opencl_use_pinned_memory=true
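
If you want to compare these values against your own configuration, a minimal Python sketch like the one below pulls the `opencl*` keys out of darktablerc-style text. It assumes the plain `key=value` layout shown above; `read_opencl_settings` and the non-OpenCL key in the sample are made up for illustration.

```python
def read_opencl_settings(rc_text):
    """Collect opencl* key=value pairs from darktablerc-style text."""
    settings = {}
    for line in rc_text.splitlines():
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith("opencl"):
            settings[key] = value.strip()
    return settings

# Sample text; the last key is a hypothetical non-OpenCL entry.
sample = """\
opencl=TRUE
opencl_memory_headroom=400
opencl_device_priority=*/!0,*/*/*/!0,*
some/other/setting=1
"""

settings = read_opencl_settings(sample)
for key in sorted(settings):
    print(f"{key} = {settings[key]}")
```

To run it on a real file, read the text from wherever your darktablerc lives (the thread mentions ~/.config/darktable; adjust the path for your system) and pass it to the function.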

Thanks for sharing… @kofa, is the roundup size something worth tweaking? What does it do?

The manual section quoted below also mentions some options, but it doesn’t really say whether they are meant as performance tweaks. Any idea where to find the options it mentions, and is there a reference, either general or darktable-specific, that explains what each one does?

From the manual…

opencl_building_gpuXXX

Manually add this parameter to darktablerc to add additional OpenCL compilation options for your GPU (s), where XXX is the name of the GPU. These options are used when compiling OpenCL kernels and can be provided for performance tuning or to work around bugs. You must remove all existing kernels in order to recompile them with the new options. Provide an empty string to recompile without any options. Remove the parameter entirely to recompile with the default options.

You can reference your GPU by its ID (for example opencl_building_gpu0) or by its canonical name (for example opencl_building_gpugeforce10606gb). Start darktable with darktable -d opencl to find your canonical GPU name and default compilation options.

For example, the following lines would add additional compilation options for the GPU with ID 0 and for the GPU named “geforce10606gb”:

opencl_building_gpu0=-cl-mad-enable -cl-no-signed-zeros -cl-unsafe-math-optimizations -cl-finite-math-only -cl-fast-relaxed-math
opencl_building_gpugeforce10606gb=-cl-mad-enable -cl-no-signed-zeros

I think these fall into the category of “if you don’t know what they do, don’t use them”.

For sure…that’s why I was trying to understand what they do to see if I would ever use them…

I’m especially attracted by the size_roundup setting you’ve mentioned, since, according to @kofa’s description (not sure where he got that from),

Please do not mistake me for some OpenCL guru.
The source of descriptions is:

LOL! I didn’t. (but you did a pretty good job on the other thread)

EDIT: Thanks for the reference
