Pro Contrast Moose Peterson

Thank you, Todd.

To those planning to experiment with Boris’ presets
it might be wise to remember that
Boris’ Family Motto reads something like
Why use only one instance of a module when you can have seventeen™?
(Which can be seen in most of his darktable tutorials.)

Have fun!
Claes in Lund, Sweden


So true. One of my favourite videos was the one where he did a series of edits using only colorbalance… albeit “multiple” times… :slight_smile:


AI ProContrast 2000 Platinium.dtstyle (7.2 KB)


Thank you, Father!

Looks great … but I need to buy a new notebook ;-). Even an i7-7820HQ with an Nvidia Quadro 1200m is at its limits dealing with this style.

Is there a specific goal for each instance of diffuse? On this recent play raw, which happened to be what I tested your style on, the final instance seems to generate too much artifact/grain (seen on the face). I just took the image, added exposure, lens correction, profiled denoise, filmic (auto-selected), and then your style. I'm just curious about the thought process behind each instance of diffuse. For sure you would have to tweak almost any style…

  • The first instance is for local contrast,
  • The second instance is to smoothen noise and bad things,
  • The third instance is to deblur the lens issues. Of course, this one needs adaptation depending on the glass you are using. I tweaked it for Warm colorful day in fall. If it’s too hard, reduce the number of iterations and possibly the radius span.

Shall I open a new thread for this?

I can confirm that exporting is extremely slow when using this preset (applied to Warm colorful day in fall), because the diffuse module fails to run on OpenCL and falls back to the CPU. I have 6 GB of RAM on the NVidia 1060 card, and 64 GB RAM in the machine; the CPU is an AMD Ryzen 5 5600X. Note the [opencl_diffuse] couldn't enqueue kernel! -4 messages below.

176.436056 [export] creating pixelpipe took 0.053 secs (0.159 CPU)
176.436073 [pixelpipe_process] [export] using device 0
176.436095 [dev_pixelpipe] took 0.000 secs (0.000 CPU) initing base buffer [export]
176.446887 [dev_pixelpipe] took 0.011 secs (0.007 CPU) processed `raw black/white point' on GPU, blended on GPU [export]
176.451013 [dev_pixelpipe] took 0.004 secs (0.004 CPU) processed `white balance' on GPU, blended on GPU [export]
176.458158 [dev_pixelpipe] took 0.007 secs (0.003 CPU) processed `highlight reconstruction' on GPU, blended on GPU [export]
176.546673 [dev_pixelpipe] took 0.089 secs (0.048 CPU) processed `demosaic' on GPU, blended on GPU [export]
176.560071 [dev_pixelpipe] took 0.013 secs (0.009 CPU) processed `lens correction' on GPU, blended on GPU [export]
176.576557 [dev_pixelpipe] took 0.016 secs (0.013 CPU) processed `exposure' on GPU, blended on GPU [export]
176.862751 [dev_pixelpipe] took 0.286 secs (1.330 CPU) processed `tone equalizer' on CPU, blended on CPU [export]
176.938311 [dev_pixelpipe] took 0.076 secs (0.764 CPU) processed `tone equalizer 1' on CPU, blended on CPU [export]
176.995765 [dev_pixelpipe] took 0.057 secs (0.057 CPU) processed `input color profile' on GPU, blended on GPU [export]
image colorspace transform Lab-->RGB took 0.015 secs (0.008 GPU) [channelmixerrgb ]
177.041929 [dev_pixelpipe] took 0.046 secs (0.027 CPU) processed `color calibration' on GPU, blended on GPU [export]
177.087537 [default_process_tiling_cl_ptp] use tiling on module 'diffuse' for image with full size 7374 x 4924
177.087542 [default_process_tiling_cl_ptp] (3 x 1) tiles with max dimensions 4648 x 4924 and overlap 1024
177.087544 [default_process_tiling_cl_ptp] tile (0, 0) with 4648 x 4924 at origin [0, 0]
187.978417 [opencl_diffuse] couldn't enqueue kernel! -4
187.982514 [default_process_tiling_opencl_ptp] couldn't run process_cl() for module 'diffuse' in tiling mode: 0
187.982519 [opencl_pixelpipe] could not run module 'diffuse' on gpu. falling back to cpu path
283.749282 [dev_pixelpipe] took 106.707 secs (1127.128 CPU) processed `diffuse or sharpen' on CPU, blended on CPU [export]
283.749305 [default_process_tiling_cl_ptp] use tiling on module 'diffuse' for image with full size 7374 x 4924
283.749308 [default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 7368 x 4924 and overlap 16
283.749309 [default_process_tiling_cl_ptp] tile (0, 0) with 7368 x 4924 at origin [0, 0]
291.479139 [opencl_diffuse] couldn't enqueue kernel! -4
291.483153 [default_process_tiling_opencl_ptp] couldn't run process_cl() for module 'diffuse' in tiling mode: 0
291.483158 [opencl_pixelpipe] could not run module 'diffuse' on gpu. falling back to cpu path
330.331615 [dev_pixelpipe] took 46.582 secs (446.771 CPU) processed `diffuse or sharpen 1' on CPU, blended on CPU [export]
330.331641 [default_process_tiling_cl_ptp] use tiling on module 'diffuse' for image with full size 7374 x 4924
330.331644 [default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 6168 x 4924 and overlap 64
330.331646 [default_process_tiling_cl_ptp] tile (0, 0) with 6168 x 4924 at origin [0, 0]
350.124052 [opencl_diffuse] couldn't enqueue kernel! -4
350.128109 [default_process_tiling_opencl_ptp] couldn't run process_cl() for module 'diffuse' in tiling mode: 0
350.128115 [opencl_pixelpipe] could not run module 'diffuse' on gpu. falling back to cpu path
467.724100 [dev_pixelpipe] took 137.392 secs (1337.772 CPU) processed `diffuse or sharpen 2' on CPU, blended on CPU [export]
467.802900 [dev_pixelpipe] took 0.079 secs (0.065 CPU) processed `color balance rgb' on GPU, blended on GPU [export]
467.833439 [dev_pixelpipe] took 0.031 secs (0.014 CPU) processed `filmic rgb' on GPU, blended on GPU [export]
image colorspace transform RGB-->Lab took 0.016 secs (0.008 GPU) [colorout ]
467.892337 [dev_pixelpipe] took 0.059 secs (0.035 CPU) processed `output color profile' on GPU, blended on GPU [export]
468.245263 [dev_pixelpipe] took 0.353 secs (0.353 CPU) processed `dithering' on CPU, blended on CPU [export]
468.342838 [dev_pixelpipe] took 0.098 secs (1.093 CPU) processed `display encoding' on CPU, blended on CPU [export]
468.342945 [opencl_profiling] profiling device 0 ('NVIDIA GeForce GTX 1060 6GB'):
468.342948 [opencl_profiling] spent  0.2184 seconds in [Write Image (from host to device)]
468.342950 [opencl_profiling] spent  0.0019 seconds in rawprepare_1f
468.342953 [opencl_profiling] spent  0.0021 seconds in whitebalance_1f
468.342955 [opencl_profiling] spent  0.0036 seconds in highlights_1f_lch_bayer
468.342957 [opencl_profiling] spent  0.0009 seconds in border_interpolate
468.342958 [opencl_profiling] spent  0.0038 seconds in rcd_border_green
468.342960 [opencl_profiling] spent  0.0053 seconds in rcd_border_redblue
468.342961 [opencl_profiling] spent  0.0049 seconds in rcd_populate
468.342963 [opencl_profiling] spent  0.0038 seconds in rcd_step_1_1
468.342965 [opencl_profiling] spent  0.0029 seconds in rcd_step_1_2
468.342966 [opencl_profiling] spent  0.0018 seconds in rcd_step_2_1
468.342968 [opencl_profiling] spent  0.0050 seconds in rcd_step_3_1
468.342969 [opencl_profiling] spent  0.0027 seconds in rcd_step_4_1
468.342970 [opencl_profiling] spent  0.0015 seconds in rcd_step_4_2
468.342972 [opencl_profiling] spent  0.0043 seconds in rcd_step_5_1
468.342973 [opencl_profiling] spent  0.0070 seconds in rcd_step_5_2
468.342974 [opencl_profiling] spent  0.0070 seconds in rcd_write_output
468.342976 [opencl_profiling] spent  0.0309 seconds in [Copy Image (on device)]
468.342978 [opencl_profiling] spent  0.0117 seconds in exposure
468.342979 [opencl_profiling] spent  0.2296 seconds in [Read Image (from device to host)]
468.342981 [opencl_profiling] spent  0.0094 seconds in colorin_unbound
468.342982 [opencl_profiling] spent  0.0085 seconds in colorspaces_transform_lab_to_rgb_matrix
468.342983 [opencl_profiling] spent  0.0108 seconds in channelmixerrgb_CAT16
468.342985 [opencl_profiling] spent 18.1803 seconds in diffuse_blur_bspline
468.342986 [opencl_profiling] spent 19.9586 seconds in diffuse_pde
468.342988 [opencl_profiling] spent  0.0177 seconds in colorbalancergb
468.342990 [opencl_profiling] spent  0.0064 seconds in filmic_mask_clipped_pixels
468.342991 [opencl_profiling] spent  0.0088 seconds in filmicrgb_chroma
468.342993 [opencl_profiling] spent  0.0087 seconds in colorspaces_transform_rgb_matrix_to_lab
468.342995 [opencl_profiling] spent  0.0166 seconds in colorout
468.342997 [opencl_profiling] spent 38.7750 seconds totally in command queue (with 3 events missing)
468.343011 [dev_process_export] pixel pipeline processing took 291.907 secs (2915.494 CPU)

Card info:

0.051513 [opencl_init] device 0 `NVIDIA GeForce GTX 1060 6GB' has sm_20 support.
0.051716 [opencl_init] device 0 `NVIDIA GeForce GTX 1060 6GB' supports image sizes of 16384 x 32768
0.051720 [opencl_init] device 0 `NVIDIA GeForce GTX 1060 6GB' allows GPU memory allocations of up to 1519MB
[opencl_init] device 0: NVIDIA GeForce GTX 1060 6GB 
     CANONICAL_NAME:           nvidiag
     GLOBAL_MEM_SIZE:          6077MB
     MAX_WORK_GROUP_SIZE:      1024
     MAX_WORK_ITEM_DIMENSIONS: 3
     MAX_WORK_ITEM_SIZES:      [ 1024 1024 64 ]
     DRIVER_VERSION:           470.82.00
     DEVICE_VERSION:           OpenCL 3.0 CUDA

According to nvidia-smi, with the browser open and darktable idling in the background, I have 465MiB in use of 6077MiB.

Is there some setting I could change to allow the operation to run on the GPU? In the manual, the only setting I found that mentions both OpenCL and tiling, opencl_use_pinned_memory, is concerned only with the speed of memory copies; I found nothing about OpenCL and tiling memory sizes. Starting with an empty darktable config did not help.

more than 2 minutes, ouch…

Yep, ouch. The steps of the same style, applied to a 16 MPixel Nikon D7000 image:

82.015155 [dev_pixelpipe] took 8.002 secs (8.036 CPU) processed `diffuse or sharpen' on GPU, blended on GPU [export]
84.511067 [dev_pixelpipe] took 2.496 secs (2.439 CPU) processed `diffuse or sharpen 1' on GPU, blended on GPU [export]
93.126881 [dev_pixelpipe] took 8.616 secs (8.136 CPU) processed `diffuse or sharpen 2' on GPU, blended on GPU [export]
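
For comparing runs like these, the per-module timings can be pulled out of the log text automatically. The following is a minimal sketch, assuming the line layout seen in the logs in this thread; the function names `parse_timings` and `slowest` are made up for illustration and are not part of darktable. It accepts both "." and "," as decimal separator, since the logs here come from different locales.

```python
import re

# Matches "[dev_pixelpipe] took X secs (Y CPU) processed `module' on DEVICE," lines.
# The layout is taken from the logs quoted in this thread (an assumption, not a spec).
TIMING_RE = re.compile(
    r"\[dev_pixelpipe\] took (?P<wall>\d+[.,]\d+) secs "
    r"\((?P<cpu>\d+[.,]\d+) CPU\) "
    r"processed `(?P<module>.+?)' on (?P<device>GPU with tiling|GPU|CPU),"
)


def parse_timings(log_text):
    """Return one dict per processed module: name, device, wall and CPU seconds."""
    rows = []
    for m in TIMING_RE.finditer(log_text):
        rows.append({
            "module": m.group("module"),
            "device": m.group("device"),
            # Normalise the locale-dependent decimal comma before converting.
            "wall_s": float(m.group("wall").replace(",", ".")),
            "cpu_s": float(m.group("cpu").replace(",", ".")),
        })
    return rows


def slowest(log_text, n=3):
    """Return the n modules with the largest wall-clock time, slowest first."""
    return sorted(parse_timings(log_text), key=lambda r: r["wall_s"], reverse=True)[:n]


if __name__ == "__main__":
    log = (
        "82.015155 [dev_pixelpipe] took 8.002 secs (8.036 CPU) processed "
        "`diffuse or sharpen' on GPU, blended on GPU [export]\n"
        "84.511067 [dev_pixelpipe] took 2.496 secs (2.439 CPU) processed "
        "`diffuse or sharpen 1' on GPU, blended on GPU [export]\n"
    )
    for row in slowest(log):
        print(row["module"], row["device"], row["wall_s"])
```

A module that silently fell back to the CPU shows up here with device "CPU" and a CPU time many times larger than its wall time.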

@kofa
How did you arrive at those clockings?
My machine is just a little bit better than yours,
and that same style took

  • 21 seconds with openCL
  • 33 seconds without openCL

when I transformed a RAF into a JPG.

Have fun!
Claes in Lund, Sweden

It’s not a bug, it’s hardware. Your GPU doesn’t have enough memory available.

CL_MEM_OBJECT_ALLOCATION_FAILURE -4
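
For anyone else decoding the number at the end of those "couldn't enqueue kernel!" lines: it is a standard OpenCL status code. Below is a small sketch that maps the first few codes from the OpenCL headers to their names and extracts the code from a log line. The helper names are hypothetical, and the table is deliberately incomplete.

```python
import re

# First few status codes as defined in the OpenCL headers (cl.h); not a full list.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}


def describe_cl_error(code):
    """Return the symbolic name for an OpenCL status code, if it is in the table."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL error {code}")


def error_from_log_line(line):
    """Extract and name the code from a "couldn't enqueue kernel! N" log line.

    Returns None when the line does not contain that message.
    """
    m = re.search(r"couldn't enqueue kernel! (-?\d+)", line)
    return describe_cl_error(int(m.group(1))) if m else None


if __name__ == "__main__":
    print(error_from_log_line("187.978417 [opencl_diffuse] couldn't enqueue kernel! -4"))
```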

more than 2 minutes, ouch…

Still 1/8th of a blind deconvolution runtime.

Have you tried using the image from Warm colorful day in fall?

@anon41087856 - The card has 6 GB, but

allows GPU memory allocations of up to 1519MB

That could be a reason.

It’s weird because I have even less than that and it works perfectly:

0.811893 [opencl_init] device 0 `Quadro M2200' has sm_20 support.
0.812055 [opencl_init] device 0 `Quadro M2200' supports image sizes of 16384 x 16384
0.812060 [opencl_init] device 0 `Quadro M2200' allows GPU memory allocations of up to 1010MB
[opencl_init] device 0: Quadro M2200 
     CANONICAL_NAME:           quadrom
     GLOBAL_MEM_SIZE:          4044MB
     MAX_WORK_GROUP_SIZE:      1024
     MAX_WORK_ITEM_DIMENSIONS: 3
     MAX_WORK_ITEM_SIZES:      [ 1024 1024 64 ]
     DRIVER_VERSION:           470.74
     DEVICE_VERSION:           OpenCL 3.0 CUDA
0.856024 [opencl_init] options for OpenCL compiler: -w -cl-fast-relaxed-math  -DNVIDIA_SM_20=1 -DNVIDIA=1 -I"/opt/darktable/share/darktable/kernels"
63,301325 [dev] took 0,000 secs (0,000 CPU) to load the image.
63,413158 [export] creating pixelpipe took 0,099 secs (0,250 CPU)
63,413187 [pixelpipe_process] [export] using device 0
63,413224 [dev_pixelpipe] took 0,000 secs (0,000 CPU) initing base buffer [export]
63,427966 [dev_pixelpipe] took 0,015 secs (0,013 CPU) processed `point noir/blanc raw' on GPU, blended on GPU [export]
63,439290 [dev_pixelpipe] took 0,011 secs (0,006 CPU) processed `balance des blancs' on GPU, blended on GPU [export]
63,657938 [dev_pixelpipe] took 0,219 secs (0,115 CPU) processed `dématriçage' on GPU, blended on GPU [export]
63,689078 [dev_pixelpipe] took 0,031 secs (0,018 CPU) processed `correction des objectifs' on GPU, blended on GPU [export]
63,728183 [dev_pixelpipe] took 0,039 secs (0,026 CPU) processed `exposition' on GPU, blended on GPU [export]
64,456505 [dev_pixelpipe] took 0,728 secs (2,702 CPU) processed `égaliseur de ton' on CPU, blended on CPU [export]
64,621579 [dev_pixelpipe] took 0,165 secs (1,181 CPU) processed `égaliseur de ton 1' on CPU, blended on CPU [export]
64,713158 [dev_pixelpipe] took 0,092 secs (0,088 CPU) processed `profil de couleur d'entrée' on GPU, blended on GPU [export]
image colorspace transform Lab-->RGB took 0,029 secs (0,018 GPU) [channelmixerrgb ]
64,819916 [dev_pixelpipe] took 0,107 secs (0,072 CPU) processed `calibration des couleurs' on GPU, blended on GPU [export]
64,884695 [default_process_tiling_cl_ptp] use tiling on module 'diffuse' for image with full size 7374 x 4924
64,884706 [default_process_tiling_cl_ptp] (5 x 3) tiles with max dimensions 3832 x 3833 and overlap 1024
64,884714 [default_process_tiling_cl_ptp] tile (0, 0) with 3832 x 3833 at origin [0, 0]
83,859892 [default_process_tiling_cl_ptp] tile (0, 1) with 3832 x 3139 at origin [0, 1785]
99,507791 [default_process_tiling_cl_ptp] tile (1, 0) with 3832 x 3833 at origin [1784, 0]
118,548666 [default_process_tiling_cl_ptp] tile (1, 1) with 3832 x 3139 at origin [1784, 1785]
134,191398 [default_process_tiling_cl_ptp] tile (2, 0) with 3806 x 3833 at origin [3568, 0]
153,464296 [default_process_tiling_cl_ptp] tile (2, 1) with 3806 x 3139 at origin [3568, 1785]
169,288535 [dev_pixelpipe] took 104,469 secs (104,629 CPU) processed `diffusion ou netteté' on GPU with tiling, blended on CPU [export]
169,288744 [default_process_tiling_cl_ptp] use tiling on module 'diffuse' for image with full size 7374 x 4924
169,288766 [default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 4728 x 4924 and overlap 16
169,288768 [default_process_tiling_cl_ptp] tile (0, 0) with 4728 x 4924 at origin [0, 0]
180,851588 [default_process_tiling_cl_ptp] tile (1, 0) with 2678 x 4924 at origin [4696, 0]
185,238803 [dev_pixelpipe] took 15,950 secs (15,894 CPU) processed `diffusion ou netteté 1' on GPU with tiling, blended on CPU [export]
185,238982 [default_process_tiling_cl_ptp] use tiling on module 'diffuse' for image with full size 7374 x 4924
185,238986 [default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 3956 x 4924 and overlap 64
185,238989 [default_process_tiling_cl_ptp] tile (0, 0) with 3956 x 4924 at origin [0, 0]
212,373233 [default_process_tiling_cl_ptp] tile (1, 0) with 3546 x 4924 at origin [3828, 0]
235,852099 [dev_pixelpipe] took 50,613 secs (48,805 CPU) processed `diffusion ou netteté 2' on GPU with tiling, blended on CPU [export]
236,136452 [dev_pixelpipe] took 0,284 secs (0,274 CPU) processed `balance couleur rvb' on GPU, blended on GPU [export]
236,206395 [default_process_tiling_cl_ptp] use tiling on module 'filmicrgb' for image with full size 7374 x 4924
236,206407 [default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 5388 x 4924 and overlap 512
236,206409 [default_process_tiling_cl_ptp] tile (0, 0) with 5388 x 4924 at origin [0, 0]
236,382849 [default_process_tiling_cl_ptp] tile (1, 0) with 3010 x 4924 at origin [4364, 0]
236,519600 [dev_pixelpipe] took 0,383 secs (0,333 CPU) processed `filmique rvb' on GPU with tiling, blended on CPU [export]
image colorspace transform RGB-->Lab took 0,018 secs (0,016 GPU) [colorout ]
236,749322 [dev_pixelpipe] took 0,230 secs (0,197 CPU) processed `profil de couleur de sortie' on GPU, blended on GPU [export]
237,256079 [dev_pixelpipe] took 0,507 secs (0,504 CPU) processed `homogénéisation' on CPU, blended on CPU [export]
237,391051 [dev_pixelpipe] took 0,135 secs (0,637 CPU) processed `encodage écran' on CPU, blended on CPU [export]
237,391441 [opencl_profiling] profiling device 0 ('Quadro M2200'):
237,391446 [opencl_profiling] spent  2,3921 seconds in [Write Image (from host to device)]
237,391449 [opencl_profiling] spent  0,0042 seconds in rawprepare_1f
237,391451 [opencl_profiling] spent  0,0045 seconds in whitebalance_1f
237,391453 [opencl_profiling] spent  0,0022 seconds in border_interpolate
237,391455 [opencl_profiling] spent  0,0085 seconds in rcd_border_green
237,391457 [opencl_profiling] spent  0,0108 seconds in rcd_border_redblue
237,391459 [opencl_profiling] spent  0,0097 seconds in rcd_populate
237,391461 [opencl_profiling] spent  0,0123 seconds in rcd_step_1_1
237,391463 [opencl_profiling] spent  0,0065 seconds in rcd_step_1_2
237,391465 [opencl_profiling] spent  0,0068 seconds in rcd_step_2_1
237,391467 [opencl_profiling] spent  0,0183 seconds in rcd_step_3_1
237,391469 [opencl_profiling] spent  0,0103 seconds in rcd_step_4_1
237,391471 [opencl_profiling] spent  0,0029 seconds in rcd_step_4_2
237,391472 [opencl_profiling] spent  0,0147 seconds in rcd_step_5_1
237,391474 [opencl_profiling] spent  0,0235 seconds in rcd_step_5_2
237,391476 [opencl_profiling] spent  0,0149 seconds in rcd_write_output
237,391478 [opencl_profiling] spent  0,0521 seconds in [Copy Image (on device)]
237,391482 [opencl_profiling] spent  0,0256 seconds in exposure
237,391485 [opencl_profiling] spent  0,7708 seconds in [Read Image (from device to host)]
237,391487 [opencl_profiling] spent  0,0208 seconds in colorin_unbound
237,391488 [opencl_profiling] spent  0,0238 seconds in colorspaces_transform_lab_to_rgb_matrix
237,391491 [opencl_profiling] spent  0,0268 seconds in channelmixerrgb_CAT16
237,391493 [opencl_profiling] spent 98,7387 seconds in diffuse_blur_bspline
237,391496 [opencl_profiling] spent 69,2394 seconds in diffuse_pde
237,391499 [opencl_profiling] spent  0,0362 seconds in colorbalancergb
237,391501 [opencl_profiling] spent  0,0114 seconds in filmic_mask_clipped_pixels
237,391504 [opencl_profiling] spent  0,0188 seconds in filmicrgb_chroma
237,391506 [opencl_profiling] spent  0,0202 seconds in colorspaces_transform_rgb_matrix_to_lab
237,391509 [opencl_profiling] spent  0,0600 seconds in colorout
237,391511 [opencl_profiling] spent 171,5868 seconds totally in command queue (with 0 events missing)
237,391536 [dev_process_export] pixel pipeline processing took 173,978 secs (175,495 CPU)

It’s slow as hell too, but this is export so it hardly matters (you are not required in front of the computer while exporting).

I can reproduce the problem with the same image and style, with my 8GB GTX 1080. All of the memory allocations in process_cl succeed, but then wavelets_process_cl fails with the -4 error. It seems that the GPU may need some amount of free memory to run the kernel, and when the tile size is chosen to be as large as possible, not enough is left free. Adding an additional 2 to tiling->factor_cl in tiling_callback fixes the problem on my system.

There is also an option opencl_memory_headroom in darktablerc. Increasing that from the default 400 to 800 also fixes the problem, but if the amount of extra memory it needs is proportional to the image / tile size then it probably would be better to change tiling->factor_cl than to fix it that way.

Edit: same issue came up in this thread, opencl_memory_headroom was the solution.
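
If you prefer to script the change rather than edit darktablerc by hand, here is a minimal sketch. It only assumes that darktablerc is a plain list of key=value lines; `set_rc_option` is a hypothetical helper, not a darktable tool. The file usually lives at ~/.config/darktable/darktablerc on Linux, and as far as I know darktable rewrites it on exit, so make the change while darktable is closed and keep a backup.

```python
def set_rc_option(rc_text, key, value):
    """Return rc_text with key=value set.

    An existing "key=..." line is replaced; otherwise the entry is appended.
    Assumes one key=value pair per line (hypothetical helper, not part of darktable).
    """
    prefix = key + "="
    lines = rc_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = prefix + value
            replaced = True
    if not replaced:
        lines.append(prefix + value)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Demo on an in-memory string; reading and writing the real file is left to you.
    before = "opencl_memory_headroom=400\n"
    print(set_rc_option(before, "opencl_memory_headroom", "800"), end="")
```

The value 800 is simply the one that worked in the test above; other cards and images may need a different amount.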


Just did. 90 seconds without OpenCL, and 90 seconds with OpenCL :frowning:
(because it could not run on gpu, falling back to cpu path…)

I will have to try with @paolod’s suggestions.

Have fun!
Claes in Lund, Sweden

Hah, it’s nice to find the solution I posted myself earlier. :slight_smile: I’ve just never considered that with 6 GB on the card I need to worry about the headroom. Anyway, with the headroom set to 800 MB, tiling succeeds and I get:

50.629860 [pixelpipe_process] [export] using device 0
...
51.322878 [default_process_tiling_cl_ptp] use tiling on module 'diffuse' for image with full size 7374 x 4924
51.322883 [default_process_tiling_cl_ptp] (4 x 1) tiles with max dimensions 4320 x 4924 and overlap 1024
51.322884 [default_process_tiling_cl_ptp] tile (0, 0) with 4320 x 4924 at origin [0, 0]
60.762710 [default_process_tiling_cl_ptp] tile (1, 0) with 4320 x 4924 at origin [2272, 0]
70.206403 [default_process_tiling_cl_ptp] tile (2, 0) with 2830 x 4924 at origin [4544, 0]
74.961507 [dev_pixelpipe] took 23.684 secs (23.518 CPU) processed `diffuse or sharpen' on GPU with tiling, blended on CPU [export]
74.961526 [default_process_tiling_cl_ptp] use tiling on module 'diffuse' for image with full size 7374 x 4924
74.961528 [default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 6852 x 4924 and overlap 16
74.961530 [default_process_tiling_cl_ptp] tile (0, 0) with 6852 x 4924 at origin [0, 0]
81.641600 [default_process_tiling_cl_ptp] tile (1, 0) with 554 x 4924 at origin [6820, 0]
82.020020 [dev_pixelpipe] took 7.059 secs (7.005 CPU) processed `diffuse or sharpen 1' on GPU with tiling, blended on CPU [export]
82.020039 [default_process_tiling_cl_ptp] use tiling on module 'diffuse' for image with full size 7374 x 4924
82.020042 [default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 5732 x 4924 and overlap 64
82.020044 [default_process_tiling_cl_ptp] tile (0, 0) with 5732 x 4924 at origin [0, 0]
97.672605 [default_process_tiling_cl_ptp] tile (1, 0) with 1770 x 4924 at origin [5604, 0]
100.578624 [dev_pixelpipe] took 18.559 secs (17.528 CPU) processed `diffuse or sharpen 2' on GPU with tiling, blended on CPU [export]
...
101.161714 [opencl_profiling] spent 22.5110 seconds in diffuse_blur_bspline
101.161716 [opencl_profiling] spent 25.9366 seconds in diffuse_pde
...
101.161727 [opencl_profiling] spent 49.4508 seconds totally in command queue (with 0 events missing)
101.161746 [dev_process_export] pixel pipeline processing took 50.532 secs (51.887 CPU)

@Claes : you say your machine is only ‘a little bit better’ than mine; but on the CPU path (OpenCL disabled) I got pixel pipeline processing took 275.756 secs (3153.805 CPU) (all 12 ‘hyperthreaded’ cores in use, CPU running at around 4.2 GHz), while you get 90 seconds… that’s hardly ‘a little bit’. (This is with ND800_0005626_anonymized.NEF, with just the default settings and Aurélien’s style.)

I’ve now recompiled with --build-type Release (I had not specified a build-type previously, so it used the default RelWithDebInfo), and now got much better timings:

73.566043 [dev_pixelpipe] took 44.513 secs (495.844 CPU) processed `diffuse or sharpen' on CPU, blended on CPU [export]
89.567931 [dev_pixelpipe] took 16.002 secs (180.493 CPU) processed `diffuse or sharpen 1' on CPU, blended on CPU [export]
138.042138 [dev_pixelpipe] took 48.474 secs (551.746 CPU) processed `diffuse or sharpen 2' on CPU, blended on CPU [export]
...
140.268511 [dev_process_export] pixel pipeline processing took 112.221 secs (1261.094 CPU)

The headroom still matters even with a large amount of GPU memory, because of tiling. When there isn’t enough memory for the module to process the entire image at once, it breaks it up into tiles that are processed one at a time. To minimize the number of separate tiles that have to be processed, it tries to make them as large as possible. The largest tile size is the size that fills up almost all of the available GPU memory, leaving only the specified headroom free.
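
To make the cost of smaller tiles concrete: the tile origins printed in the logs in this thread are consistent with a simple rule, where each tile starts (tile size minus twice the overlap) after the previous one and the last tile is clipped at the image edge. The sketch below reproduces that rule along one axis. It is inferred from the log output only, not taken from darktable's source, and `tile_spans` is a made-up name.

```python
def tile_spans(full_size, tile_size, overlap):
    """Return (origin, size) pairs covering full_size pixels along one axis.

    Rule inferred from the logs: consecutive tiles are tile_size - 2 * overlap
    apart, and the final tile is clipped to the image edge.
    """
    step = tile_size - 2 * overlap
    if step <= 0:
        raise ValueError("tile must be larger than twice the overlap")
    spans = []
    origin = 0
    while True:
        size = min(tile_size, full_size - origin)
        spans.append((origin, size))
        if origin + size >= full_size:
            break
        origin += step
    return spans


if __name__ == "__main__":
    # Width 7374, first diffuse instance (overlap 1024), tile widths from the logs.
    for tile_width in (4320, 3832):
        spans = tile_spans(7374, tile_width, 1024)
        print(tile_width, len(spans), spans)
```

With a large overlap such as 1024, every tile spends 2048 pixels of its width on redundant border work, which is why shrinking the tile (less free memory, or more headroom) increases the total processing time faster than the tile count alone suggests.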
