Hah, it’s nice to find a solution I posted myself earlier. It just never occurred to me that, with 6 GB on the card, I would need to worry about the headroom. Anyway, with the headroom set to 800 MB, tiling succeeds and I get:
50.629860 [pixelpipe_process] [export] using device 0
...
51.322878 [default_process_tiling_cl_ptp] use tiling on module 'diffuse' for image with full size 7374 x 4924
51.322883 [default_process_tiling_cl_ptp] (4 x 1) tiles with max dimensions 4320 x 4924 and overlap 1024
51.322884 [default_process_tiling_cl_ptp] tile (0, 0) with 4320 x 4924 at origin [0, 0]
60.762710 [default_process_tiling_cl_ptp] tile (1, 0) with 4320 x 4924 at origin [2272, 0]
70.206403 [default_process_tiling_cl_ptp] tile (2, 0) with 2830 x 4924 at origin [4544, 0]
74.961507 [dev_pixelpipe] took 23.684 secs (23.518 CPU) processed `diffuse or sharpen' on GPU with tiling, blended on CPU [export]
74.961526 [default_process_tiling_cl_ptp] use tiling on module 'diffuse' for image with full size 7374 x 4924
74.961528 [default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 6852 x 4924 and overlap 16
74.961530 [default_process_tiling_cl_ptp] tile (0, 0) with 6852 x 4924 at origin [0, 0]
81.641600 [default_process_tiling_cl_ptp] tile (1, 0) with 554 x 4924 at origin [6820, 0]
82.020020 [dev_pixelpipe] took 7.059 secs (7.005 CPU) processed `diffuse or sharpen 1' on GPU with tiling, blended on CPU [export]
82.020039 [default_process_tiling_cl_ptp] use tiling on module 'diffuse' for image with full size 7374 x 4924
82.020042 [default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 5732 x 4924 and overlap 64
82.020044 [default_process_tiling_cl_ptp] tile (0, 0) with 5732 x 4924 at origin [0, 0]
97.672605 [default_process_tiling_cl_ptp] tile (1, 0) with 1770 x 4924 at origin [5604, 0]
100.578624 [dev_pixelpipe] took 18.559 secs (17.528 CPU) processed `diffuse or sharpen 2' on GPU with tiling, blended on CPU [export]
...
101.161714 [opencl_profiling] spent 22.5110 seconds in diffuse_blur_bspline
101.161716 [opencl_profiling] spent 25.9366 seconds in diffuse_pde
...
101.161727 [opencl_profiling] spent 49.4508 seconds totally in command queue (with 0 events missing)
101.161746 [dev_process_export] pixel pipeline processing took 50.532 secs (51.887 CPU)
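The tile origins in the log above follow a simple pattern: each tile starts one tile width minus twice the overlap after the previous one (4320 - 2 × 1024 = 2272, 6852 - 2 × 16 = 6820, 5732 - 2 × 64 = 5604), and the last tile is clipped to the image width. Here is a minimal Python sketch that reproduces those numbers. The stepping rule and the stopping condition are inferred from the logged values only, not taken from the darktable source, and `tile_spans` is a hypothetical helper name.

```python
def tile_spans(full_width, tile_width, overlap):
    """Return (origin, width) pairs for one row of horizontal tiles.

    Assumption (inferred from the log, not from darktable's code):
    consecutive tiles are offset by tile_width - 2 * overlap, and
    tiling stops once a tile reaches the right edge of the image.
    """
    step = tile_width - 2 * overlap
    if step <= 0:
        raise ValueError("overlap is too large for this tile width")

    spans = []
    origin = 0
    while True:
        # Clip the tile so it does not extend past the image.
        width = min(tile_width, full_width - origin)
        spans.append((origin, width))
        if origin + width >= full_width:
            break
        origin += step
    return spans


if __name__ == "__main__":
    # (tile width, overlap) for the three 'diffuse or sharpen' instances above
    for tile_width, overlap in [(4320, 1024), (6852, 16), (5732, 64)]:
        print(tile_width, overlap, tile_spans(7374, tile_width, overlap))
```

This also shows why the first instance is so much slower: with an overlap of 1024, almost half of each 4320-pixel tile is recomputed by its neighbour.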
@Claes: you say your machine is only ‘a little bit better’ than mine, but on the CPU path (OpenCL disabled) I got pixel pipeline processing took 275.756 secs (3153.805 CPU) (all 12 ‘hyperthreaded’ cores in use, CPU running at around 4.2 GHz), while you get 90 seconds… that’s hardly ‘a little bit’. (This is with ND800_0005626_anonymized.NEF, with just the default settings and Aurélien’s style.)
I’ve now recompiled with --build-type Release (I had not specified a build type previously, so it used the default, RelWithDebInfo), and now get much better timings:
73.566043 [dev_pixelpipe] took 44.513 secs (495.844 CPU) processed `diffuse or sharpen' on CPU, blended on CPU [export]
89.567931 [dev_pixelpipe] took 16.002 secs (180.493 CPU) processed `diffuse or sharpen 1' on CPU, blended on CPU [export]
138.042138 [dev_pixelpipe] took 48.474 secs (551.746 CPU) processed `diffuse or sharpen 2' on CPU, blended on CPU [export]
...
140.268511 [dev_process_export] pixel pipeline processing took 112.221 secs (1261.094 CPU)
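To compare runs like these without reading the logs by eye, the per-module timings can be pulled out of the `[dev_pixelpipe]` lines with a small script. This is only a sketch: the regular expression is written against the line format shown in this post, so it may need adjusting for other darktable versions, and `module_timings` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   ... [dev_pixelpipe] took 23.684 secs (23.518 CPU) processed
#   `diffuse or sharpen' on GPU with tiling, blended on CPU [export]
# The format is assumed from the log excerpts in this post.
PIPE_RE = re.compile(
    r"\[dev_pixelpipe\] took ([\d.]+) secs \(([\d.]+) CPU\) "
    r"processed `([^']+)' on (GPU|CPU)"
)


def module_timings(log_text):
    """Return a list of (module, device, wall_secs, cpu_secs) tuples."""
    return [
        (m.group(3), m.group(4), float(m.group(1)), float(m.group(2)))
        for m in PIPE_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    gpu_log = (
        "74.961507 [dev_pixelpipe] took 23.684 secs (23.518 CPU) processed "
        "`diffuse or sharpen' on GPU with tiling, blended on CPU [export]\n"
    )
    cpu_log = (
        "73.566043 [dev_pixelpipe] took 44.513 secs (495.844 CPU) processed "
        "`diffuse or sharpen' on CPU, blended on CPU [export]\n"
    )
    gpu = {name: wall for name, _, wall, _ in module_timings(gpu_log)}
    cpu = {name: wall for name, _, wall, _ in module_timings(cpu_log)}
    for name in gpu:
        # Ratio of CPU wall time to GPU wall time for the same module
        print(f"{name}: CPU/GPU wall-time ratio {cpu[name] / gpu[name]:.2f}")
```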