23,392663 [default_process_tiling_cl_ptp] aborted tiling for module ‘diffuse’. too many tiles: 4301 x 2867
23,392690 [opencl_pixelpipe] could not run module ‘diffuse’ on gpu. falling back to cpu path
83,435404 [dev_pixelpipe] took 60,072 secs (451,092 CPU) processed `diffuse or sharpen’ on CPU, blended on CPU [export]
[…]
86,556387 [dev_process_export] pixel pipeline processing took 65,264 secs (455,291 CPU)
with headroom = 1200
17,799217 [default_process_tiling_cl_ptp] aborted tiling for module ‘diffuse’. too many tiles: 4301 x 2867
17,799227 [opencl_pixelpipe] could not run module ‘diffuse’ on gpu. falling back to cpu path
78,031524 [dev_pixelpipe] took 60,234 secs (452,320 CPU) processed `diffuse or sharpen’ on CPU, blended on CPU [export]
[…]
80,640850 [dev_process_export] pixel pipeline processing took 65,443 secs (459,138 CPU)
So, although diffuse or sharpen falls back to being computed on the CPU with both settings, the other modules are faster, giving an overall advantage to using OpenCL with headroom = 800 or 1200.
In summary:
OpenCL disabled: export time = 71 sec
OpenCL enabled, headroom = 400: engages the GPU for diffuse or sharpen: export time = TOO LONG
OpenCL enabled, headroom = 800 or 1200: falls back to the CPU for diffuse or sharpen: export time = 65 sec
I gather that the only effect of increasing the headroom here is to disable OpenCL for the diffuse or sharpen module, correct? Note that the integrated graphics card reports:
0.123745 [opencl_init] device 1 'Intel(R) Iris™ Plus Graphics' supports image sizes of 16384 x 16384
0.123748 [opencl_init] device 1 'Intel(R) Iris™ Plus Graphics' allows GPU memory allocations of up to 384MB
[opencl_init] device 1: Intel(R) Iris™ Plus Graphics CANONICAL_NAME: intelri GLOBAL_MEM_SIZE: 1536MB
so perhaps increasing the headroom does not leave enough GPU memory for computing tiles?
The headroom tells darktable how much GPU memory it should treat as already allocated by the operating system and other programs. I think this is needed because there's no way to ask OpenCL how much memory is free (or allocated). The higher the headroom, the less GPU memory darktable will try to allocate. If you set it too low, darktable will try to allocate more than is available, which fails; if you set it too high, performance suffers.
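The budget described above can be sketched in a few lines. This is purely illustrative arithmetic, not darktable's actual code; the function name and the assumption that the budget is simply total minus headroom are mine:

```python
# Hypothetical sketch of the headroom budget described above.
# darktable's real accounting is more involved; this only shows
# why a larger headroom leaves a smaller allocation budget.

def usable_gpu_memory_mb(global_mem_mb: int, headroom_mb: int) -> int:
    """GPU memory darktable would allow itself to allocate."""
    return max(global_mem_mb - headroom_mb, 0)

# With the Iris Plus reporting GLOBAL_MEM_SIZE of 1536MB:
for headroom in (400, 800, 1200):
    print(headroom, usable_gpu_memory_mb(1536, headroom))
# Higher headroom -> smaller budget -> a heavy module like diffuse
# is more likely to need tiling, or to give up on the GPU entirely.
```

With headroom = 1200 the sketch leaves only 336MB, which would be consistent with diffuse or sharpen no longer fitting on the GPU.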
The same question came to my mind. Maybe it's just not displayed; I will try with -d all.
Another thing I encountered: I loaded the sidecar onto a picture from my Nikon D7100 (6000x4000) vs. gpagnon's pic of ~4000x3000. The Nikon pic could be exported in 160 seconds. In the same session, gpagnon's pic has now been exporting for 20 minutes.
At least it tries to do tiling, but on my computer without success:
22.583654 [default_process_tiling_cl_ptp] aborted tiling for module 'diffuse'. too many tiles: 5890 x 4018
22.583677 [opencl_pixelpipe] could not run module 'diffuse' on gpu. falling back to cpu path
22.583698 [default_process_tiling_ptp] gave up tiling for module 'diffuse'. too many tiles: 5890 x 4018
I think it is simply because, with the host memory limit changed to 0 (no limit on available memory), the system can load the entire image into memory without needing to tile. But because OpenCL does not have enough memory, it still needs to tile. It creates that many tiles, processes all of them, and stores the results in order to merge them afterwards. In the end it is too much, so falling back to the CPU makes sense on your system.
Increasing the headroom forces the system to leave more GPU memory available for other tasks (per the manual), so when diffuse tries to use the GPU it notices it does not have enough memory and switches to the CPU.
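The "too many tiles" numbers above make sense if you consider how a tile grid grows as the per-tile budget shrinks. A hedged sketch of the kind of arithmetic involved (my own simplification; darktable's real tiling code also accounts for tile overlap, alignment, and per-module memory factors):

```python
import math

# Hypothetical simplification of tile-grid arithmetic, not
# darktable's actual implementation.

def tile_grid(width: int, height: int, tile_w: int, tile_h: int):
    """Number of tiles needed to cover a width x height image."""
    return math.ceil(width / tile_w), math.ceil(height / tile_h)

# With a healthy budget, a 6000x4000 image in 1024x1024 tiles
# needs only a 6 x 4 grid:
print(tile_grid(6000, 4000, 1024, 1024))  # (6, 4)

# If the memory budget forces tiles down to a few pixels, the grid
# explodes into thousands of tiles per axis, which is when darktable
# reports "too many tiles" and gives up on the GPU path:
print(tile_grid(6000, 4000, 2, 1))  # (3000, 4000)
```

That would explain why the aborted-tiling messages show tile counts on the order of the image dimensions themselves.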
I would like someone with more knowledge of when and how darktable uses tiling to chime in.
I am not sure this is my case though, as I don't recall (I am not on that machine now) seeing messages about failed compilation of OpenCL kernels in the terminal.
Thanks, closed the issue. That's what I seem to be doing these days: opening a feature request, then realising it's already done. But then why did @gpagnon have the issue? Shouldn't PR 9764 have taken care of updating the memory limit parameter?
I think they bumped DT_CURRENT_PERFORMANCE_CONFIGURE_VERSION from 1 to 2; and if darktable detects that the version in darktablerc is old (1), it prompts the user:
Interesting, because I am pretty sure that when I installed the 3.8.0 dmg from the darktable website, in response to that message I consented to having my old configuration updated by the installation.