How to understand errors: module atrous, OpenCL, tiling, etc.

When exporting an image, I got the messages below. Can anyone explain how to interpret them and what could or should be done?

My laptop has the following specs:

Intel® Core™ i7-4510U CPU @ 2.00GHz 2.60 GHz with 8 GB RAM

4 GB NVIDIA GeForce GTX 850M

Default darktable settings (host memory limit = 1500):

78,257671 [default_process_tiling_cl_ptp] use tiling on module ‘atrous’ for image with full size 6034 x 4017

78,257671 [default_process_tiling_cl_ptp] (2 x 1) tiles with max dimensions 5480 x 4017 and overlap 512

78,257671 [default_process_tiling_cl_ptp] tile (0, 0) with 5480 x 4017 at origin [0, 0]

80,178557 [opencl_atrous] couldn’t enqueue kernel! -4

80,209799 [default_process_tiling_opencl_ptp] couldn’t run process_cl() for module ‘atrous’ in tiling mode: 0

80,209799 [opencl_pixelpipe] could not run module ‘atrous’ on gpu. falling back to cpu path

117,281491 [dev_pixelpipe] took 39,399 secs (139,109 CPU) processed `contrast equalizer’ on CPU with tiling, blended on CPU [export]
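The trailing `-4` in `[opencl_atrous] couldn’t enqueue kernel! -4` looks like a raw OpenCL status code (an assumption about how darktable formats this message). In the Khronos `cl.h` header, -4 is `CL_MEM_OBJECT_ALLOCATION_FAILURE`, meaning the device failed to allocate memory for a buffer, which fits the memory-related explanation further down. A minimal lookup sketch, with a hypothetical helper name `decode_kernel_error` and only a handful of status codes:

```python
import re
from typing import Optional

# A few status codes from the Khronos OpenCL header cl.h (not an exhaustive list).
CL_STATUS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

# Matches "couldn't enqueue kernel! <code>"; the '.' accepts a straight or curly apostrophe.
KERNEL_ERR = re.compile(r"couldn.t enqueue kernel! (-?\d+)")


def decode_kernel_error(line: str) -> Optional[str]:
    """Return the symbolic OpenCL status for a failed-enqueue log line, else None."""
    match = KERNEL_ERR.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return CL_STATUS.get(code, f"unknown OpenCL status {code}")


if __name__ == "__main__":
    log_line = "80,178557 [opencl_atrous] couldn’t enqueue kernel! -4"
    print(decode_kernel_error(log_line))
```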

I tried some other settings for host memory limit. The messages are the same as above, except for the contrast equalizer timing line, which is shown here for each setting:

Host memory limit = 2500

107,247154 [dev_pixelpipe] took 20,405 secs (67,156 CPU) processed `contrast equalizer’ on CPU with tiling, blended on CPU [export]

Host memory limit = 0 (no limitations)

69,119066 [dev_pixelpipe] took 13,615 secs (41,984 CPU) processed `contrast equalizer’ on CPU, blended on CPU [export]
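Assuming the `[dev_pixelpipe] took ... secs (... CPU) processed ...` format stays as shown in these logs, a small parser (hypothetical name `parse_timing`) can extract the wall-clock time, the device, and the tiling flag, converting the decimal commas produced by the Windows locale, so the three host-memory-limit runs can be compared directly. This is only a sketch for these particular lines, not a general darktable log parser:

```python
import re
from typing import NamedTuple, Optional


class PipeTiming(NamedTuple):
    wall_secs: float
    cpu_secs: float
    module: str
    device: str
    tiled: bool


# '.' around the module name accepts the backtick/curly-quote pair seen in the log.
TIMING = re.compile(
    r"took ([\d,.]+) secs \(([\d,.]+) CPU\) processed .(.+?). on (CPU|GPU)( with tiling)?"
)


def _to_float(text: str) -> float:
    """Convert a number printed with a decimal comma (e.g. '39,399') to a float."""
    return float(text.replace(",", "."))


def parse_timing(line: str) -> Optional[PipeTiming]:
    """Extract timing information from a [dev_pixelpipe] line, or return None."""
    m = TIMING.search(line)
    if m is None:
        return None
    return PipeTiming(
        wall_secs=_to_float(m.group(1)),
        cpu_secs=_to_float(m.group(2)),
        module=m.group(3),
        device=m.group(4),
        tiled=m.group(5) is not None,
    )


if __name__ == "__main__":
    # Log lines from this thread, keyed by host memory limit in MB (0 = unlimited).
    runs = {
        1500: "117,281491 [dev_pixelpipe] took 39,399 secs (139,109 CPU) processed `contrast equalizer’ on CPU with tiling, blended on CPU [export]",
        2500: "107,247154 [dev_pixelpipe] took 20,405 secs (67,156 CPU) processed `contrast equalizer’ on CPU with tiling, blended on CPU [export]",
        0: "69,119066 [dev_pixelpipe] took 13,615 secs (41,984 CPU) processed `contrast equalizer’ on CPU, blended on CPU [export]",
    }
    baseline = parse_timing(runs[1500])
    for limit, line in runs.items():
        t = parse_timing(line)
        speedup = baseline.wall_secs / t.wall_secs
        print(f"limit={limit}: {t.wall_secs:.1f} s on {t.device}, tiled={t.tiled}, {speedup:.1f}x vs default")
```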

I understand that the contrast equalizer can’t run on the GPU/OpenCL in this case. Why, and what can be done about it?

In all cases, the contrast equalizer ran successfully on the CPU, with or without tiling (host memory limit = 0). What are the possible drawbacks of running darktable with host memory limit = 0?

The contrast equalizer time improved from 39.4 to 13.6 sec, but the overall export time only improved from 83.6 to 55.3 sec, so almost all of the gain from the unlimited memory setting comes from the contrast equalizer. The contrast equalizer must be very memory-hungry!
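The claim that the contrast equalizer accounts for almost all of the gain can be checked with the numbers from this post (39.4 s and 13.6 s for the module, 83.6 s and 55.3 s for the whole export). The helper name `module_share_of_saving` is just illustrative:

```python
def module_share_of_saving(module_before: float, module_after: float,
                           total_before: float, total_after: float) -> float:
    """Fraction of the total export-time saving that comes from one module."""
    return (module_before - module_after) / (total_before - total_after)


if __name__ == "__main__":
    # Seconds, as reported in this post: default limit (1500) vs. unlimited (0).
    module_before, module_after = 39.4, 13.6   # contrast equalizer
    total_before, total_after = 83.6, 55.3     # whole export
    share = module_share_of_saving(module_before, module_after, total_before, total_after)
    rest_before = total_before - module_before  # everything except the contrast equalizer
    rest_after = total_after - module_after
    print(f"contrast equalizer share of the saving: {share:.0%}")  # 91%
    print(f"rest of the pipeline: {rest_before:.1f} s -> {rest_after:.1f} s")  # 44.2 s -> 41.7 s
```

In other words, the rest of the pipeline only got about 2.5 s faster, while the contrast equalizer alone saved 25.8 s.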

I can’t say anything about the tiling. If you can’t avoid it, you may try raising the setting ‘minimum amount of memory (in MB) for a single buffer in tiling’.
https://elstoc.github.io/dtdocs/special-topics/memory/

For the OpenCL issue:
opencl_async_pixelpipe: ‘For optimum latency set this to TRUE, so that darktable runs the pixelpipe asynchronously and tries to use as few interrupts as possible. If you experience OpenCL errors like failing kernels, set the parameter to FALSE’
You can also try changing ‘opencl_memory_headroom’ (for me, with a 6 GB NVIDIA card, it’s set to 400, but in the past, with a 2 GB card, I sometimes had to raise it). Although this is in the AMD section of the manual, it applies to other cards, too. It has something to do with the drivers not being able to report how much memory is actually available to applications.
https://elstoc.github.io/dtdocs/special-topics/opencl/

Finally, I have been told that if you run low on RAM (which is sometimes the case with my 4 GB system), OpenCL will fail to initialise (completely, not just while running a module). So if I see that, I close everything else (browser, etc.) to free up RAM. This may not apply to you, of course.

Maybe I should mention that I’m running Windows 10.

Host memory limit is set to 1500 (default value) for the measurements in this post.

opencl_async_pixelpipe was set to false. This is apparently the default, since I have not changed the config file.
Setting this parameter to true changed nothing, except that the total export time improved from 83.6 to 79.9 sec.

Now for your suggestion on opencl_memory_headroom (400 is the default):

opencl_memory_headroom=800 and opencl_async_pixelpipe=true
The contrast equalizer was now processed on the GPU with tiling, and the total export time improved from 83.6 to 46.7 sec.
The contrast equalizer alone improved from 39.4 sec on the CPU to 4.4 sec on the GPU.
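Both options are plain `key=value` lines in darktable's `darktablerc` config file. A small Python sketch for setting them is below; the Windows location `%LOCALAPPDATA%\darktable\darktablerc` is an assumption about a default install, and darktable should be closed first, since it is assumed to rewrite the file on exit. The helper names `set_options` and `default_rc_path` are hypothetical, and the script only prints the result unless run with `--write`:

```python
import os
import sys
from pathlib import Path
from typing import Dict

# The values that worked in this thread.
WANTED = {"opencl_async_pixelpipe": "true", "opencl_memory_headroom": "800"}


def set_options(text: str, updates: Dict[str, str]) -> str:
    """Return darktablerc-style text with each key in `updates` set to its new value.

    Existing `key=value` lines are replaced in place; missing keys are appended.
    """
    remaining = dict(updates)
    out_lines = []
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        if sep and key in remaining:
            out_lines.append(f"{key}={remaining.pop(key)}")
        else:
            out_lines.append(line)
    out_lines.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out_lines) + "\n"


def default_rc_path() -> Path:
    # Assumed default location on Windows; adjust for other systems or portable installs.
    return Path(os.environ.get("LOCALAPPDATA", "")) / "darktable" / "darktablerc"


if __name__ == "__main__":
    rc = default_rc_path()
    if not rc.is_file():
        print(f"darktablerc not found at {rc}")
    else:
        new_text = set_options(rc.read_text(encoding="utf-8"), WANTED)
        if "--write" in sys.argv[1:]:
            rc.write_text(new_text, encoding="utf-8")  # close darktable before doing this
        else:
            print(new_text)  # dry run: show the result without touching the file
```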

Big success! But why?

Can you suggest any other tuning?

As far as I know, the driver cannot report free memory, only total memory on the card (we don’t know how much the operating system uses). The rest is just conjecture:

  • darktable asks for the total memory, reads the headroom value, and assumes that the difference is available
  • then, when it actually tries to use/allocate that memory, the attempt fails
  • darktable falls back to the CPU path
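The conjecture above can be written as a toy model. Everything here is hypothetical: the 4096 MB total stands in for what the driver might report for a 4 GB card, and `hidden_use_mb` (memory used by the OS, driver, or other apps that the driver does not report) is a made-up value of 550 MB, picked only so the model matches what was observed in this thread (400 and 500 fell back to the CPU, 600 and 800 ran on the GPU). It is not darktable's actual allocation logic:

```python
# Toy model of the conjecture above; all numbers are hypothetical.
CARD_TOTAL_MB = 4096  # assumed value reported by the driver for a 4 GB card


def plan_budget(total_mb: int, headroom_mb: int) -> int:
    """Memory darktable is conjectured to treat as usable: total minus headroom."""
    return total_mb - headroom_mb


def run_on_gpu(total_mb: int, headroom_mb: int, hidden_use_mb: int) -> bool:
    """True if the planned budget fits into the memory that is really free.

    hidden_use_mb is memory the driver does not report (OS, other apps);
    if the planned allocation exceeds what is really free, it fails and
    darktable would fall back to the CPU.
    """
    budget = plan_budget(total_mb, headroom_mb)
    actually_free = total_mb - hidden_use_mb
    return budget <= actually_free


if __name__ == "__main__":
    for headroom in (400, 500, 600, 800):
        path = "GPU" if run_on_gpu(CARD_TOTAL_MB, headroom, hidden_use_mb=550) else "CPU fallback"
        print(f"headroom={headroom} MB -> budget {plan_budget(CARD_TOTAL_MB, headroom)} MB -> {path}")
```

In this simplified model the allocation succeeds exactly when the headroom is at least as large as the unreported usage, which is why raising it helps.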

With a higher headroom, darktable tries to allocate less memory, and that allocation succeeds. You may want to experiment: with a value like 600 you might be able to avoid tiling and speed up contrast EQ (‘atrous’) even more (but don’t expect as big a jump as going from CPU to GPU).
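To see why a moderate change in headroom may not remove tiling, here is an approximate model of the tile count, based on the "(2 x 1) tiles with max dimensions 5480 x 4017 and overlap 512" line in the log. The assumption that each tile loses the overlap on both sides and that the count is a ceiling division is my reading of how darktable's tiling behaves, not a confirmed formula. With the logged values it reproduces the 2 × 1 layout, and it only drops to a single tile once a tile can cover the full 6034 px width:

```python
import math
from typing import Tuple


def tile_counts(img_w: int, img_h: int, tile_w: int, tile_h: int, overlap: int) -> Tuple[int, int]:
    """Approximate number of tiles per axis (assumed model, not darktable's code).

    Each tile is assumed to lose `overlap` pixels on both sides, so only
    tile size - 2 * overlap pixels per tile count towards covering the image.
    """
    eff_w = max(tile_w - 2 * overlap, 1)
    eff_h = max(tile_h - 2 * overlap, 1)
    tiles_x = math.ceil(img_w / eff_w) if tile_w < img_w else 1
    tiles_y = math.ceil(img_h / eff_h) if tile_h < img_h else 1
    return tiles_x, tiles_y


if __name__ == "__main__":
    # Values from the log: 6034 x 4017 image, 5480 x 4017 max tile, overlap 512.
    print(tile_counts(6034, 4017, 5480, 4017, 512))  # (2, 1)
    # A tile as wide as the whole image would need no tiling at all.
    print(tile_counts(6034, 4017, 6034, 4017, 512))  # (1, 1)
```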

Performance tuning tips: https://elstoc.github.io/dtdocs/special-topics/opencl/performance/

I tried opencl_memory_headroom=600 and 500.

600 worked, but still with tiling. With 500, the contrast equalizer was processed on the CPU again. So tiling can’t be avoided, but never mind: the module took only 4.4 sec on the GPU. There is not much room for further tuning.

Thank you for all your help and clarification!