Slideshow and tiling on a Mac (diffuse and sharpen?): extremely slow

It’s still weird, because with 1.5 GB available I see no reason why a 12 MPixel image would be split into 1500 pieces. However, with your parameters (host_memory_limit=1500, singlebuffer_limit=16) I get even worse performance: 1944 tiles that advance by only 80 pixels per step:

26.238423 [default_process_tiling_ptp] use tiling on module 'diffuse' for image with full size 4301 x 2867
26.238427 [default_process_tiling_ptp] (54 x 36) tiles with max dimensions 2128 x 2128 and overlap 1024
26.238448 [default_process_tiling_ptp] tile (0, 0) with 2128 x 2128 at origin [0, 0]
28.094228 [lighttable] expose took 0.0000 sec
29.987606 [default_process_tiling_ptp] tile (0, 1) with 2128 x 2128 at origin [0, 80]
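
Those numbers are at least self-consistent: with 2128 x 2128 tiles and an overlap of 1024 on each side, each tile only advances by 2128 - 2 x 1024 = 80 pixels, so a 4301 x 2867 image needs ceil(4301/80) x ceil(2867/80) = 54 x 36 = 1944 tiles. The overlap eats almost the entire tile, which also explains why it is so slow.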

So, resolved. But I’ll open a feature request to increase the defaults.

No, tiling is not ‘an opencl thing’.
See darktable 3.8 user manual - memory

Here are my results:

with headroom = 800

23,392663 [default_process_tiling_cl_ptp] aborted tiling for module 'diffuse'. too many tiles: 4301 x 2867
23,392690 [opencl_pixelpipe] could not run module 'diffuse' on gpu. falling back to cpu path
83,435404 [dev_pixelpipe] took 60,072 secs (451,092 CPU) processed `diffuse or sharpen' on CPU, blended on CPU [export]
[…]
86,556387 [dev_process_export] pixel pipeline processing took 65,264 secs (455,291 CPU)

with headroom = 1200

17,799217 [default_process_tiling_cl_ptp] aborted tiling for module 'diffuse'. too many tiles: 4301 x 2867
17,799227 [opencl_pixelpipe] could not run module 'diffuse' on gpu. falling back to cpu path
78,031524 [dev_pixelpipe] took 60,234 secs (452,320 CPU) processed `diffuse or sharpen' on CPU, blended on CPU [export]
[…]
80,640850 [dev_process_export] pixel pipeline processing took 65,443 secs (459,138 CPU)

So, although with both settings diffuse or sharpen falls back to being computed on the CPU, the other modules run faster on the GPU, giving an overall advantage to using OpenCL with headroom = 800 or 1200.

In summary:

  • OpenCL disabled: export time = 71 sec
  • OpenCL enabled, headroom = 400: engages the GPU for diffuse or sharpen: export time = TOO LONG
  • OpenCL enabled, headroom = 800 or 1200: falls back to the CPU for diffuse or sharpen: export time = 65 sec

I gather that the only effect of increasing the headroom here is to disable OpenCL for the diffuse or sharpen module, correct? Note that the integrated graphics card reports:

0.123745 [opencl_init] device 1 'Intel(R) Iris™ Plus Graphics' supports image sizes of 16384 x 16384
0.123748 [opencl_init] device 1 'Intel(R) Iris™ Plus Graphics' allows GPU memory allocations of up to 384MB
[opencl_init] device 1: Intel(R) Iris™ Plus Graphics
CANONICAL_NAME: intelri
GLOBAL_MEM_SIZE: 1536MB

so perhaps increasing the headroom does not leave enough GPU memory for computing tiles?
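
As a back-of-the-envelope check: darktable’s pixelpipe works on 4 x 32-bit float pixels, so a single full-resolution buffer for this 4301 x 2867 image is about 4301 x 2867 x 16 bytes ≈ 190 MB. A module needs at least an input and an output buffer, and diffuse or sharpen apparently needs considerably more, so a 384 MB per-allocation limit on 1536 MB of total GPU memory looks tight even before any headroom is subtracted.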

How is it, then, that the terminal does not give any information about tiling of diffuse or sharpen when exporting with OpenCL disabled?

The headroom tells darktable how much GPU memory it should treat as already allocated by the operating system and other programs. I think you need this because there’s no way to ask OpenCL for the amount of free (or allocated) memory. The higher the headroom, the less GPU memory darktable will try to allocate. If you set it too low, darktable will try to allocate more than is available → failure. Too high: bad performance.
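
With the numbers from your log: GLOBAL_MEM_SIZE is 1536 MB, so roughly speaking headroom = 400 leaves darktable about 1136 MB of GPU memory to work with, headroom = 800 about 736 MB, and headroom = 1200 only about 336 MB (this is a simplification; the actual accounting is more involved). That would explain why the higher headroom values make diffuse or sharpen give up on the GPU entirely.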

The same question came to my mind. Maybe it’s just not displayed; I will try with -d all.

Another thing I encountered: I loaded the sidecar onto a picture from my Nikon D7100 (6000x4000), to compare against gpagnon’s pic (~4000x3000). The Nikon pic exported in 160 seconds. In the same session, gpagnon’s pic has now been exporting for 20 minutes.

Different demosaic. Perhaps the X-Trans algorithm is leaking memory, or simply needs more?

Not X-Trans, though. This was the first Fuji X100, with a Bayer sensor (I miss those colors!).

At least it tries to do tiling, but on my computer without success:

22.583654 [default_process_tiling_cl_ptp] aborted tiling for module 'diffuse'. too many tiles: 5890 x 4018
22.583677 [opencl_pixelpipe] could not run module 'diffuse' on gpu. falling back to cpu path
22.583698 [default_process_tiling_ptp] gave up tiling for module 'diffuse'. too many tiles: 5890 x 4018

Oh, sorry, then bad guess/memory. I’ve already shut down the computers.

I think it is simply that, with host_memory_limit set to 0 (i.e. no limit other than the available memory), the system can load the entire image into memory without any need for tiling on the CPU side. But because OpenCL does not have enough memory, it still needs to tile: it creates all those tiles, processes each one, and stores the results to merge them afterwards. In the end it is too much, so falling back to the CPU makes sense on your system.

Increasing the headroom forces the system to leave more GPU memory available for other tasks (per the manual); therefore, when diffuse or sharpen tries to use the GPU, it notices there is not enough memory and switches to the CPU.
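
For reference, these are the darktablerc settings being juggled in this thread, with the values tried above (key names as they appear in my darktablerc — treat them as an assumption if your version differs):

host_memory_limit=1500
singlebuffer_limit=16
opencl_memory_headroom=400

host_memory_limit caps the host RAM the pixelpipe may use (0 = no limit beyond physical RAM), singlebuffer_limit caps the size of a single intermediate buffer before tiling kicks in, and opencl_memory_headroom is the amount of GPU memory left untouched for the OS and other applications.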

I would like someone with more knowledge of when/how DT uses tiling to chime in.

In some cases the compilation of kernels needs several attempts, so try this:

Thanks Martin, I know the linked issue and have a loop running to enable OpenCL support.

I already raised an issue here: https://github.com/darktable-org/darktable/issues/9572

But I wasn’t able to solve the problem.

Opened https://github.com/darktable-org/darktable/issues/10910 to track the tiling issue (caused by the too-low default value for host_memory_limit).

Thank you!

I am not sure this is my case though, as I don’t recall seeing messages about failed compilation of OpenCL kernels in the terminal (I am not on that machine now).

Yesterday I started to explore the subject and found some discussion: https://github.com/darktable-org/darktable/issues/10884

The pull request from johnny-bit was merged in August and should address this. https://github.com/darktable-org/darktable/pull/9764

I think that as more users start to use the diffuse or sharpen module, the need for available memory is increasing.

Thanks, closed the issue. That’s what I seem to be doing these days: open a feature request, then realise it’s already done. But then why did @gpagnon have the issue? Shouldn’t PR 9764 have taken care of updating the memory limit parameter?

I’m still reading the code changes from that pull request. It seems that the new performance configuration is only used when the version is set to 2.

I think they bumped DT_CURRENT_PERFORMANCE_CONFIGURE_VERSION from 1 to 2; and if darktable detects that the version stored in darktablerc is the old one (1), it prompts the user to redo the performance configuration.
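
From my reading of the diff, the check is roughly the following (a paraphrased sketch, not the exact source; I am assuming the darktablerc key is called performance_configuration_version_completed):

// sketch: compare the version stored in darktablerc with the compiled-in one
const int completed = dt_conf_get_int("performance_configuration_version_completed");
if(completed < DT_CURRENT_PERFORMANCE_CONFIGURE_VERSION)
{
  // offer to apply the new performance defaults; on acceptance,
  // host_memory_limit & friends are rewritten and the stored version
  // is bumped to 2 so the prompt does not reappear
  dt_conf_set_int("performance_configuration_version_completed",
                  DT_CURRENT_PERFORMANCE_CONFIGURE_VERSION);
}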