I agree. It sounds silly not to use the memory if it’s available. If you have few cores and darktable switches to tiling, those few cores will have to work even harder. (Threads may not map one-to-one to cores because of hyperthreading, but I haven’t looked into how the thread count is determined.)
Update
I think they check both installed memory and thread count because they don’t just tweak memory-usage parameters; they also decide on settings that affect the choice of algorithms: the demosaicer for zoomed-out mode in the darkroom, and ui/performance (‘prefer performance over quality’), which seems to affect:
I was going to open an issue or pull request to modify the code, but I’m in no rush. I started reading the actual code in master to see whether there have been more changes since that last pull request. I think I noticed some other places that use the >= 2 check, so it needs more investigation. I’m currently busy with work.
diffuse or sharpen uses a wavelet decomposition with a maximum of 10 scales, and it needs to store the high-frequency buffer for each scale plus the residual, because the diffusion process works coarse to fine.
Each scale has a blur size equal to 2^scale px, so for the last scale the radius is 1024 px.
For the tiled variant, a tile overlap equal to the blur radius is necessary for numerical consistency with the untiled approach, meaning a padding of 1024 px is needed on each side for the largest scale. So the tile size is defined by that mandatory padding first, and then the centre region is made as large as possible until RAM is saturated. The problem is that each padding region gets computed 2 to 4 times instead of once.
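To make the cost of that overlap concrete, here is a small sketch of the arithmetic described above. The function names and the square-tile assumption are mine for illustration; this is not darktable’s actual tiling code.

```python
# Illustrative sketch (not darktable code): how the coarsest wavelet
# scale dictates the tile padding, and how much of each tile that
# padding eats for a given centre size.

def blur_radius(scale: int) -> int:
    """Blur size doubles with each wavelet scale: 2^scale pixels."""
    return 2 ** scale

MAX_SCALE = 10
PADDING = blur_radius(MAX_SCALE)  # 1024 px of overlap on each side

def padding_overhead(center: int, padding: int = PADDING) -> float:
    """Fraction of a square tile's pixels spent on the padding region,
    i.e. work that is redone instead of computed once."""
    full = center + 2 * padding
    return 1.0 - (center / full) ** 2

print(blur_radius(MAX_SCALE))            # 1024
print(padding_overhead(2048))            # 0.75
```

With a 2048 px centre region, the padded tile is 4096 px wide, so 75 % of the pixels in each tile are padding, which is consistent with the padding being computed several times over across neighbouring tiles.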
When the image is downscaled, for example in the preview, its highest frequencies are removed, so the wavelet decomposition discards the first n scales and processes only the last ones. The blur radii are also scaled by the zoom factor, so the coarsest-scale radius is 1024 px * zoom, and so is the padding, which explains the performance boost.
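A quick sketch of that scaling, under the assumptions stated above (the helper names and the log2 formula for discarded scales are mine, inferred from the description, not taken from darktable’s source):

```python
# Illustrative sketch: downscaling both drops the finest wavelet
# scales and shrinks the coarsest blur radius (hence the padding).
import math

MAX_SCALE = 10

def scales_kept(zoom: float, max_scale: int = MAX_SCALE) -> int:
    """Downscaling by `zoom` removes detail finer than 1/zoom px,
    so roughly the first log2(1/zoom) scales can be discarded."""
    discarded = max(0, int(math.log2(1 / zoom)))
    return max_scale - discarded

def coarsest_padding(zoom: float, max_scale: int = MAX_SCALE) -> int:
    """Coarsest blur radius, and therefore tile padding, scaled by zoom."""
    return int(2 ** max_scale * zoom)

print(coarsest_padding(1.0))   # 1024 px at 1:1 (export)
print(coarsest_padding(0.25))  # 256 px in a 25 % preview
print(scales_kept(0.25))       # 8 of 10 scales left to process
```

At 1:1 the padding stays at the full 1024 px and no scale is skipped, which matches the remark below that a 1:1 export gets no speed-up.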
Unfortunately, at export time, if you export at 1:1, there is no speed-up for you.
Diffusion is an iterative process and there is no way around that. Actually, the wavelet scheme is already a speed-up: it gets you, in ~32 iterations, results similar to what is achieved in the literature with 100 to 150 iterations.
Also, diffusion is kind of a convolutional neural network, borderline AI. People have been asking for AI shit in dt for years; these are the runtimes you get with such methods.
…and magic indeed it is, and high praise for the magician! The sharpen demosaicing preset, for one, is fantastic: the best results I have ever seen (LR is not even close). I am not complaining at all, except a bit with myself for having bought a MacBook; it seems darktable runs much better on Linux. I am fine with the current export times of a minute or so per image on my machine: it was the hours that were bugging me.