Hi all, I’m using the latest darktable 2.4.0 on Xubuntu 16.04, and it now takes several minutes to export a simple JPEG of a processed raw file. The process uses massive processor and memory resources. On earlier versions of darktable, exporting was much quicker and used fewer resources. Admittedly, my laptop is underpowered, but something has clearly changed with the new version. Is there somewhere I can make an adjustment to improve things? See the attached screenshots for processor/memory usage and for my export settings (which are the same as I’ve always used).
That is far too little information. What’s in the history stack? Is OpenCL enabled? What are the computer’s specifications?
Start dt with -d perf -d dev -d opencl, do the export, and post all of the console output here.
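A minimal way to capture all of that output to a file (the redirect merges stderr into the log; the path is just an example):
# start darktable with performance, pipeline and OpenCL debugging,
# and save everything it prints to a log file
darktable -d perf -d dev -d opencl > /tmp/darktable-debug.log 2>&1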
Sorry, here is the xmp file: P1104697.orf.xmp (9.1 KB)
Specs of the computer are:
iullah-peppy
description: Computer
width: 64 bits
capabilities: vsyscall32
*-core
description: Motherboard
physical id: 0
*-memory
description: System memory
physical id: 0
size: 3888MiB
*-cpu
product: Intel(R) Celeron(R) 2955U @ 1.40GHz
vendor: Intel Corp.
physical id: 1
bus info: cpu@0
size: 1396MHz
capacity: 1400MHz
width: 64 bits
It tells me OpenCL is not available (see next post)
I am running another export now, console output to follow.
Here are the salient parts of the console output:
[dev] took 1.264 secs (0.987 CPU) to load the image.
[export] creating pixelpipe took 0.229 secs (0.281 CPU)
[pixelpipe_process] [export] using device -1
pixelpipe cacheline 0 used 0 by 18446744073709551615
pixelpipe cacheline 1 used 0 by 18446744073709551615
cache hit rate so far: -nan
[dev_pixelpipe] took 0.000 secs (0.000 CPU) initing base buffer [export]
[dev_pixelpipe] took 0.046 secs (0.030 CPU) processed `raw black/white point' on CPU, blended on CPU [export]
[dev_pixelpipe] took 0.043 secs (0.035 CPU) processed `white balance' on CPU, blended on CPU [export]
[dev_pixelpipe] took 0.010 secs (0.020 CPU) processed `highlight reconstruction' on CPU, blended on CPU [export]
[default_process_tiling_roi] use tiling on module 'demosaic' for image with full input size 5240 x 3912
[default_process_tiling_roi] (4 x 4) tiles with max dimensions 1753 x 1309
....
[dev_pixelpipe] took 3.599 secs (5.655 CPU) processed `demosaic' on CPU with tiling, blended on CPU [export]
[default_process_tiling_ptp] no need to use tiling for module 'basecurve' as no real memory saving to be expected
[default_process_tiling_ptp] fall back to standard processing for module 'basecurve'
[dev_pixelpipe] took 0.240 secs (0.272 CPU) processed `base curve' on CPU with tiling, blended on CPU [export]
[default_process_tiling_ptp] no need to use tiling for module 'colorin' as no real memory saving to be expected
[default_process_tiling_ptp] fall back to standard processing for module 'colorin'
[dev_pixelpipe] took 0.199 secs (0.337 CPU) processed `input color profile' on CPU with tiling, blended on CPU [export]
[default_process_tiling_ptp] use tiling on module 'atrous' for image with full size 5000 x 3733
[default_process_tiling_ptp] (24 x 18) tiles with max dimensions 723 x 723 and overlap 256
.....
[dev_pixelpipe] took 0.697 secs (1.087 CPU) processed `sharpen' on CPU with tiling, blended on CPU [export]
[default_process_tiling_ptp] no need to use tiling for module 'colorout' as no real memory saving to be expected
[default_process_tiling_ptp] fall back to standard processing for module 'colorout'
[dev_pixelpipe] took 0.373 secs (0.691 CPU) processed `output color profile' on CPU with tiling, blended on CPU [export]
[dev_pixelpipe] took 0.120 secs (0.217 CPU) processed `gamma' on CPU, blended on CPU [export]
[dev_process_export] pixel pipeline processing took 461.493 secs (653.987 CPU)
[export_job] exported to `/media/iullah/Seagate Backup Plus Drive/Photography/2018/20180110/darktable_exported/P1104697_01.jpg'
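(Aside: the same export can also be timed repeatably from a terminal, assuming darktable-cli is installed alongside darktable; the file names are just the ones from this thread:)
darktable-cli P1104697.orf P1104697.orf.xmp /tmp/P1104697.jpg --core -d perf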
Here’s what initially popped up when I started darktable with the suggested options:
iullah@iullah-Peppy:~$ darktable -d perf -d dev -d opencl
[opencl_init] opencl related configuration options:
[opencl_init]
[opencl_init] opencl: 1
[opencl_init] opencl_library: ''
[opencl_init] opencl_memory_requirement: 768
[opencl_init] opencl_memory_headroom: 300
[opencl_init] opencl_device_priority: '*/!0,*/*/*'
[opencl_init] opencl_mandatory_timeout: 200
[opencl_init] opencl_size_roundup: 16
[opencl_init] opencl_async_pixelpipe: 0
[opencl_init] opencl_synch_cache: 0
[opencl_init] opencl_number_event_handles: 25
[opencl_init] opencl_micro_nap: 1000
[opencl_init] opencl_use_pinned_memory: 0
[opencl_init] opencl_use_cpu_devices: 0
[opencl_init] opencl_avoid_atomics: 0
[opencl_init]
[opencl_init] could not find opencl runtime library 'libOpenCL'
[opencl_init] could not find opencl runtime library 'libOpenCL.so'
[opencl_init] found opencl runtime library 'libOpenCL.so.1'
[opencl_init] opencl library 'libOpenCL.so.1' found on your system and loaded
[opencl_init] could not get platforms: -1001
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.
size: 3888MiB
Yeah, it is pretty much expected that it is super sluggish.
You need 4 GB of RAM at a minimum; 8+ GB is highly recommended.
product: Intel(R) Celeron(R) 2955U @ 1.40GHz
Is that 2 cores? Also not surprising at all that it is super slow.
…
A screenshot, srsly?
But you are likely using some memory-heavy module, and since there is so little RAM, the image is processed in tiles, which is very slow.
EDIT: so it is demosaic. A surprise; I thought it would be basecurve (exposure fusion), local contrast, equalizer, or profiled denoise.
So which demosaicing mode are you using?
You likely want to kill all other apps while running dt, and experiment with the “host memory limit (in MB) for tiling”, “minimum amount of memory (in MB) for a single buffer in tiling”, and “do high quality resampling during export” parameters; see https://www.darktable.org/usermanual/en/core_options.html
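If you prefer editing the config file directly, the first two of those live in ~/.config/darktable/darktablerc (edit it with darktable closed; verify the exact key names against your own file first — the values below are only illustrative defaults):
# ~/.config/darktable/darktablerc
host_memory_limit=1500    # host memory limit (in MB) for tiling
singlebuffer_limit=16     # minimum amount of memory (in MB) for a single buffer in tiling
The “do high quality resampling during export” switch is in the core options tab of the preferences dialog.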
Will a properly functioning OpenCL implementation make a big difference with just the on-chip Intel GPU? I installed the OpenCL dev packages, and darktable can now find those libraries, but it still says OpenCL is not available.
iullah@iullah-Peppy:~$ darktable -d perf -d dev -d opencl
[opencl_init] opencl related configuration options:
[opencl_init]
[opencl_init] opencl: 1
[opencl_init] opencl_library: ''
[opencl_init] opencl_memory_requirement: 768
[opencl_init] opencl_memory_headroom: 300
[opencl_init] opencl_device_priority: '*/!0,*/*/*'
[opencl_init] opencl_mandatory_timeout: 200
[opencl_init] opencl_size_roundup: 16
[opencl_init] opencl_async_pixelpipe: 0
[opencl_init] opencl_synch_cache: 0
[opencl_init] opencl_number_event_handles: 25
[opencl_init] opencl_micro_nap: 1000
[opencl_init] opencl_use_pinned_memory: 0
[opencl_init] opencl_use_cpu_devices: 0
[opencl_init] opencl_avoid_atomics: 0
[opencl_init]
[opencl_init] found opencl runtime library 'libOpenCL'
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] could not get platforms: -1001
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.
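(Aside: error -1001 from the OpenCL loader generally means no OpenCL platform/ICD is registered at all — the dev headers alone don’t provide one. A rough way to check, assuming Ubuntu 16.04 with Intel graphics; Beignet was the usual OpenCL implementation for Intel GPUs of that era, and its Haswell support was patchy:)
# install an ICD for the Intel iGPU plus a diagnostic tool
sudo apt-get install beignet-opencl-icd clinfo
# list whatever OpenCL platforms the loader can now see
clinfo
# darktable’s own OpenCL self-test
darktable-cltest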
I’m sorry, but contrary to what people might think, we devs don’t possess mind-reading skills.
You did not specify which video card you have, so it is impossible to answer.
But even then, likely no; strapping a rocket engine to a bicycle will not magically make it faster.
Hi, sorry again, I thought it was in my lshw paste above, but it wasn’t. There is only the on-board integrated Intel graphics:
*-display
description: VGA compatible controller
product: Haswell-ULT Integrated Graphics Controller
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
version: 09
width: 64 bits
clock: 33MHz
capabilities: vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: irq:43 memory:e0000000-e03fffff memory:d0000000-dfffffff ioport:1800(size=64) memory:c0000-dffff
Is the ultimate answer simply that darktable 2.4.0 will be slow on this computer? What changed from previous versions to make this so? I can’t remember exactly which previous version I was using, but exports took maybe a minute or two for the same kinds of raw files on this same computer…
Then no, I believe it will not be any faster.
It will depend on the exact history stack and images. Do follow the last portion of my first comment.
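A quick way to see which modules are actually in a sidecar’s history stack (assuming the newer xmp layout, where each history entry carries a darktable:operation attribute):
# count the history entries per module in the sidecar
grep -o 'darktable:operation="[^"]*"' P1104697.orf.xmp | sort | uniq -c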
It isn’t helping that your computer is named “peppy”, is it?
Ok, thank you! That gives me some things to go on for now. I will try these out in the next couple of days (have some other things to work on for the moment), and will report back about any improvements I am able to make.
@paperdigits Lol! “Peppy” is the code name for this venerable Acer C-720 Chromebook, which was really the first model onto which we were able to port a fully functioning native installation of Linux (and not simply through the “crouton” chroot script). It’s still going after 5 years!! I guess it is time for an upgrade, though. Thinking very heavily of purchasing the Alpha Centurion Nano…
Oh, sorry, I missed your post while I was writing other posts, and I did not see this! Yes, I am trying to use the AMaZE demosaicking routine. I guess I should stick to a simpler one on this machine, and I will kill all other processes.
EDIT: it looks like the longest time and largest number of tiles were for “atrous”, which I don’t know anything about. It’s not something I believe I enabled…
That is the equalizer. It’s one of the heaviest, most memory-hungry modules, so that is not surprising at all.
Aha!! That is really helpful! I have been using the equalizer recently (after reading some threads about it here on pixls), and I never used to use it before. I did not realize it was so intensive. I will go back to using “local contrast” and simpler noise removal and sharpening. Thanks!
Note that the “local laplacian filter” mode there is comparably heavy.
More than 7 minutes for a less-than-20-MPixel file? That’s really a lot of processing time…
Or did I miss something?
Thanks! This is all very helpful… For old hardware, I guess it’s very important to know which tools are heavy, and which are light.
@heckflosse Yes, more than 7 minutes for 20 MPixel. But I guess my hardware is barely adequate for this.
I, too, have been pondering export times in darktable. Getting OpenCL working on my hardware made a significant difference, as did selecting “multiple GPUs” in the OpenCL options.
I, too, have recently started using the equalizer, after this discussion on go-to modules.
What has puzzled me the most is watching a file manager window of the folder into which I am exporting and seeing the same JPEG produced multiple times: a slow crawl up to 30 MB or so, then starting again. I’m not sure whether this is due to:
- using the equalizer module,
- setting “multiple GPUs” in my OpenCL options, or
- some other reason, like the JPEG export quality setting (which I have at 96%).
Is any darktable expert able to shed some light on multiple exports of the same jpeg?
A Chromebook isn’t exactly a powerhouse to begin with, either.
@darix Yes, I agree. But the reason I started this thread is that export times used to be reasonable on exactly the same hardware. What I didn’t realize was how severe the computational requirements are for some of the tools I recently added to my processing pipeline because of the excellent results I’ve seen them produce (as in the thread linked by @martin.scharnke above). Being in the FOSS world as we are, I imagine there are many people around the world on older hardware who would like to use a tool like darktable because of, among other things, their financial situation. While I am in a position to upgrade (and indeed I likely will soon), many folks are not. So it’s really good to know how to use darktable efficiently on underpowered hardware, and which tools to avoid in those situations. This thread has been very informative for me in this regard.