Linux Photo Station thoughts and doubts

I moved my photographic workflow from proprietary software (Capture One) to digikam for DAM and darktable for editing and post processing.

I currently use both programs on a MacBook Air M1 with 8 GB of RAM. Not the fastest, I admit, but both are very usable; I did not expect this from such a basic machine. Some tasks obviously need more power, so I decided to invest some money in a dedicated workstation.

I’m also using digiKam and darktable on a Linux PC running Debian (AMD Ryzen 9 3900X + RTX 2070 Super), and I was curious to see the performance and to understand whether I could upgrade some components, such as the GPU, which has ‘only 6GB’.

I stumbled upon this page by chance https://math.dartmouth.edu/~sarunas/darktable_bench.html
and out of curiosity I ran the same tests on both MacBook and PC.
I did the tests with a similar configuration and expected similar results, but:

Sarunas’ configuration
darktable 4.4.0
CPU only Ryzen 5 5600X @4.6GHz 6C/12T, 32GiB RAM @3.2GT/s, NVMe SSD, Linux 6.2, Ubuntu 23.04 (storas) > 12.5 sec
+GeForce RTX 2070 8GB GPU (TU106) > 2.3 secs.

My configuration
darktable 4.5.0~git194.0bbb6717-1+10382.1
CPU only: AMD Ryzen 9 3900X 12-Core Processor, 32 GB RAM > 8.9 sec
+GPU: GeForce RTX 2070 SUPER 8GB > 6.5 sec
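For reference, the test boils down to a single command-line export of the sample raw from that page (the same invocation appears in Sarunas’ log further down in this thread); the time quoted above is the ‘[dev_process_export] pixel pipeline processing’ line near the end of the output:

    darktable-cli setubal.orf setubal.orf.xmp test.jpg --core -d opencl -d perf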

With CPU only, my configuration is faster, but with a comparable GPU, mine is almost three times slower.
In the console there are two operations that report an error, and I don’t know whether this is related to my version of darktable or to some other problem.

[opencl_summary_statistics] device 'NVIDIA CUDA NVIDIA GeForce RTX 2070 SUPER' (0): 157 out of 159 events were successful and 2 events lost. max event=156, clmem runtime problem

Again out of curiosity, I ran the same tests on the MacBook Air M1 (8 GB) and, apart from the very slow times, everything worked without any problem.

On Sarunas’ page linked above there are comparisons of different configurations, including even a Mac Studio with an extreme configuration (M2 Ultra, 24-core CPU/76-core GPU/192 GiB) and another, more moderate one (M2 Pro, 10-core CPU/16-core GPU/32 GiB).

In raw performance the results are very similar, if not better, and the difference in power consumption is also very noticeable, always in favor of the Macs; assuming constant use, that makes a difference on the electricity bill at the end of the year.
I can already imagine the first objection, concerning the cost of the Mac, and I am aware of the price difference.

I am struggling to form an opinion, I have some doubts, and I would like to hear yours.

Assuming that the current PC will be used exclusively for gaming with my children, after some research and some suggestions I tried to figure out what a build for a dedicated darktable workstation might look like:

CPU: Intel Core i5-13600KF 3.5 GHz 14-Core Processor
CPU Cooler: Thermalright Peerless Assassin 120 SE 66.17 CFM CPU Cooler
Motherboard: MSI B760 GAMING PLUS WIFI ATX LGA1700 Motherboard
Memory: Corsair Vengeance 32 GB (2 x 16 GB) DDR5-5600 CL36 Memory
Storage: Crucial P5 Plus 2 TB M.2-2280 PCIe 4.0 X4 NVME Solid State Drive
Video Card: Sapphire PULSE Radeon RX 6700 XT 12 GB Video Card
Case: be quiet! Pure Base 500DX ATX Mid Tower Case
Power Supply: Gigabyte UD850GM 850 W 80+ Gold Certified Fully Modular ATX Power Supply
Wired Network Adapter: Asus XG-C100C 10 Gb/s Ethernet PCIe x4 Network Adapter

Total: ~1450 EUR

Pros:

  • In my budget
  • 2TB of local storage
  • 10GbE network card
  • performance “should” be high (if there are no driver issues???)
  • being a custom PC, components can be upgraded in the future (e.g. storage, GPU, RAM, etc.)

Cons:

  • usability/configuration frustration: I assumed an AMD GPU with 12 GB, but on Linux I only have experience with NVIDIA, and my tests showed some problems; this scares me because, if the drivers compromise performance, I do not want to spend my time fixing driver problems instead of dealing with my photography workflow;
  • high power consumption;
  • the case is big;
  • probably loud, not as silent as a Mac.

For the Apple option I considered two Mac mini configurations: one minimal and in line with my budget, and one similar to Sarunas’ (for comparison).

The first is better than my current M1; I added the 10GbE option and maximized the RAM:

Apple M2 with 8-core CPU, 10-core GPU, 16‑core Neural Engine
24GB unified memory
512GB SSD storage
10 Gigabit Ethernet

Total: 1504 EUR

Pros:

  • compact, fits under the desk
  • performance between my current M1 and Sarunas’ configuration
  • no driver issues(???)
  • in line with my budget
  • very low power consumption
  • quiet
  • no usability/configuration frustration

Cons:

  • storage limited to 512 GB
  • need to add an external drive (I already have one, but still a PITA)
  • not expandable
  • if it breaks, I can’t fix it

Sarunas’ configuration, to which I added the 10GbE network card:

Apple M2 Pro with 10‑core CPU, 16-core GPU, 16‑core Neural Engine
32GB unified memory
512GB SSD storage
10 Gigabit Ethernet

Total: 2124 EUR

Pros:

  • compact, fits under the desk
  • performance in line with expectations
  • no driver issues(???)
  • very low power consumption
  • quiet
  • no usability/configuration frustration

Cons:

  • not in line with my budget
  • storage limited to 512 GB
  • need to add an external drive (I already have one, but still a PITA)
  • not expandable
  • if it breaks, I can’t fix it

I would like to have your opinion, even though in a forum where most users are on *nix I can already guess what you are thinking :slight_smile:
In particular, I’d like some reassurance about the GPU and driver side of things, and about the error due to the clmem runtime problem.

Finally, if anyone uses an M2 Mac mini, I am curious to hear about your experience and to see whether the performance is in line with Sarunas’ tests.

Apologies for the wall of text; I hope my doubts will help other users who are switching from a proprietary workflow to a FOSS one to form a better opinion.

Before doing anything else, I’d check which events were lost. If darktable has to restart an operation from scratch on the GPU, that costs a lot of time. And I’d use, if possible, a stable version for this kind of test.
Also, the driver version for your current Linux box might be interesting.
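A rough sketch of how both could be collected in one go (assuming the same sample files as the benchmark; the debug output just gets captured to a file and filtered):

    darktable-cli setubal.orf setubal.orf.xmp test.jpg --core -d opencl -d perf 2>&1 | tee perf.log
    grep opencl_summary_statistics perf.log    # lost events are reported in this summary
    nvidia-smi --query-gpu=driver_version --format=csv,noheader    # currently loaded driver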

As it is now, we have several variables/unknowns where the two test configurations differ, which makes comparison difficult. All the more so as it looks as if there’s a driver or configuration issue with your current Linux setup.

And while high performance for export (like you do with darktable-cli) is nice, is it all that important for you?
Personally, I’m more interested in the performance while editing, as too many delays there can make working with the program frustrating.

So basically, what I’m saying is: don’t base a decision on a flawed test, and know what’s important for you.

In addition to what @rvietor mentions, there is a fair bit of room to tweak the settings used for OpenCL. Sometimes there is not much of a change, but in other cases it can be quite significant, so I would also explore some of those options to be sure you have things set optimally for your hardware.
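As a rough illustration only (key names can differ between darktable versions), these settings end up in ~/.config/darktable/darktablerc; the values below are the ones visible in the [opencl_init] output later in this thread:

    opencl=TRUE
    opencl_scheduling_profile=very fast GPU
    opencl_device_priority=*/!0,*/*/*/!0,*
    opencl_mandatory_timeout=400

The ‘tune OpenCL performance’ option mentioned below is set from the preferences dialog; whatever you change by hand, edit darktablerc only while darktable is closed.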

This card of yours is plenty for darktable; even 6GB of VRAM is more than enough. The issue has to be a setup/driver problem on your current system. If you want to troubleshoot it, post the darktable-cltest information.
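If it helps, the whole report can be captured to a file in one step, so it can be attached here:

    darktable-cltest 2>&1 | tee darktable-cltest.txt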

Below is the complete output of a test run of
darktable-cli setubal.orf setubal.orf.xmp test.jpg --core -d opencl -d perf
on the storas system (still the same hardware and software configuration). You can compare it with the output in your case and hopefully see where the difference is. Please note that ‘tune OpenCL performance’ is set to ‘nothing’ (MEMORY TUNING: NO) in my case…

     0.0211 [dt_get_sysresource_level] switched to 2 as `large'
     0.0211   total mem:       31991MB
     0.0211   mipmap cache:    3998MB
     0.0211   available mem:   21869MB
     0.0211   singlebuff:      499MB
     0.0211   OpenCL tune mem: OFF
     0.0211   OpenCL pinned:   OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'very fast GPU'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
     0.0215 [dt_dlopencl_init] could not find default opencl runtime library 'libOpenCL'
     0.0216 [dt_dlopencl_init] could not find default opencl runtime library 'libOpenCL.so'
[opencl_init] opencl library 'libOpenCL.so.1' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 1 device

[dt_opencl_device_init]
   DEVICE:                   0: 'NVIDIA GeForce RTX 2070'
   PLATFORM NAME & VENDOR:   NVIDIA CUDA, NVIDIA Corporation
   CANONICAL NAME:           nvidiacudanvidiageforcertx2070
   DRIVER VERSION:           535.54.03
   DEVICE VERSION:           OpenCL 3.0 CUDA, SM_20 SUPPORT
   DEVICE_TYPE:              GPU
   GLOBAL MEM SIZE:          7941 MB
   MAX MEM ALLOC:            1985 MB
   MAX IMAGE SIZE:           32768 x 32768
   MAX WORK GROUP SIZE:      1024
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 64 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   MEMORY TUNING:            NO
   FORCED HEADROOM:          400
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH:            16
   ROUNDUP HEIGHT:           16
   CHECK EVENT HANDLES:      128
   PERFORMANCE:              1.101
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   /usr/share/darktable/kernels
   KERNEL DIRECTORY:         /home/ereliai/.cache/darktable/cached_v1_kernels_for_NVIDIACUDANVIDIAGeForceRTX2070_5355403
   CL COMPILER OPTION:       -cl-fast-relaxed-math
   KERNEL LOADING TIME:       0.0123 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init]           0       'NVIDIA CUDA NVIDIA GeForce RTX 2070'
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       0       0       0       0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           1       1       1       1       1
[opencl_synchronization_timeout] synchronization timeout set to 0
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           0       0       0       0       0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities]           image   preview export  thumbs  preview2
[dt_opencl_update_priorities]           1       1       1       1       1
[opencl_synchronization_timeout] synchronization timeout set to 0
     0.8418 [dt_dev_load_raw] loading the image. took 0.442 secs (0.474 CPU)
     0.8858 [export] creating pixelpipe took 0.041 secs (0.250 CPU)
     0.8858 [dt_opencl_check_tuning] use 6627MB (tunemem=OFF, pinning=OFF) on device `NVIDIA CUDA NVIDIA GeForce RTX 2070' id=0
     0.8861 [dev_pixelpipe] took 0.000 secs (0.000 CPU) initing base buffer [export]
     0.8965 [dev_pixelpipe] took 0.010 secs (0.058 CPU) [export] processed `rawprepare' on GPU, blended on GPU
     0.8984 [dev_pixelpipe] took 0.002 secs (0.002 CPU) [export] processed `temperature' on GPU, blended on GPU
     0.9073 [dev_pixelpipe] took 0.009 secs (0.009 CPU) [export] processed `highlights' on GPU, blended on GPU
     1.0001 [dev_pixelpipe] took 0.093 secs (0.177 CPU) [export] processed `hotpixels' on CPU, blended on CPU
     1.0501 [dev_pixelpipe] took 0.050 secs (0.113 CPU) [export] processed `demosaic' on GPU, blended on GPU
     1.7406 [dev_pixelpipe] took 0.690 secs (0.439 CPU) [export] processed `denoiseprofile' on GPU with tiling, blended on CPU
     2.1484 [dev_pixelpipe] took 0.408 secs (1.506 CPU) [export] processed `lens' on GPU, blended on GPU
     2.1560 [dev_pixelpipe] took 0.008 secs (0.007 CPU) [export] processed `ashift' on GPU, blended on GPU
     2.1620 [dev_pixelpipe] took 0.006 secs (0.002 CPU) [export] processed `exposure' on GPU, blended on GPU
     2.1716 [dev_pixelpipe] took 0.010 secs (0.009 CPU) [export] processed `colorin' on GPU, blended on GPU
     2.1775 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.006 secs (0.005 GPU) [channelmixerrgb]
     2.1902 [dev_pixelpipe] took 0.019 secs (0.015 CPU) [export] processed `channelmixerrgb' on GPU, blended on GPU
     2.3090 [dt_ioppr_transform_image_colorspace] IOP_CS_RGB-->IOP_CS_LAB took 0.051 secs (0.575 CPU) [atrous]
     2.7930 [dev_pixelpipe] took 0.603 secs (1.189 CPU) [export] processed `atrous' on GPU with tiling, blended on CPU
     2.8991 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0.006 secs (0.002 GPU) [colorbalancergb]
     2.9122 [dev_pixelpipe] took 0.119 secs (0.115 CPU) [export] processed `colorbalancergb' on GPU, blended on GPU
     2.9236 [dev_pixelpipe] took 0.011 secs (0.011 CPU) [export] processed `rgblevels' on GPU, blended on GPU
     2.9309 [dev_pixelpipe] took 0.007 secs (0.007 CPU) [export] processed `sigmoid' on GPU, blended on GPU
     2.9370 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0.006 secs (0.006 GPU) [bilat]
     3.0506 [dev_pixelpipe] took 0.120 secs (0.109 CPU) [export] processed `bilat' on GPU, blended on GPU
     3.0857 [dev_pixelpipe] took 0.035 secs (0.011 CPU) [export] processed `colorout' on GPU, blended on GPU
     3.0865 [resample_cl] took 0.001 secs (0.000 CPU) 1:1 copy/crop of 8065x6046 pixels
     3.0916 [dev_pixelpipe] took 0.006 secs (0.005 CPU) [export] processed `finalscale' on GPU, blended on GPU
     3.1589 [opencl_profiling] profiling device 0 ('NVIDIA CUDA NVIDIA GeForce RTX 2070'):
     3.1589 [opencl_profiling] spent  0.4483 seconds in [Write Image (from host to device)]
     3.1589 [opencl_profiling] spent  0.0012 seconds in rawprepare_1f
     3.1589 [opencl_profiling] spent  0.0012 seconds in whitebalance_1f
     3.1589 [opencl_profiling] spent  0.0008 seconds in highlights_initmask
     3.1589 [opencl_profiling] spent  0.0013 seconds in highlights_dilatemask
     3.1589 [opencl_profiling] spent  0.1567 seconds in [Write Buffer (from host to device)]
     3.1589 [opencl_profiling] spent  0.0036 seconds in highlights_chroma
     3.1589 [opencl_profiling] spent  0.0000 seconds in [Read Buffer (from device to host)]
     3.1589 [opencl_profiling] spent  0.0014 seconds in highlights_opposed
     3.1589 [opencl_profiling] spent  0.6009 seconds in [Read Image (from device to host)]
     3.1589 [opencl_profiling] spent  0.0005 seconds in border_interpolate
     3.1589 [opencl_profiling] spent  0.0023 seconds in rcd_border_green
     3.1589 [opencl_profiling] spent  0.0026 seconds in rcd_border_redblue
     3.1589 [opencl_profiling] spent  0.0039 seconds in rcd_populate
     3.1589 [opencl_profiling] spent  0.0020 seconds in rcd_step_1_1
     3.1589 [opencl_profiling] spent  0.0017 seconds in rcd_step_1_2
     3.1589 [opencl_profiling] spent  0.0008 seconds in rcd_step_2_1
     3.1589 [opencl_profiling] spent  0.0027 seconds in rcd_step_3_1
     3.1589 [opencl_profiling] spent  0.0023 seconds in rcd_step_4_1
     3.1589 [opencl_profiling] spent  0.0008 seconds in rcd_step_4_2
     3.1589 [opencl_profiling] spent  0.0026 seconds in rcd_step_5_1
     3.1589 [opencl_profiling] spent  0.0043 seconds in rcd_step_5_2
     3.1589 [opencl_profiling] spent  0.0041 seconds in rcd_write_output
     3.1589 [opencl_profiling] spent  0.0045 seconds in denoiseprofile_precondition_Y0U0V0
     3.1589 [opencl_profiling] spent  0.1490 seconds in denoiseprofile_decompose
     3.1589 [opencl_profiling] spent  0.0175 seconds in denoiseprofile_reduce_first
     3.1589 [opencl_profiling] spent  0.0001 seconds in denoiseprofile_reduce_second
     3.1589 [opencl_profiling] spent  0.0469 seconds in denoiseprofile_synthesize
     3.1589 [opencl_profiling] spent  0.0276 seconds in [Copy Image (on device)]
     3.1589 [opencl_profiling] spent  0.0045 seconds in denoiseprofile_backtransform_Y0U0V0
     3.1589 [opencl_profiling] spent  0.0066 seconds in lens_vignette
     3.1589 [opencl_profiling] spent  0.0158 seconds in lens_distort_bicubic
     3.1589 [opencl_profiling] spent  0.0065 seconds in ashift_bicubic
     3.1589 [opencl_profiling] spent  0.0051 seconds in exposure
     3.1589 [opencl_profiling] spent  0.0052 seconds in colorin_unbound
     3.1589 [opencl_profiling] spent  0.0095 seconds in colorspaces_transform_lab_to_rgb_matrix
     3.1589 [opencl_profiling] spent  0.0053 seconds in channelmixerrgb_CAT16
     3.1589 [opencl_profiling] spent  0.1945 seconds in eaw_decompose
     3.1589 [opencl_profiling] spent  0.0581 seconds in eaw_synthesize
     3.1589 [opencl_profiling] spent  0.0069 seconds in colorbalancergb
     3.1589 [opencl_profiling] spent  0.0053 seconds in rgblevels
     3.1589 [opencl_profiling] spent  0.0064 seconds in sigmoid_loglogistic_per_channel_interpolated
     3.1589 [opencl_profiling] spent  0.0053 seconds in colorspaces_transform_rgb_matrix_to_lab
     3.1589 [opencl_profiling] spent  0.0048 seconds in pad_input
     3.1589 [opencl_profiling] spent  0.0341 seconds in gauss_reduce
     3.1589 [opencl_profiling] spent  0.0215 seconds in process_curve
     3.1589 [opencl_profiling] spent  0.0376 seconds in laplacian_assemble
     3.1589 [opencl_profiling] spent  0.0047 seconds in write_back
     3.1589 [opencl_profiling] spent  0.0074 seconds in colorout
     3.1589 [opencl_profiling] spent  1.9369 seconds totally in command queue (with 0 events missing)
     3.1589 [dev_process_export] pixel pipeline processing took 2.273 secs (3.863 CPU)
     3.7907 [export_job] exported to `test.jpg'
 [opencl_summary_statistics] device 'NVIDIA CUDA NVIDIA GeForce RTX 2070' (0): 262 out of 262 events were successful and 0 events lost. max event=261


In these cases the darktable benchmark is the only time darktable will ever be run on them (they are applied-mathematics workstations). Hopefully one day they will be running Asahi Linux, as the M1 systems currently are.

Thank you for your answer; in general I agree. There are many things I could not write, and the post was already very long :slight_smile:

First of all, yes. I agree that export performance is not everything, but it was a way for me to compare and understand.

I started my tests on Debian 12 stable with the stable NVIDIA driver, and I had those clmem problems. Then I decided to switch to unstable to see if upgrading changed anything. Exact same thing.

I retested this morning with a nightly version (forgive me, but I forgot to save the version number in the logs), which I attach as darktable-perf-old-nightly. It shows the two clmem errors.

I just downloaded the latest nightly, darktable 4.5.0~git613.fad5e604-1+10584.1, from OBS, and surprise surprise, there is only one error.

I also attach the OpenCL information; the NVIDIA driver is 525.125.06.

In general, on this little MacBook Air the editing experience is very good; the more complex tasks are visibly slow, but I can’t expect much more. (For comparison, Capture One is not even usable on a MacBook Air.)

On the PC with the GPU, editing is super fast; even modules like diffuse or sharpen are immediate.
So I was wondering whether a Mac mini would be sufficient in the end, not so much for export performance as for general responsiveness.

As I said before, I looked at raw performance mainly to understand and to have a point of comparison. I agree that my tests were not done under reproducible conditions, but they were useful to reveal that there is a problem with clmem.

darktable-perf-latest.txt (11.2 KB)
darktable-perf-old-nightly.txt (10.5 KB)
opencl-info.txt (17.2 KB)

Here is the darktable-cltest log.

I see now that there is a warning/error about libOpenCL, but OpenCL is available and enabled:

[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.

darktable-cltest.txt (3.6 KB)
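As a side note, libOpenCL.so.1 is the name the runtime is normally installed under (shipped by the NVIDIA driver or by the ocl-icd loader); a quick way to confirm what the system actually provides:

    ldconfig -p | grep -i libopencl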

After reading your log I realised that last time I had given up on upgrading the NVIDIA driver from 525 to 535. This time I managed it and, surprise surprise:

13.1184 [dev_process_export] pixel pipeline processing took 2.058 secs (4.877 CPU)

No clmem errors anymore :slight_smile:
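For anyone hitting the same clmem errors on Debian, the upgrade boils down to something like this (just a sketch; nvidia-driver is the standard Debian metapackage from the non-free components, and the version you get depends on the release/backports you have enabled):

    apt policy nvidia-driver          # shows the installed version and the one offered by the enabled repos
    sudo apt install nvidia-driver    # upgrade, then reboot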
