I am planning to invest in new hardware. From what I’ve come to understand, having more CPU cores can be more efficient than high single-core speed, or vice versa, depending on the post-processing software one uses.
Do you know which of the two does more to speed up darktable?
Thank you in advance!
For 24-26 Mpx images I found a second-hand 8 GB GTX 1080 to be plenty. I could afford a much more powerful card, but I suspect the cost would outweigh the gains. Mostly I care about not waiting for ages during interactive work, and even with two instances of diffuse or sharpen this card is enough for me (driving a 1440p monitor).
Core counts on modern CPUs are highly confusing. Not only are there more threads than cores thanks to hyperthreading, but the latest generations also distinguish between P(erformance) and E(fficiency) cores. Only the number of P cores counts for peak performance.
My 24-“core” workstation has 8 P cores. Performance scales linearly with the number of processes up to 8 processes, but running 24 processes instead of 8 is only 10% faster. In other words, only the P cores count.
(This was not a darktable workload but regular FP processing, so YMMV.)
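A rough reading of those numbers, assuming the 24 processes were placed one per physical core (8 on P cores, 16 on E cores; if some landed on P-core hyperthreads instead, the split is different) and measuring throughput in units of one P core:

```latex
\begin{aligned}
R_{8}  &= 8 && \text{(8 processes, linear scaling)}\\
R_{24} &= 1.10 \times R_{8} = 8.8 && \text{(24 processes, ``only 10\% faster'')}\\
\Delta &= \frac{R_{24} - R_{8}}{24 - 8} = \frac{0.8}{16} = 0.05
\end{aligned}
```

Under that assumption, each extra process contributed only about 5% of a P core's throughput.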
This is rather strange: efficiency cores are weaker, but not that weak. It would be a good test to run some darktable benchmarks with them turned on vs. off; that might provide some interesting data.
As for their performance, you can look up benchmarks of the Intel N100 and N200. These CPUs are made only of Gracemont efficiency cores (four of them), so they give a good idea of what those cores are capable of. Of course, in this configuration they have dedicated memory, and the scheduler doesn’t have to worry about big.LITTLE, etc.
I know nothing about processors and microcode, but my assumption is that cores work in tandem. Modern cores likely share many more high-bandwidth resources than in previous generations, so even efficiency and specialist cores have their role in boosting overall performance. Other factors such as IPC, RAM, etc. also affect performance. It is less clear-cut than before, and it depends on which benchmarks one uses to quantify performance. I wonder if we have darktable benchmark tests for people to play with.
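A minimal sketch of such an on/off test on Linux: pin the same `darktable-cli` export to different core sets with `taskset` and pull the pixelpipe time out of the `-d perf` log. It only uses the command-line flags that appear in the runs in this thread; the function names `pipeline_secs` and `bench_core_sets` and the output file names are hypothetical, and the core lists must be adapted to whichever logical CPU IDs are P/E (or BIG/little) cores on your machine.

```shell
#!/bin/sh
# Sketch only: compare one darktable export pinned to different core sets.
# Assumes darktable-cli and taskset (util-linux) are on PATH.

# Hypothetical helper: read a `-d perf` log on stdin and print the
# [dev_process_export] wall time in seconds. darktable prints numbers in the
# current locale, so a decimal comma (e.g. 52,982) is normalised to a dot.
pipeline_secs() {
  grep -F '[dev_process_export] pixel pipeline processing took' |
    sed -e 's/.* took \([0-9][0-9]*[.,][0-9]*\) secs.*/\1/' -e 's/,/./'
}

# Hypothetical helper: run the export once per core list given after RAW and XMP,
# with OpenCL disabled so only the CPU cores are measured.
bench_core_sets() {
  raw=$1; xmp=$2; shift 2
  for cores in "$@"; do
    out="bench_$(printf '%s' "$cores" | tr ',' '_').jpg"
    secs=$(taskset -c "$cores" darktable-cli "$raw" "$xmp" "$out" \
             --core --disable-opencl -d perf 2>&1 | pipeline_secs)
    printf 'cores %-8s pipeline %s s\n' "$cores" "$secs"
  done
}
```

Usage, with the core numbering of the RK3588 board used in the tests here: `bench_core_sets setubal.orf setubal.orf.xmp 0-7 4-7 0-3`. Comparing the pixelpipe time rather than the total time leaves out image loading and JPEG writing.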
Platform: Radxa Rock Pi 5B, 16 GB RAM, passive cooling
CPU: Rockchip RK3588 (ARM64), little (weak) cores 0-3, BIG (strong) cores 4-7
OpenCL: disabled for the test
Test image: the one from the “GPU benchmarks in darktable” thread
TEST 1: all cores in action (56,6091s total time)
tux@rock5b ~/F/m/~/bench_raw> taskset -c 0-7 darktable-cli setubal.orf setubal.orf.xmp test.jpg --core --disable-opencl -d perf
output file already exists, it will get renamed
this is darktable 4.5.0+733~g5d1159758
copyright (c) 2009-2023 johannes hanika
https://github.com/darktable-org/darktable/issues/new/choose
compile options:
bit depth is 64 bit
normal build
SSE2 optimizations unavailable
OpenMP support enabled
OpenCL support enabled
Lua support enabled, API version 9.2.0
Colord support enabled
gPhoto2 support enabled
G'MIC support disabled (compressed LUTs will not be supported)
GraphicsMagick support enabled
ImageMagick support disabled
libavif support disabled
libheif support disabled
libjxl support disabled
OpenJPEG support enabled
OpenEXR support enabled
WebP support enabled
(darktable-cli:6788): Gtk-WARNING **: 18:23:45.545: gtk_disable_setlocale() must be called before gtk_init()
1,6484 [dt_dev_load_raw] loading the image. took 1,011 secs (1,064 CPU)
1,7540 [export] creating pixelpipe took 0,098 secs (0,678 CPU)
1,7541 [dev_pixelpipe] took 0,000 secs (0,000 CPU) initing base buffer [export]
1,9457 [dev_pixelpipe] took 0,192 secs (1,305 CPU) [export] processed `rawprepare' on CPU, blended on CPU
2,0010 [dev_pixelpipe] took 0,055 secs (0,184 CPU) [export] processed `temperature' on CPU, blended on CPU
2,2427 [dev_pixelpipe] took 0,242 secs (1,791 CPU) [export] processed `highlights' on CPU, blended on CPU
2,3759 [dev_pixelpipe] took 0,133 secs (0,979 CPU) [export] processed `hotpixels' on CPU, blended on CPU
3,1240 [dev_pixelpipe] took 0,748 secs (4,625 CPU) [export] processed `demosaic' on CPU, blended on CPU
19,4733 [dev_pixelpipe] took 16,349 secs (106,241 CPU) [export] processed `denoiseprofile' on CPU, blended on CPU
19,5522 [dev_pixelpipe] took 0,079 secs (0,315 CPU) [export] processed `lens' on CPU, blended on CPU
20,4509 [dev_pixelpipe] took 0,899 secs (5,914 CPU) [export] processed `ashift' on CPU, blended on CPU
20,5308 [dev_pixelpipe] took 0,080 secs (0,541 CPU) [export] processed `exposure' on CPU, blended on CPU
20,7166 [dev_pixelpipe] took 0,186 secs (1,385 CPU) [export] processed `colorin' on CPU, blended on CPU
20,9912 [dt_ioppr_transform_image_colorspace] IOP_CS_LAB-->IOP_CS_RGB took 0,274 secs (2,024 CPU) [channelmixerrgb]
22,5986 [dev_pixelpipe] took 1,882 secs (12,623 CPU) [export] processed `channelmixerrgb' on CPU, blended on CPU
22,8132 [dt_ioppr_transform_image_colorspace] IOP_CS_RGB-->IOP_CS_LAB took 0,214 secs (1,573 CPU) [atrous]
40,0404 [dev_pixelpipe] took 17,442 secs (111,451 CPU) [export] processed `atrous' on CPU, blended on CPU
40,3308 [dt_ioppr_transform_image_colorspace] IOP_CS_LAB-->IOP_CS_RGB took 0,290 secs (1,934 CPU) [colorbalancergb]
46,2549 [dev_pixelpipe] took 6,214 secs (40,064 CPU) [export] processed `colorbalancergb' on CPU, blended on CPU
46,5448 [dev_pixelpipe] took 0,290 secs (2,075 CPU) [export] processed `rgblevels' on CPU, blended on CPU
48,5767 [dev_pixelpipe] took 2,032 secs (13,347 CPU) [export] processed `sigmoid' on CPU, blended on CPU
48,7878 [dt_ioppr_transform_image_colorspace] IOP_CS_RGB-->IOP_CS_LAB took 0,211 secs (1,535 CPU) [bilat]
54,2170 [dev_pixelpipe] took 5,640 secs (33,204 CPU) [export] processed `bilat' on CPU, blended on CPU
54,6544 [dev_pixelpipe] took 0,437 secs (3,176 CPU) [export] processed `colorout' on CPU, blended on CPU
54,7357 [resample_plain] took 0,081 secs (0,553 CPU) 1:1 copy/crop of 8065x6046 pixels
54,7357 [dev_pixelpipe] took 0,081 secs (0,554 CPU) [export] processed `finalscale' on CPU, blended on CPU
54,7358 [dev_process_export] pixel pipeline processing took 52,982 secs (339,788 CPU)
56,6091 [export_job] exported to `test_02.jpg'
TEST 2: only BIG (strong) cores in action (57,4236s total time)
tux@rock5b ~/F/m/~/bench_raw> taskset -c 4-7 darktable-cli setubal.orf setubal.orf.xmp test.jpg --core --disable-opencl -d perf
output file already exists, it will get renamed
this is darktable 4.5.0+733~g5d1159758
copyright (c) 2009-2023 johannes hanika
https://github.com/darktable-org/darktable/issues/new/choose
compile options:
bit depth is 64 bit
normal build
SSE2 optimizations unavailable
OpenMP support enabled
OpenCL support enabled
Lua support enabled, API version 9.2.0
Colord support enabled
gPhoto2 support enabled
G'MIC support disabled (compressed LUTs will not be supported)
GraphicsMagick support enabled
ImageMagick support disabled
libavif support disabled
libheif support disabled
libjxl support disabled
OpenJPEG support enabled
OpenEXR support enabled
WebP support enabled
(darktable-cli:6966): Gtk-WARNING **: 18:25:23.745: gtk_disable_setlocale() must be called before gtk_init()
1,6480 [dt_dev_load_raw] loading the image. took 1,012 secs (0,970 CPU)
1,7447 [export] creating pixelpipe took 0,090 secs (0,328 CPU)
1,7448 [dev_pixelpipe] took 0,000 secs (0,000 CPU) initing base buffer [export]
1,8420 [dev_pixelpipe] took 0,097 secs (0,282 CPU) [export] processed `rawprepare' on CPU, blended on CPU
1,8823 [dev_pixelpipe] took 0,040 secs (0,025 CPU) [export] processed `temperature' on CPU, blended on CPU
1,9933 [dev_pixelpipe] took 0,111 secs (0,407 CPU) [export] processed `highlights' on CPU, blended on CPU
2,0624 [dev_pixelpipe] took 0,069 secs (0,273 CPU) [export] processed `hotpixels' on CPU, blended on CPU
2,6710 [dev_pixelpipe] took 0,609 secs (2,181 CPU) [export] processed `demosaic' on CPU, blended on CPU
20,6418 [dev_pixelpipe] took 17,971 secs (70,011 CPU) [export] processed `denoiseprofile' on CPU, blended on CPU
20,7213 [dev_pixelpipe] took 0,080 secs (0,318 CPU) [export] processed `lens' on CPU, blended on CPU
21,5479 [dev_pixelpipe] took 0,827 secs (3,299 CPU) [export] processed `ashift' on CPU, blended on CPU
21,6256 [dev_pixelpipe] took 0,078 secs (0,306 CPU) [export] processed `exposure' on CPU, blended on CPU
21,7455 [dev_pixelpipe] took 0,120 secs (0,479 CPU) [export] processed `colorin' on CPU, blended on CPU
21,8728 [dt_ioppr_transform_image_colorspace] IOP_CS_LAB-->IOP_CS_RGB took 0,127 secs (0,507 CPU) [channelmixerrgb]
23,6741 [dev_pixelpipe] took 1,929 secs (7,711 CPU) [export] processed `channelmixerrgb' on CPU, blended on CPU
23,7887 [dt_ioppr_transform_image_colorspace] IOP_CS_RGB-->IOP_CS_LAB took 0,115 secs (0,458 CPU) [atrous]
42,1723 [dev_pixelpipe] took 18,498 secs (73,065 CPU) [export] processed `atrous' on CPU, blended on CPU
42,3008 [dt_ioppr_transform_image_colorspace] IOP_CS_LAB-->IOP_CS_RGB took 0,128 secs (0,514 CPU) [colorbalancergb]
49,2727 [dev_pixelpipe] took 7,100 secs (28,381 CPU) [export] processed `colorbalancergb' on CPU, blended on CPU
49,4105 [dev_pixelpipe] took 0,138 secs (0,548 CPU) [export] processed `rgblevels' on CPU, blended on CPU
51,7432 [dev_pixelpipe] took 2,333 secs (9,324 CPU) [export] processed `sigmoid' on CPU, blended on CPU
51,8578 [dt_ioppr_transform_image_colorspace] IOP_CS_RGB-->IOP_CS_LAB took 0,115 secs (0,458 CPU) [bilat]
55,3796 [dev_pixelpipe] took 3,636 secs (10,572 CPU) [export] processed `bilat' on CPU, blended on CPU
55,6131 [dev_pixelpipe] took 0,233 secs (0,933 CPU) [export] processed `colorout' on CPU, blended on CPU
55,6908 [resample_plain] took 0,078 secs (0,311 CPU) 1:1 copy/crop of 8065x6046 pixels
55,6909 [dev_pixelpipe] took 0,078 secs (0,311 CPU) [export] processed `finalscale' on CPU, blended on CPU
55,6909 [dev_process_export] pixel pipeline processing took 53,946 secs (208,430 CPU)
57,4236 [export_job] exported to `test_03.jpg'
TEST 3: only little (weak) cores in action (173,7767s total time)
tux@rock5b ~/F/m/~/bench_raw> taskset -c 0-3 darktable-cli setubal.orf setubal.orf.xmp test.jpg --core --disable-opencl -d perf
output file already exists, it will get renamed
this is darktable 4.5.0+733~g5d1159758
copyright (c) 2009-2023 johannes hanika
https://github.com/darktable-org/darktable/issues/new/choose
compile options:
bit depth is 64 bit
normal build
SSE2 optimizations unavailable
OpenMP support enabled
OpenCL support enabled
Lua support enabled, API version 9.2.0
Colord support enabled
gPhoto2 support enabled
G'MIC support disabled (compressed LUTs will not be supported)
GraphicsMagick support enabled
ImageMagick support disabled
libavif support disabled
libheif support disabled
libjxl support disabled
OpenJPEG support enabled
OpenEXR support enabled
WebP support enabled
(darktable-cli:7132): Gtk-WARNING **: 18:27:18.342: gtk_disable_setlocale() must be called before gtk_init()
4,4007 [dt_dev_load_raw] loading the image. took 2,226 secs (2,119 CPU)
4,7368 [export] creating pixelpipe took 0,309 secs (1,137 CPU)
4,7372 [dev_pixelpipe] took 0,000 secs (0,000 CPU) initing base buffer [export]
5,0967 [dev_pixelpipe] took 0,360 secs (1,268 CPU) [export] processed `rawprepare' on CPU, blended on CPU
5,1664 [dev_pixelpipe] took 0,070 secs (0,069 CPU) [export] processed `temperature' on CPU, blended on CPU
5,6225 [dev_pixelpipe] took 0,456 secs (1,776 CPU) [export] processed `highlights' on CPU, blended on CPU
5,8664 [dev_pixelpipe] took 0,244 secs (0,965 CPU) [export] processed `hotpixels' on CPU, blended on CPU
8,7245 [dev_pixelpipe] took 2,858 secs (10,789 CPU) [export] processed `demosaic' on CPU, blended on CPU
53,8828 [dev_pixelpipe] took 45,158 secs (175,254 CPU) [export] processed `denoiseprofile' on CPU, blended on CPU
53,9647 [dev_pixelpipe] took 0,082 secs (0,324 CPU) [export] processed `lens' on CPU, blended on CPU
57,5205 [dev_pixelpipe] took 3,556 secs (14,206 CPU) [export] processed `ashift' on CPU, blended on CPU
57,6104 [dev_pixelpipe] took 0,090 secs (0,358 CPU) [export] processed `exposure' on CPU, blended on CPU
57,9507 [dev_pixelpipe] took 0,340 secs (1,360 CPU) [export] processed `colorin' on CPU, blended on CPU
58,5306 [dt_ioppr_transform_image_colorspace] IOP_CS_LAB-->IOP_CS_RGB took 0,580 secs (2,318 CPU) [channelmixerrgb]
63,2726 [dev_pixelpipe] took 5,322 secs (21,277 CPU) [export] processed `channelmixerrgb' on CPU, blended on CPU
63,6379 [dt_ioppr_transform_image_colorspace] IOP_CS_RGB-->IOP_CS_LAB took 0,365 secs (1,457 CPU) [atrous]
125,1997 [dev_pixelpipe] took 61,927 secs (244,942 CPU) [export] processed `atrous' on CPU, blended on CPU
125,7735 [dt_ioppr_transform_image_colorspace] IOP_CS_LAB-->IOP_CS_RGB took 0,574 secs (2,294 CPU) [colorbalancergb]
145,8946 [dev_pixelpipe] took 20,695 secs (82,762 CPU) [export] processed `colorbalancergb' on CPU, blended on CPU
146,6011 [dev_pixelpipe] took 0,706 secs (2,809 CPU) [export] processed `rgblevels' on CPU, blended on CPU
152,7661 [dev_pixelpipe] took 6,165 secs (24,653 CPU) [export] processed `sigmoid' on CPU, blended on CPU
153,1321 [dt_ioppr_transform_image_colorspace] IOP_CS_RGB-->IOP_CS_LAB took 0,366 secs (1,462 CPU) [bilat]
167,5822 [dev_pixelpipe] took 14,816 secs (48,811 CPU) [export] processed `bilat' on CPU, blended on CPU
168,6938 [dev_pixelpipe] took 1,112 secs (4,411 CPU) [export] processed `colorout' on CPU, blended on CPU
168,7740 [resample_plain] took 0,080 secs (0,319 CPU) 1:1 copy/crop of 8065x6046 pixels
168,7741 [dev_pixelpipe] took 0,080 secs (0,321 CPU) [export] processed `finalscale' on CPU, blended on CPU
168,7741 [dev_process_export] pixel pipeline processing took 164,037 secs (636,375 CPU)
173,7767 [export_job] exported to `test_04.jpg'
As you can see from the TEST 2 and TEST 3 results, the little (weak) cores are only roughly 3 times slower than the BIG (strong) cores. Not bad, I would say. But their contribution to the common effort in TEST 1 is almost non-existent. Really interesting. Does anybody have any idea why?
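To put a number on “almost non-existent”, take the `[dev_process_export]` pixelpipe times from the three logs: $T_B = 53.946$ s (BIG only), $T_L = 164.037$ s (little only) and $T_{\text{all}} = 52.982$ s (all cores). If the work could be shared perfectly in proportion to each cluster's speed (an idealised assumption that ignores memory bandwidth, shared caches and synchronisation overhead), the two processing rates would simply add:

```latex
\frac{T_L}{T_B} = \frac{164.037}{53.946} \approx 3.04,
\qquad
T_{\text{ideal}} = \frac{1}{\frac{1}{T_B} + \frac{1}{T_L}}
               = \frac{1}{\frac{1}{53.946} + \frac{1}{164.037}} \approx 40.6\ \text{s}
```

Under that model the little cores could have saved about 13 s (roughly 25%) over the BIG-only run; the measured saving was about 1 s (roughly 2%).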
BTW, if anybody is interested, here is a “bonus test” with all cores and OpenCL active:
26,8857s total time.
tux@rock5b ~/F/m/~/bench_raw> taskset -c 0-7 darktable-cli setubal.orf setubal.orf.xmp test.jpg --core -d perf
output file already exists, it will get renamed
this is darktable 4.5.0+733~g5d1159758
copyright (c) 2009-2023 johannes hanika
https://github.com/darktable-org/darktable/issues/new/choose
compile options:
bit depth is 64 bit
normal build
SSE2 optimizations unavailable
OpenMP support enabled
OpenCL support enabled
Lua support enabled, API version 9.2.0
Colord support enabled
gPhoto2 support enabled
G'MIC support disabled (compressed LUTs will not be supported)
GraphicsMagick support enabled
ImageMagick support disabled
libavif support disabled
libheif support disabled
libjxl support disabled
OpenJPEG support enabled
OpenEXR support enabled
WebP support enabled
(darktable-cli:7904): Gtk-WARNING **: 19:03:21.727: gtk_disable_setlocale() must be called before gtk_init()
arm_release_ver: g13p0-01eac0, rk_so_ver: 3
1,7998 [dt_dev_load_raw] loading the image. took 1,007 secs (0,997 CPU)
1,9080 [export] creating pixelpipe took 0,101 secs (0,707 CPU)
1,9082 [dev_pixelpipe] took 0,000 secs (0,000 CPU) initing base buffer [export]
2,0274 [dev_pixelpipe] took 0,119 secs (0,753 CPU) [export] processed `rawprepare' on GPU, blended on GPU
2,1297 [dev_pixelpipe] took 0,102 secs (0,457 CPU) [export] processed `temperature' on GPU, blended on GPU
2,2163 [dev_pixelpipe] took 0,087 secs (0,002 CPU) [export] processed `highlights' on GPU, blended on GPU
2,4714 [dev_pixelpipe] took 0,255 secs (0,857 CPU) [export] processed `hotpixels' on CPU, blended on CPU
3,8317 [dev_pixelpipe] took 1,360 secs (1,293 CPU) [export] processed `demosaic' on GPU, blended on GPU
8,3736 [dev_pixelpipe] took 4,542 secs (0,161 CPU) [export] processed `denoiseprofile' on GPU, blended on GPU
8,6581 [dev_pixelpipe] took 0,284 secs (0,176 CPU) [export] processed `lens' on GPU, blended on GPU
9,1307 [dev_pixelpipe] took 0,473 secs (0,000 CPU) [export] processed `ashift' on GPU, blended on GPU
9,1906 [dev_pixelpipe] took 0,060 secs (0,000 CPU) [export] processed `exposure' on GPU, blended on GPU
9,2563 [dev_pixelpipe] took 0,066 secs (0,000 CPU) [export] processed `colorin' on GPU, blended on GPU
9,4346 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0,178 secs (0,002 GPU) [channelmixerrgb]
9,7364 [dev_pixelpipe] took 0,480 secs (0,152 CPU) [export] processed `channelmixerrgb' on GPU, blended on GPU
9,7379 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0,001 secs (0,000 GPU) [atrous]
14,8695 [dev_pixelpipe] took 5,133 secs (0,302 CPU) [export] processed `atrous' on GPU, blended on GPU
14,9223 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_LAB-->IOP_CS_RGB took 0,052 secs (0,000 GPU) [colorbalancergb]
15,7401 [dev_pixelpipe] took 0,871 secs (0,147 CPU) [export] processed `colorbalancergb' on GPU, blended on GPU
15,8587 [dev_pixelpipe] took 0,119 secs (0,000 CPU) [export] processed `rgblevels' on GPU, blended on GPU
15,9803 [dev_pixelpipe] took 0,122 secs (0,000 CPU) [export] processed `sigmoid' on GPU, blended on GPU
16,1647 [dt_ioppr_transform_image_colorspace_cl] IOP_CS_RGB-->IOP_CS_LAB took 0,184 secs (0,002 GPU) [bilat]
24,2461 [dev_pixelpipe] took 8,266 secs (30,792 CPU) [export] processed `bilat' on CPU, blended on CPU
24,8687 [dev_pixelpipe] took 0,623 secs (0,123 CPU) [export] processed `colorout' on GPU, blended on GPU
24,8691 [resample_cl] took 0,000 secs (0,000 CPU) 1:1 copy/crop of 8065x6046 pixels
25,0275 [dev_pixelpipe] took 0,159 secs (0,149 CPU) [export] processed `finalscale' on GPU, blended on GPU
25,1890 [dev_process_export] pixel pipeline processing took 23,281 secs (35,469 CPU)
26,8857 [export_job] exported to `test_05.jpg'
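Compared with TEST 1 (same cores, OpenCL disabled), the reported times work out as follows. Note from the log above that `hotpixels` and `bilat` still ran on the CPU, and `bilat` alone (8.266 s) takes a sizeable share of what remains of the pixelpipe:

```latex
\frac{56.6091}{26.8857} \approx 2.11\ \text{(total)},
\qquad
\frac{52.982}{23.281} \approx 2.28\ \text{(pixelpipe)},
\qquad
\frac{8.266}{23.281} \approx 35\%\ \text{(bilat, on CPU)}
```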
Yeah, this is a strange result. Maybe darktable doesn’t handle big.LITTLE well? I remember seeing these kinds of tests for the new Intel CPUs, and the efficiency cores were quite useful. If they weren’t, Intel wouldn’t be selling the 13900K; everyone would buy the 13700K instead. The difference in clock speeds alone doesn’t explain the gap in multithreaded performance between the two CPUs.
One big difference between the cores is that the little ones lack (certain?) vector units. Could it be that an executable can either run in vector mode or not, but not both? I’ve never heard of that, and it would be a rather big deal for performance if true, but something like this seems to be happening. Or maybe there are shared vector units on the big cores, so any vectorized code is still limited by them?
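The instruction-set side of this can be checked rather than guessed. On ARM64 Linux, `/proc/cpuinfo` lists a `Features` line and a `CPU part` ID for every logical CPU, so grouping CPUs by what they report shows whether cores 0-3 advertise a different feature set from cores 4-7. This is a sketch; the function name `cpu_feature_groups` is made up, and identical flags only mean the instructions exist on both core types, not that the vector hardware is equally wide or fast.

```shell
#!/bin/sh
# Sketch only: group logical CPUs by the feature set their kernel reports.
# Assumes ARM64-style /proc/cpuinfo ("processor", "Features", "CPU part");
# the x86 "flags" line is matched too, where "CPU part" is simply absent.
# Reads cpuinfo text on stdin; prints "<part> <features>: <cpu list>".
cpu_feature_groups() {
  awk -F': *' '
    /^processor[ \t]*:/        { cpu = $2 }
    /^CPU part[ \t]*:/         { part[cpu] = $2 }
    /^(Features|flags)[ \t]*:/ { feat[cpu] = $2; order[++n] = cpu }
    END {
      # Keep groups in the order their first CPU appears.
      for (i = 1; i <= n; i++) {
        c = order[i]; key = part[c] " " feat[c]
        if (!(key in cpus)) { keys[++k] = key; cpus[key] = c }
        else cpus[key] = cpus[key] "," c
      }
      for (j = 1; j <= k; j++) print keys[j] ": " cpus[keys[j]]
    }'
}
```

Usage: `cpu_feature_groups < /proc/cpuinfo`. On a big.LITTLE board this typically prints one line per core type; if both lines show the same feature list, the little cores can run the same vectorized code, and the difference is in throughput rather than capability.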
Wow, thank you for all the replies! That was much more input than I had hoped for. I can choose much more wisely now, as darktable really is the only demanding task my PC has to handle. This is truly of great value to me.