I’ve just found these 6-year-old settings and am wondering about them. I disabled OpenCL, cleared the cache, and exported the same set of 12 random images, some with just the default steps, others much more resource-intensive. I tried all 4 combinations and got results that were within 1% of each other.
codepaths/sse2: default: true, description: ‘enable usage of SSE2-optimized codepaths’
codepaths/openmp_simd: default: false, description: ‘enable usage of OpenMP SIMD codepaths. if enabled, and such codepath exists, it will have the highest priority’
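To make the "highest priority" wording concrete, here is a minimal sketch of how two such flags could select a code path. The type and function names (`codepath_prefs_t`, `module_paths_t`, `choose_codepath`) are hypothetical and not taken from the darktable source; the sketch only mirrors the two descriptions above.

```c
#include <stdbool.h>

/* Hypothetical names throughout: this is not darktable's actual dispatch code. */
typedef enum { PATH_PLAIN, PATH_SSE2, PATH_OPENMP_SIMD } codepath_t;

/* The two user settings. */
typedef struct { bool sse2; bool openmp_simd; } codepath_prefs_t;

/* Which variants a given module actually implements. */
typedef struct { bool has_sse2; bool has_openmp_simd; } module_paths_t;

codepath_t choose_codepath(codepath_prefs_t prefs, module_paths_t mod)
{
  /* "if enabled, and such codepath exists, it will have the highest priority" */
  if(prefs.openmp_simd && mod.has_openmp_simd) return PATH_OPENMP_SIMD;
  if(prefs.sse2 && mod.has_sse2) return PATH_SSE2;
  return PATH_PLAIN; /* plain C fallback */
}
```

Under this reading, toggling a flag changes nothing for a module that does not implement the corresponding variant.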
Any recommendations? Do they perhaps influence the darkroom rather than export? Or is the effect simply so small that I shouldn’t bother? In that case, does it make sense to keep them in the codebase? I know there have been recent commits that move away from SSE.
SIMD is SSE, or rather, SSE is SIMD. SSE2 code paths are manually vectorized code. OpenMP SIMD code paths are code that the compiler vectorizes automatically. Both process Multiple Data with a Single Instruction, leading to substantial speed-ups where that logic can be used.
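As an illustration of the difference, here is a minimal sketch of the same loop written both ways. The function names are made up for this example and the code is not from darktable. The float intrinsics used here date from the original SSE and are pulled in by the SSE2 header.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Manually vectorized: multiplies 4 floats per instruction via intrinsics. */
void scale_sse2(float *buf, size_t n, float gain)
{
  assert(buf != NULL);
  size_t i = 0;
#if defined(__SSE2__)
  const __m128 g = _mm_set1_ps(gain);
  for(; i + 4 <= n; i += 4)
  {
    const __m128 v = _mm_loadu_ps(buf + i); /* unaligned load of 4 floats */
    _mm_storeu_ps(buf + i, _mm_mul_ps(v, g));
  }
#endif
  /* Scalar tail, and the whole loop on targets without SSE2. */
  for(; i < n; i++) buf[i] *= gain;
}

/* Automatically vectorized: the pragma asks the compiler to emit SIMD code
   for whatever instruction set it targets (it is ignored without OpenMP). */
void scale_omp_simd(float *buf, size_t n, float gain)
{
  assert(buf != NULL);
#pragma omp simd
  for(size_t i = 0; i < n; i++) buf[i] *= gain;
}
```

The first variant fixes the vector width at 4 floats in the source, while the second leaves the width to the compiler.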
Recent work on color conversions in the pipeline (to go from/to Lab to/from RGB) has shown that manually vectorized SSE code was slower than automagically vectorized OpenMP SIMD code. The reason is probably that the auto way adapts better to the CPU cache size/SSE generation heuristics. But bear in mind that color conversions are straightforward matrix/vector dot products, which is exactly what SSE is designed for.
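For reference, the kind of loop in question looks roughly like this. It is a sketch only: `apply_matrix_3x3` is a hypothetical helper, and the 4-floats-per-pixel layout and row-major matrix are assumptions made for the example, not a copy of the pipeline code.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply each RGBA pixel by a row-major 3x3 matrix; alpha is copied through.
   Assumes in and out do not overlap (hence restrict). */
void apply_matrix_3x3(const float *restrict in, float *restrict out,
                      size_t npixels, const float m[9])
{
  assert(in != NULL && out != NULL && m != NULL);
#pragma omp simd
  for(size_t k = 0; k < npixels; k++)
  {
    const float r = in[4 * k + 0];
    const float g = in[4 * k + 1];
    const float b = in[4 * k + 2];
    /* One dot product per output channel. */
    out[4 * k + 0] = m[0] * r + m[1] * g + m[2] * b;
    out[4 * k + 1] = m[3] * r + m[4] * g + m[5] * b;
    out[4 * k + 2] = m[6] * r + m[7] * g + m[8] * b;
    out[4 * k + 3] = in[4 * k + 3];
  }
}
```

There is no branching and no dependency between pixels, so the compiler is free to choose the vector width itself.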
That prompted checks on several modules to see whether the manual SSE2 code brought any benefit, and it was removed where it was proven slower than the OpenMP code. However, this behaviour is not systematic, so SSE2 code paths are kept in some places and there is no reason to override them with the OpenMP SIMD paths.
Also, building it yourself versus using a packaged march=generic build will make a difference here, and the target_clones are not fully functional and barely tested.
Also, SSE2 dates from the Pentium 4 generation (2000), and SSE2 intrinsics are re-interpreted as AVX or AVX2 when available on modern platforms anyway, so SSE2 is a misleading name in any case.
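If you want to see what your own build targets, GCC and Clang predefine macros according to the -march/-m flags in effect. A minimal sketch (the function name is made up for this example):

```c
/* Reports the highest x86 SIMD level the compiler was told to target,
   based on macros that GCC and Clang predefine at compile time. */
const char *compiled_simd_level(void)
{
#if defined(__AVX2__)
  return "AVX2";
#elif defined(__AVX__)
  return "AVX";
#elif defined(__SSE2__)
  return "SSE2";
#else
  return "none detected";
#endif
}
```

Printing the result, for example with `printf("%s\n", compiled_simd_level());`, from a file compiled with the same flags as your build shows which level the intrinsics and auto-vectorized loops are compiled for. The result depends entirely on those flags.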
Thanks for the info, but that’s not exactly what I was looking for – I know what SIMD is.
Any practical suggestions on those two options? Should I turn them both on, since they may help in certain cases? Or should OpenMP be left at the default ‘false’ value? Should I have seen some difference in my experiments?
My build info, according to darktable --version:
this is darktable 3.9.0+74~gff44aa88e
copyright (c) 2009-2022 johannes hanika darktable-dev@lists.darktable.org
compile options:
bit depth is 64 bit
normal build
SSE2 optimized codepath enabled
OpenMP support enabled
OpenCL support enabled
Lua support enabled, API version 8.0.0
Colord support enabled
gPhoto2 support enabled
GraphicsMagick support enabled
ImageMagick support disabled
OpenEXR support enabled
I build with ./build.sh --prefix /home/kofa/darktable-master --build-type Release --install.
From what AP said I believe as long as you don’t do a generic build then it should be as fast as it possibly can be…I don’t think the build script is generic so I think you are okay but maybe I misunderstood…
Thanks. So, they are not specific to the darkroom/interactive editing (in other words: the reason I didn’t see a difference in my export experiments is not that they don’t affect export; rather, they don’t affect anything, at least not with my locally built and optimised setup).
I interpret @priort’s response to imply that those settings would actually affect generic builds (triggered by BINARY_PACKAGE_BUILD: march=generic, mtune=generic). Is that the case?