OpenCL activation slows down darktable

I’ve just benchmarked darktable with and without opencl. Surprisingly I got a slowdown when opencl is activated.
What I did was the following. I took a sample RAW from here: Darktable Benchmark - OpenBenchmarking.org

Then I ran the following script:
[marco@marco-pc benchmar]$ cat benchmark.sh
rm -f test*.jpg
for d in $(seq 1 3); do
  echo -n "run $d: "
  darktable-cli bench.SRW test-$d.jpg --core --configdir /tmp --disable-opencl -d perf -d opencl | grep "processing took"
done

After that, I changed the script so that OpenCL is enabled:
[marco@marco-pc benchmar]$ cat benchmark-mit-opencl.sh
rm -f test*.jpg
for d in $(seq 1 3); do
  echo -n "run $d: "
  darktable-cli bench.SRW test-$d.jpg --core --configdir /tmp -d perf -d opencl | grep "processing took"
done
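Both variants can also be run from one script, which makes the comparison a single command. This is a minimal sketch, not the script used in this thread: `run_benchmark` is a hypothetical helper name, it assumes `bench.SRW` is in the current directory and `darktable-cli` is on `PATH`, and it leaves out `--configdir /tmp`, so darktable reads your usual configuration directory instead of a scratch one.

```shell
#!/bin/sh
# Hypothetical helper: time darktable-cli with OpenCL allowed (per your config)
# and with --disable-opencl, printing only the pipeline timing line of each run.
# Assumes bench.SRW is in the current directory and darktable-cli is on PATH.
run_benchmark() {
  runs=${1:-3}                       # number of runs per mode (default 3)
  for mode in opencl cpu; do
    if [ "$mode" = cpu ]; then
      extra=--disable-opencl
    else
      extra=
    fi
    for run in $(seq 1 "$runs"); do
      rm -f test*.jpg                # so darktable-cli never renames the export
      printf '%s run %s: ' "$mode" "$run"
      # $extra is deliberately unquoted: when empty it adds no argument
      darktable-cli bench.SRW test.jpg --core $extra -d perf -d opencl |
        grep 'pixel pipeline processing took'
    done
  done
}
```

Call it as `run_benchmark` for three runs per mode, or `run_benchmark 1` for a single quick pass.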

See my results; dt is faster when OpenCL is disabled:
benchmark-orig.txt (1.2 KB)

Did I do something wrong? I thought my system was properly configured. Maybe not?!

I’m on Manjaro 5.4.31, AMD Ryzen 9 3900X & Radeon RX 570 using dt 3.0.1

[marco@marco-pc benchmar]$ clinfo |grep Image
Image support Yes
Image support No
[marco@marco-pc benchmar]$ darktable-cltest |grep -i finally
0.280783 [opencl_init] FINALLY: opencl is AVAILABLE on this system.

The full output of clinfo and darktable-cltest is here:
clinfo.txt (13.8 KB) darktable-cltest.txt (41.0 KB)

I thought OpenCL generally improves performance?

What a nice CPU you have, @wiegemalt :slight_smile:
The main difference between your rig and mine is that my graphics card is an Nvidia (GTX 1660 Ti). Here, the darktable benchmark takes 2.781 seconds with OpenCL and 8.521 seconds without it.

I don't think it matters, but how much RAM do you have? And swap?

Have fun!
Claes in Lund, Sweden

I have 32 GB of RAM and the same amount of swap.
Did you use the same RAW for the speed test?

Yes, I used the same starting image.

More info here:
https://math.dartmouth.edu/~sarunas/darktable_bench.html

And these are the two commands to run the tests:

darktable-cli bench.SRW test.jpg --core -d perf -d opencl
darktable-cli bench.SRW test.jpg --core --disable-opencl -d perf -d opencl

[marco@marco-pc benchmar]$ darktable-cli bench.SRW bench.SRW.xmp test.jpg --core -d perf -d opencl >opencl-enabled.txt
Ausgabedatei existiert bereits, sie wird umbenannt

(darktable-cli:8033): Gtk-WARNING **: 17:50:30.903: gtk_disable_setlocale() must be called before gtk_init()
[export_job] exported to `test_04.jpg'
[marco@marco-pc benchmar]$ darktable-cli bench.SRW bench.SRW.xmp test.jpg --core --disable-opencl -d perf >opencl-disabled.txt
Ausgabedatei existiert bereits, sie wird umbenannt

(darktable-cli:8130): Gtk-WARNING **: 17:50:37.480: gtk_disable_setlocale() must be called before gtk_init()
[export_job] exported to `test_05.jpg'

[marco@marco-pc benchmar]$ tail opencl-*
==> opencl-disabled.txt <==
0,370788 [export] creating pixelpipe took 0,034 secs (0,343 CPU)
0,370826 [dev_pixelpipe] took 0,000 secs (0,000 CPU) initing base buffer [export]
0,377320 [dev_pixelpipe] took 0,006 secs (0,099 CPU) processed `raw black/white point' on CPU, blended on CPU [export]
0,384561 [dev_pixelpipe] took 0,007 secs (0,038 CPU) processed `white balance' on CPU, blended on CPU [export]
0,388656 [dev_pixelpipe] took 0,004 secs (0,080 CPU) processed `highlight reconstruction' on CPU, blended on CPU [export]
0,481917 [dev_pixelpipe] took 0,093 secs (1,137 CPU) processed `demosaic' on CPU, blended on CPU [export]
0,505048 [dev_pixelpipe] took 0,023 secs (0,357 CPU) processed `input color profile' on CPU, blended on CPU [export]
0,537280 [dev_pixelpipe] took 0,032 secs (0,786 CPU) processed `output color profile' on CPU, blended on CPU [export]
0,551240 [dev_pixelpipe] took 0,014 secs (0,318 CPU) processed `gamma' on CPU, blended on CPU [export]
0,551260 [dev_process_export] pixel pipeline processing took 0,180 secs (2,815 CPU)

==> opencl-enabled.txt <==
0,668396 [dev_pixelpipe] took 0,000 secs (0,000 CPU) initing base buffer [export]
0,673622 [dev_pixelpipe] took 0,005 secs (0,002 CPU) processed `raw black/white point' on CPU, blended on CPU [export]
0,680868 [dev_pixelpipe] took 0,007 secs (0,052 CPU) processed `white balance' on CPU, blended on CPU [export]
0,684987 [dev_pixelpipe] took 0,004 secs (0,080 CPU) processed `highlight reconstruction' on CPU, blended on CPU [export]
0,770999 [dev_pixelpipe] took 0,086 secs (1,078 CPU) processed `demosaic' on CPU, blended on CPU [export]
0,793531 [dev_pixelpipe] took 0,022 secs (0,269 CPU) processed `input color profile' on CPU, blended on CPU [export]
0,825976 [dev_pixelpipe] took 0,032 secs (0,784 CPU) processed `output color profile' on CPU, blended on CPU [export]
0,840004 [dev_pixelpipe] took 0,014 secs (0,319 CPU) processed `gamma' on CPU, blended on CPU [export]
0,840025 [dev_process_export] pixel pipeline processing took 0,172 secs (2,583 CPU)
1,148763 [opencl_summary_statistics] device 'Ellesmere' (0): NOT utilized
[marco@marco-pc benchmar]$

Could it have something to do with this warning?

(darktable-cli:8130): Gtk-WARNING **: 17:50:37.480: gtk_disable_setlocale() must be called before gtk_init()

I’m a bit confused now. In both cases, all modules are processed on the CPU.
Shouldn’t they all run on the GPU when OpenCL is enabled?

But then, isn’t the real question why my system is that fast even without OpenCL?
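Whether any module actually ran on the GPU can be checked mechanically from a saved log. With `-d perf`, each `[dev_pixelpipe]` line says where a module was processed, and the OpenCL summary says whether the device was used at all. A minimal sketch with a hypothetical `opencl_usage_report` helper that reads the log on stdin; it assumes GPU-processed modules are logged as `on GPU`, the counterpart of the `on CPU` text in the logs above.

```shell
#!/bin/sh
# Hypothetical helper: summarise where darktable ran each module, reading
# `-d perf -d opencl` output on stdin. Assumes GPU-processed modules are
# logged as "... on GPU", the counterpart of the "... on CPU" lines above.
opencl_usage_report() {
  awk '
    /\[dev_pixelpipe\].*processed/ {
      line = $0
      sub(/, blended.*/, "", line)   # keep only where the module itself ran
      if (line ~ / on GPU/) gpu++
      else if (line ~ / on CPU/) cpu++
    }
    /\[opencl_summary_statistics\].*NOT utilized/ { unused = 1 }
    END {
      printf "modules on CPU: %d, on GPU: %d\n", cpu, gpu
      if (unused) print "OpenCL device present but NOT utilized"
    }'
}
```

For example, `opencl_usage_report < opencl-enabled.txt` on the log above should report no modules on the GPU, followed by the NOT utilized line.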

Good evening!

Please just run this command:

darktable-cli bench.SRW test.jpg --core -d perf -d opencl

and note what values you get:

pixel pipeline processing took x.xxxx secs (y.yyyy CPU)

Then run the next command and note the values:

darktable-cli bench.SRW test.jpg --core --disable-opencl -d perf -d opencl

AND: if you redirect the output somewhere, watch out for

Ausgabedatei existiert bereits, sie wird umbenannt ("output file already exists, it will be renamed")

… so that you do not look in the wrong file :slight_smile:

Best regards,
Claes in Lund, Sweden
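To avoid misreading which value belongs to which run, the timing can also be pulled out as a plain number. A minimal sketch with a hypothetical `extract_pipeline_secs` helper: it keys on the `pixel pipeline processing took` text from the logs in this thread and turns the decimal comma printed under a German locale into a dot.

```shell
#!/bin/sh
# Hypothetical helper: read darktable-cli `-d perf` output on stdin and print
# the wall-clock seconds from the "pixel pipeline processing took" line,
# with a decimal comma (as in the German-locale logs) turned into a dot.
extract_pipeline_secs() {
  grep 'pixel pipeline processing took' |
    sed -e 's/.*processing took \([0-9][0-9]*[.,][0-9]*\) secs.*/\1/' \
        -e 's/,/./'
}
```

For example, `extract_pipeline_secs < opencl-enabled.txt` on the log above prints `0.172`.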

Looking at your last examples, the pipeline does essentially nothing beyond the minimal default modules. Maybe darktable is not reading the XMP and just applies the default modules? That would be very fast, even on the CPU.

Hmm, I’ve just tried with an alternative xmp file
bench.SRW.xmp (12.8 KB)

darktable-cli bench.SRW test.jpg --core --disable-opencl -d perf -d opencl
→ 9,356293 [dev_process_export] pixel pipeline processing took 8,954 secs (203,253 CPU)

darktable-cli bench.SRW test.jpg --core -d perf -d opencl
→ 9,759251 [dev_process_export] pixel pipeline processing took 9,085 secs (205,608 CPU)

There doesn't seem to be a real difference…
Is there any other way to figure out what is going on?

Your value without OpenCL looks plausible for your CPU and GFX.

But the timing with OpenCL does not :frowning:

These look more like the expected numbers (at least with OpenCL disabled). Can you post the full output in both cases?

In German there is a saying: the problem is always in front of the computer…
That was true here too: OpenCL was not activated in the darktable GUI.
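The stored setting can also be checked without opening darktable by looking in its config file. This is a sketch under assumptions: it assumes the preference is stored as an `opencl=TRUE` / `opencl=FALSE` line in `darktablerc`, that the default config directory is `~/.config/darktable`, and the helper name `darktable_opencl_pref` is hypothetical. The scripts at the top of the thread pass `--configdir /tmp`, which points darktable at a different config directory, so check that directory's `darktablerc` for those runs.

```shell
#!/bin/sh
# Hypothetical helper: report the OpenCL preference stored in darktablerc.
# Assumptions: the setting is an "opencl=TRUE" / "opencl=FALSE" line, and the
# default config directory is ~/.config/darktable (override with argument 1).
darktable_opencl_pref() {
  dt_rc=${1:-$HOME/.config/darktable/darktablerc}
  if [ ! -r "$dt_rc" ]; then
    echo "no readable darktablerc at $dt_rc" >&2
    return 2
  fi
  # take the last opencl= line, in case the file contains several
  dt_val=$(sed -n 's/^opencl=//p' "$dt_rc" | tail -n 1)
  case $dt_val in
    TRUE)  echo "OpenCL enabled" ;;
    FALSE) echo "OpenCL disabled" ;;
    *)     echo "no opencl= line found (darktable's default applies)" ;;
  esac
}
```

`darktable_opencl_pref` checks the assumed default location; `darktable_opencl_pref /tmp/darktablerc` checks the config used by the runs with `--configdir /tmp`.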

Now I see the speedup :slight_smile:
3,790295 [dev_process_export] pixel pipeline processing took 3,155 secs (3,987 CPU)

Anyway, thanks for helping!
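To put the numbers side by side, the speedup is just the ratio of the CPU-only time to the OpenCL time. A small sketch with a hypothetical `speedup` helper that accepts the decimal commas from the German-locale output. Assuming both runs used the same XMP, 8,954 s without OpenCL against 3,155 s with it is roughly 2.84×, compared with roughly 3.06× from Claes's 8.521 s and 2.781 s on different hardware.

```shell
#!/bin/sh
# Hypothetical helper: print t_without / t_with to two decimals.
# Decimal commas (as in German-locale darktable output) are turned into dots,
# and awk runs under the C locale so it parses and prints a decimal point.
speedup() {
  t_without=$(printf '%s' "$1" | tr ',' '.')
  t_with=$(printf '%s' "$2" | tr ',' '.')
  LC_ALL=C awk -v a="$t_without" -v b="$t_with" 'BEGIN { printf "%.2f\n", a / b }'
}
```

For example, `speedup 8,954 3,155` prints `2.84`.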


Awesome! That is more like it :slight_smile:


In the US, we have PEBKAC: problem exists between keyboard and chair :wink:
