darktable and OpenCL (updated)


(system) #1

Many readers will have already heard about GPU processing and the fact that darktable can make use of OpenCL to improve performance. As we still lack detailed documentation on the topic, here are a few explanations and how-tos.

The Background

Processing high-resolution images is one of the more demanding tasks in modern computing. Both in terms of memory requirements and in terms of CPU power, getting the best out of a typical 15, 20 or 25 megapixel image can quickly bring your computer to its limits.

darktable’s requirements are no exception. Our decision not to compromise on processing quality has led to all calculations being done on 4 × 32-bit floating point numbers. This is slower than “ordinary” 8- or 16-bit integer arithmetic, but eliminates all problems of tonal breaks or loss of information.
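To get a feel for what this choice means for memory, here is a rough back-of-the-envelope sketch. It assumes one uncompressed buffer holding four 32-bit floats per pixel; the function name is made up for illustration and is not part of darktable.

```python
BYTES_PER_FLOAT = 4      # one 32-bit float
CHANNELS_PER_PIXEL = 4   # calculations are done on 4 floats per pixel


def buffer_size_mb(megapixels: float) -> float:
    """Approximate size in MB (10**6 bytes) of one full-resolution float buffer."""
    pixels = megapixels * 1_000_000
    return pixels * CHANNELS_PER_PIXEL * BYTES_PER_FLOAT / 1_000_000


for mpx in (15, 20, 25):
    print(f"{mpx} MPx -> {buffer_size_mb(mpx):.0f} MB per buffer")
```

A processing step usually needs at least an input and an output buffer, so the real working set is likely a multiple of these figures.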

A lot of hand optimization has been invested to make darktable as fast as possible. If you run a current version of darktable on a modern computer, you might not even notice any “slowness”. However, there are conditions and certain modules where you feel (or hear from the howling of your CPU fan) how much your poor multi-core processor has to struggle.

That’s where OpenCL comes in. OpenCL allows us to take advantage of the enormous power of modern graphics cards. It is gamers’ demand for ever more detailed 3D worlds in modern first-person shooters that has driven GPU development. ATI, NVIDIA and others had to put enormous floating point processing power into their GPUs to meet these demands. The result is modern graphics cards with highly parallelized GPUs that quickly calculate surfaces and textures at high frame rates.

You are not a gamer and you don’t take advantage of that power? Well, then you should at least use it in darktable!

For highly parallel floating point calculations, modern GPUs are much faster than CPUs. That is especially true when you want to apply the same few processing steps to millions of items. Typical use case: processing megapixel images.

How OpenCL works

As you can imagine, the hardware architectures of GPUs can vary significantly. There are different manufacturers, and even different generations of GPUs from the same manufacturer may differ considerably. At the same time, GPU manufacturers are normally not willing to disclose many hardware details of their products to the public. One well-known consequence is the need to use proprietary drivers under Linux if you want to take full advantage of your graphics card.

Fortunately, an industry consortium led by The Khronos Group has developed an open, standardized interface called OpenCL. It eases the use of your GPU as a numerical processing device. OpenCL offers a C99-like programming language with a strong focus on parallel computing. An application that wants to use OpenCL needs to bring along suitable OpenCL source code, which it hands over to a hardware-specific OpenCL compiler at run-time. This way the application can use OpenCL on different GPU architectures (even at the same time). All “hardware secrets” are hidden in this compiler and are normally not visible to the user (or the application). The compiled OpenCL code is loaded onto your GPU and, with certain API calls, is ready to do calculations for you.

How to activate OpenCL in darktable

Using OpenCL in darktable requires that your PC is equipped with a suitable graphics card and that it has the required libraries in place. In particular, modern graphics cards from NVIDIA and ATI come with full OpenCL support. The OpenCL compiler is normally shipped as part of the proprietary graphics driver; it is accessible as a dynamic library called “libOpenCL.so”. This library must be in a folder where your system’s dynamic linker can find it.

When darktable starts, it first tries to find and load libOpenCL.so and, on success, checks whether the available graphics card comes with OpenCL support. A sufficient amount of graphics memory (1GB+) needs to be available to take advantage of the GPU. If that is OK, darktable tries to set up its OpenCL environment: a processing context needs to be initialized, a calculation pipeline started, OpenCL source code files (extension .cl) read and compiled, and the included routines (called OpenCL kernels) prepared for DT’s modules. Once all that is done, the preparation is finished.

As we still regard darktable’s OpenCL support as experimental, we additionally require the user to explicitly activate OpenCL. Go into the preferences dialog and look for core options. There you will find a checkbox that says: “activate opencl support (experimental)”. Check that box, and from then on darktable uses OpenCL.

You can switch it off and on again at any time. Depending on the type of modules you are using, you will notice the effect as a general speed-up during interactive work and during export. Not all modules can take advantage of OpenCL at the moment, and not all modules are demanding enough to make a noticeable difference. To feel a real difference, try modules like “shadows & highlights”, “sharpen”, “lowpass”, “highpass” or, as an extreme case, “equalizer”.

Let’s have a look at an example. I took a 20 MPx image and processed it with a history stack that is typical for my way of working. It includes the modules equalizer, tone curve, highpass and sharpen.

My computer is equipped with an i7-2600 CPU and an NVIDIA GeForce GTS 450 graphics card with 1GB memory. Core memory is 16GB.

For a single run of my pixelpipe in interactive mode (the so-called “full” pipeline), I get the following figures:

OpenCL not activated: 0.76 seconds
OpenCL activated: 0.11 seconds

This would be the typical delay whenever you change a parameter or pan or zoom into the image.

With the same image and the same settings, I profiled the export pixelpipe when generating a JPEG file with full resolution. Here are the results:

OpenCL not activated: 25.2 seconds
OpenCL activated: 6.5 seconds

If you are interested in more profiling figures, you can call darktable with the command line parameters -d opencl -d perf. After each run of the pixelpipe you will get a detailed breakdown of processing time per module, plus an even more fine-grained profile for all OpenCL kernels used.

Besides the speed-up, you should not see any difference in the results between CPU and GPU processing. Apart from rounding errors, the results are designed to be identical. If, for some reason, darktable fails to properly finish a GPU calculation, it will normally notice and automatically (and transparently) fall back to CPU processing.

Possible Problems and Solutions

If severe OpenCL errors occur at run-time, or the setup of our OpenCL environment fails during initialization, OpenCL will be automatically deactivated. You will notice this when you open the preferences dialog and find the activation checkbox reset to “off”.

There can be various reasons why OpenCL failed. We depend on hardware requirements and on the presence of certain drivers and libraries. In addition, all of these have to match in terms of maker, model and revision number. If anything does not match, e.g. your graphics driver (loaded as a kernel module) does not match the version of your libOpenCL.so, OpenCL support is likely to fail and the CPU takes over.

In that case, the best thing to do is to start darktable from a console with

darktable -d opencl

This will give additional debugging output about the initialization and use of OpenCL. First look for a line that starts with “[opencl_init] FINALLY …”. It tells you whether OpenCL support is available for you or not. If initialization failed, look at the messages above it for anything that reads like “could not be detected” or “could not be created”, and check whether there is a hint about where it failed.
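If you have saved the debug output to a file, the same check can be scripted. The following is a minimal sketch, not part of darktable: the function name is hypothetical, it treats any FINALLY line that lacks “NOT AVAILABLE” as success (an assumption), and it collects every [opencl_init] line containing “could not” or “discarding” as a possible hint, which is broader than the two phrases mentioned above.

```python
import re


def opencl_init_summary(log_text: str) -> dict:
    """Extract the FINALLY verdict and suspicious lines from '-d opencl' output."""
    available = None  # stays None if no FINALLY line is present at all
    hints = []
    for line in log_text.splitlines():
        if "[opencl_init]" not in line:
            continue
        if "FINALLY" in line:
            # Assumption: a FINALLY line without "NOT AVAILABLE" means success.
            available = "NOT AVAILABLE" not in line
        elif re.search(r"could not|discarding", line):
            hints.append(line.strip())
    return {"available": available, "hints": hints}


sample = """\
[opencl_init] found opencl runtime library 'libOpenCL.so.1'
[opencl_init] could not get platforms: -1001
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
"""
print(opencl_init_summary(sample))
```

Note that a harmless “could not find opencl runtime library” line may also show up among the hints when a later library name was found; read the hints in order.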

Here are a few cases observed in the past:

DT might tell you that no OpenCL aware graphics card is detected or that the available memory on your GPU is too low and the device is discarded. In that case you might need to buy a new card, if you really want OpenCL support.

DT might also tell you that a context could not be created. This often indicates a version mismatch between the (loaded) graphics driver and libOpenCL. Check whether you have left-over kernel modules or graphics libraries from an older install and take appropriate action. If in doubt, do a clean reinstall of your graphics driver. Sometimes, immediately after a driver update, the loaded kernel driver does not match the newly installed libraries: reboot your system in that case.

DT might, in very rare cases, crash directly during startup. This can happen if your OpenCL setup is completely broken or if the driver or library contains a severe bug. If you can’t fix it, you can still use darktable with the option --disable-opencl, which will skip the entire OpenCL initialization step.

DT might on some systems fail to compile its OpenCL source files at run-time. In that case you will get a number of error messages looking like typical compiler errors. This could indicate an incompatibility between your OpenCL implementation and our interpretation of the standard. In that case visit us at darktable-devel@sourceforge.net and report the problem. Chances are good that we can help you. Please also report in case you see significant differences between CPU and GPU processing of an image!

There also exist a few on-CPU implementations of OpenCL. These come as drivers provided by INTEL or AMD. We observed that they do not give us any speed gain versus our hand-optimized CPU code. Therefore we simply discard these devices.

Summary

Although OpenCL support in darktable is still experimental and incomplete, it is already very usable. Give it a try and see what it can do for you!

[Update]

Here are a few more words about optimizing your OpenCL setup once it’s running. As a general rule, darktable tries to catch all OpenCL runtime errors and take appropriate action. Therefore OpenCL should normally not cause darktable to crash or give garbled output. Instead, in case of errors, DT will notice and reprocess everything on the CPU; an additional step which could slow down processing significantly for you! Therefore it is worth investing some effort to avoid those errors.

The most limiting resource for OpenCL is GPU memory. Modern graphics cards might be equipped with 1GB or even 2GB of RAM, but this is little compared to core memory, and it is not much if we want to export a high-resolution image. One further problem with GPU memory is that we do not know how much is really free. At startup we read the amount of available memory from each OpenCL device, but we cannot take all of it. There is some (unknown) amount which the GPU driver needs for its overhead and for X11 video tasks. Trying to allocate more memory for our purposes than is available at the time will cause allocation failures and make the pixelpipe abort.

darktable’s escape route out of this limitation is “tiling”. Images that are too big are processed in smaller parts (rectangular tiles) one after the other and then combined again. This happens on a per-module basis, i.e. for each module that we want to process, a decision is taken on whether tiling is needed and, if so, how many tiles.
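The idea behind that per-module decision can be sketched as follows. This is a deliberately simplified illustration, not darktable’s actual tiling code: the function and parameter names are invented, all buffers are assumed to be the same size, and the overlap between neighbouring tiles that real tiling needs is ignored.

```python
import math


def tiles_needed(image_mb: float, buffers: int, free_gpu_mb: float) -> int:
    """Smallest number of equal tiles so that one tile's buffers fit on the GPU."""
    if free_gpu_mb <= 0:
        raise ValueError("no GPU memory available")
    required_mb = image_mb * buffers  # memory a module would need without tiling
    return max(1, math.ceil(required_mb / free_gpu_mb))


# A 320 MB float image on a GPU with 724 MB usable memory (example figures):
print(tiles_needed(320, buffers=2, free_gpu_mb=724))  # input + output only
print(tiles_needed(320, buffers=4, free_gpu_mb=724))  # module with extra buffers
```

With these example figures the first call needs no tiling (one tile), while the more memory-hungry module has to be split into two tiles.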

Before going into the details, the above already makes clear that we should not process several images in parallel with OpenCL. We already make maximum use of GPU memory by tiling and the nature of GPU processing will already parallelize processing to the max on a pixel by pixel basis. No room for additional parallelization. In preferences set “export multiple images in parallel” to 1.

When you are running darktable with OpenCL support and you suspect slow processing (particularly during image exports), restart DT from a console with the option -d opencl.

Watch out for modules that fail with an error message. Pay special attention to error code -4; this is the error we get when on-GPU memory allocation fails. The module “equalizer” is a hot candidate for this. Sometimes you might get a message about a module failing due to unfulfilled “roi” requests (especially the module “demosaic”). This can be ignored; it is a current darktable limitation and does not indicate any OpenCL problem.

If you get “-4” errors, open the file $HOME/.config/darktable/darktablerc, where DT stores its configuration parameters, and look for opencl_memory_headroom. This value tells darktable how many megabytes (out of the total available amount) should be left free for driver and video purposes. By default it is set to 300MB, which works well with current NVIDIA cards. If you increase this value (steps of 50 are a good choice), you further reduce the risk of running into allocation failures. On the negative side, this requires stronger tiling (more but smaller tiles), which is a bit less efficient. In the end you should rather accept more tiling than more allocation failures!
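As an illustration of such an edit, here is a small sketch that raises the value by one step in the text of the configuration file. It assumes that darktablerc stores each setting as a plain key=value line; the function name is made up, and the sketch works on a string rather than touching your real file.

```python
def bump_headroom(rc_text: str, step: int = 50) -> str:
    """Return rc_text with opencl_memory_headroom increased by `step` MB."""
    key = "opencl_memory_headroom="  # assumed key=value layout of darktablerc
    out = []
    for line in rc_text.splitlines():
        if line.startswith(key):
            line = f"{key}{int(line[len(key):]) + step}"
        out.append(line)
    return "\n".join(out) + "\n"


example = "opencl=TRUE\nopencl_memory_headroom=300\n"
print(bump_headroom(example), end="")
```

If you apply something like this to the real file, it is probably safest to do so while darktable is not running, so that your change is not overwritten.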

With current Radeon cards, users have observed a different issue. Those cards often report less available memory than they physically have; typically 512MB out of 1GB. To begin with, this prevents them from being accepted as valid OpenCL devices by DT (we set a minimum requirement of 768MB). You can change this behavior by setting opencl_memory_requirement to 512. The good news is that Radeon cards seem to have less memory overhead (at least within the reported 512MB). Therefore you can try setting opencl_memory_headroom to a value as low as 150 or even 100. This should leave you with a quite reasonable amount of free GPU memory for OpenCL processing. Give it a try and share your success stories at darktable-users@sourceforge.net.
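How the two settings interact can be made concrete with a few lines of arithmetic. This is only a sketch of the rule as described here (a device is discarded if it reports less than the requirement, otherwise the headroom is subtracted); the function name is hypothetical and darktable’s real check may differ in detail.

```python
from typing import Optional


def usable_gpu_memory_mb(reported_mb: int,
                         requirement_mb: int = 768,
                         headroom_mb: int = 300) -> Optional[int]:
    """MB left for OpenCL processing, or None if the device would be discarded."""
    if reported_mb < requirement_mb:
        return None  # device reports too little memory and is not accepted
    return reported_mb - headroom_mb


print(usable_gpu_memory_mb(1024))           # 1GB card, default settings
print(usable_gpu_memory_mb(512))            # Radeon reporting 512MB, defaults
print(usable_gpu_memory_mb(512, 512, 150))  # same card, adjusted settings
```

With the defaults, the 512MB device is rejected; with the requirement lowered to 512 and the headroom to 150, it is accepted and 362MB remain for processing.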



This is a companion discussion topic for the original entry at https://www.darktable.org/2012/03/darktable-and-opencl/

(Martin Scharnke) #2

There also exist a few on-CPU implementations of OpenCL. These come as drivers provided by INTEL or AMD. We observed that they do not give us any speed gain versus our hand-optimized CPU code. Therefore we simply discard these devices.

Does this include GPU on the same die as CPU? My AMD A12 (id Carrizo) comes with 8 compute units according to clinfo, but is not usable?

Also, I have a separate Radeon R5 230 (id Caicos) with 2 compute units, but this is not used either.

darktable-cltest output pasted below:

Performance using just my CPU (and with 16GB DDR4 RAM) isn’t bad; but it would be nice to have the ability to use the silicon that could make the performance fantastic.

darktable-cltest
[opencl_init] opencl related configuration options:
[opencl_init]
[opencl_init] opencl: 1
[opencl_init] opencl_library: ''
[opencl_init] opencl_memory_requirement: 768
[opencl_init] opencl_memory_headroom: 300
[opencl_init] opencl_device_priority: '*/!0,*/*/*'
[opencl_init] opencl_mandatory_timeout: 200
[opencl_init] opencl_size_roundup: 16
[opencl_init] opencl_async_pixelpipe: 0
[opencl_init] opencl_synch_cache: 0
[opencl_init] opencl_number_event_handles: 25
[opencl_init] opencl_micro_nap: 1000
[opencl_init] opencl_use_pinned_memory: 0
[opencl_init] opencl_use_cpu_devices: 0
[opencl_init] opencl_avoid_atomics: 0
[opencl_init]
[opencl_init] found opencl runtime library 'libOpenCL'
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
amdgpu_parse_asic_ids: Cannot parse ASIC IDs: Resource temporarily unavailable
[opencl_init] found 1 platform
[opencl_init] found 2 devices
[opencl_init] discarding device 0 'AMD CAICOS (DRM 2.50.0 / 4.13.0-21-lowlatency, LLVM 6.0.0)' due to missing image support.
[opencl_init] discarding device 1 'AMD CARRIZO (DRM 3.18.0 / 4.13.0-21-lowlatency, LLVM 6.0.0)' due to missing image support.
[opencl_init] no suitable devices found.
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.


(Roman Lebedev) #3

There is not enough opencl support in open drivers.
https://dri.freedesktop.org/wiki/GalliumCompute/ says {2,3}D image {read, write} are all still TODO.


(Martin Scharnke) #4

Ah … thanks for this. NVIDIA closed source driver on my laptop worked - mostly. Is there an opencl solution using closed source drivers for AMD GPU?


(Mica) #5

The AMD GPU drivers are actually quite bad, aren’t they? A newer kernel should have the open drivers included


#6

Hi
My output using darktable -d opencl

[opencl_init] opencl related configuration options:
[opencl_init]
[opencl_init] opencl: 1
[opencl_init] opencl_library: ''
[opencl_init] opencl_memory_requirement: 768
[opencl_init] opencl_memory_headroom: 300
[opencl_init] opencl_device_priority: '*/!0,*/*/*'
[opencl_init] opencl_mandatory_timeout: 200
[opencl_init] opencl_size_roundup: 16
[opencl_init] opencl_async_pixelpipe: 0
[opencl_init] opencl_synch_cache: 0
[opencl_init] opencl_number_event_handles: 25
[opencl_init] opencl_micro_nap: 1000
[opencl_init] opencl_use_pinned_memory: 0
[opencl_init] opencl_use_cpu_devices: 0
[opencl_init] opencl_avoid_atomics: 0
[opencl_init]
[opencl_init] could not find opencl runtime library 'libOpenCL'
[opencl_init] could not find opencl runtime library 'libOpenCL.so'
[opencl_init] found opencl runtime library 'libOpenCL.so.1'
[opencl_init] opencl library 'libOpenCL.so.1' found on your system and loaded
[opencl_init] could not get platforms: -1001
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.
wait time 0.167082s
wait time 0.108670s

I’m using an A10-7870K APU with built-in R7 graphics. Does this mean OpenCL support is not available?


(Roman Lebedev) #7

That is what it told you.


(Andreas Schneider) #8

My system is using the Open Source AMD GPU driver. However for darktable I use the closed source opencl compiler. I’ve just extracted the required individual files from the closed source driver packages (RPM) for that:

$ find /opt/amdgpu-pro/lib64 -type f
/opt/amdgpu-pro/lib64/libOpenCL.so.1
/opt/amdgpu-pro/lib64/libamdocl12cl64.so
/opt/amdgpu-pro/lib64/libamdocl64.so
/opt/amdgpu-pro/lib64/libdrm.so.2.4.0
/opt/amdgpu-pro/lib64/libdrm_amdgpu.so.1.0.0
/opt/amdgpu-pro/lib64/libdrm_radeon.so.1.0.1
/opt/amdgpu-pro/lib64/libkms.so.1.0.0

and

$ cat /etc/OpenCL/vendors/amdocl64.icd
libamdocl64.so

This works fine with:

LD_LIBRARY_PATH=/opt/amdgpu-pro/lib64 darktable

to get OpenCL support in darktable.

P.S.: Midnight Commander is able to browse RPM files, so it is quite easy to copy out files from packages with it.


#9

Thanks. I was hoping there was a workaround I was missing, as exporting jpg files can be slow.


(Martin Scharnke) #10

Andreas, could you please elaborate a little?

I trashed my driver installation trying to set up amdgpu-pro and had to reinstall my OS files, and wish to avoid going through that pain again.

Did you use the AMD supplied script, or (as I think you imply) selectively install just some of the .deb packages?
Or did you not install .deb packages but merely extract the files you list in /opt/amdgpu-pro/lib64 from the packages?

Your last line suggests having to call darktable from a terminal. Is it possible to set LD_LIBRARY_PATH in my environment, or does it need to be set on the command line?


(Andreas Schneider) #11

I did not install any package. I extracted the mentioned individual files from the packages (RPM). Midnight Commander can simply open them …

I start darktable on the commandline. You can also add it to the desktop file …


(Daniel Baak) #12

Hi All. I’m having issues getting darktable to use opencl on my system after having reinstalled my nvidia-387 driver.

I’m getting the following output:

➜  ~  darktable -d opencl  
[opencl_init] opencl related configuration options:
[opencl_init] 
[opencl_init] opencl: 1
[opencl_init] opencl_library: ''
[opencl_init] opencl_memory_requirement: 768
[opencl_init] opencl_memory_headroom: 4000
[opencl_init] opencl_device_priority: '*/!0,*/*/*'
[opencl_init] opencl_mandatory_timeout: 200
[opencl_init] opencl_size_roundup: 16
[opencl_init] opencl_async_pixelpipe: 0
[opencl_init] opencl_synch_cache: 0
[opencl_init] opencl_number_event_handles: 25
[opencl_init] opencl_micro_nap: 1000
[opencl_init] opencl_use_pinned_memory: 0
[opencl_init] opencl_use_cpu_devices: 0
[opencl_init] opencl_avoid_atomics: 0
[opencl_init] 
[opencl_init] found opencl runtime library 'libOpenCL'
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] could not get device id size: -1
[opencl_init] found 0 device
[opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
[opencl_init] initial status of opencl enabled flag is OFF.

The device it’s not detecting is an NVIDIA GeForce GTX 1070. The driver is functional insofar as it’s running cuda code correctly.

The system is linuxmint 18.2.

Any advice on how to debug this problem is greatly appreciated.


(Martin Scharnke) #13

Finally managed to get opencl support in darktable;

@asn, although I made progress following your tips, ultimately I got stalled on version number of the DRM.

I bit the bullet and installed amdgpu-pro version 17.50 using the install script, which built a new kernel and ramdisk image.
I followed the directions on this page

The speed-up factor in darktable is only slightly over 2, however, which is considerably less than the factor I achieved on my Core i7 laptop with opencl. I don’t know the exact factor, but I do know that I was exporting a jpg of a 4000x6000 image in about 8 seconds. Unfortunately, exports on my new computer are still taking nearly a minute each.

Interestingly, my stand-alone video card (yes, a pretty low-spec Radeon R5 230) was not detected as a usable device at all. Given that it has only two compute units (versus 8 on the CPU die; I have an AMD A12), this is a moot point.

Mileage will vary, but maybe this tale will help someone with similar hardware & OS version (Ubuntu Studio 17.10).


(Andreas Schneider) #14

I have a Radeon RX 470, so the GPU is pretty strong. I needed to set ‘OpenCL scheduling profile: Very fast GPU’. With that, everything is working much faster :)


(Martin Scharnke) #15

Thanks for the tip. This has given me an extra 2x speedup. :)
But when I set “multiple GPUs” I get much better … another 4x speedup.
Very happy with the results now!!! :D


(Paolo Benvenuto) #16

I’m getting the same result:

$ darktable -d opencl

** (darktable:13761): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-qHQglvLNCi: Connessione rifiutata
7.369256 [opencl_init] opencl related configuration options:
7.369289 [opencl_init] 
7.369294 [opencl_init] opencl: 1
7.369299 [opencl_init] opencl_library: ''
7.369312 [opencl_init] opencl_memory_requirement: 768
7.369320 [opencl_init] opencl_memory_headroom: 300
7.369327 [opencl_init] opencl_device_priority: '*/!0,*/*/*'
7.369339 [opencl_init] opencl_mandatory_timeout: 200
7.369349 [opencl_init] opencl_size_roundup: 16
7.369357 [opencl_init] opencl_async_pixelpipe: 0
7.369362 [opencl_init] opencl_synch_cache: 0
7.369371 [opencl_init] opencl_number_event_handles: 25
7.369381 [opencl_init] opencl_micro_nap: 1000
7.369386 [opencl_init] opencl_use_pinned_memory: 0
7.369395 [opencl_init] opencl_use_cpu_devices: 0
7.369405 [opencl_init] opencl_avoid_atomics: 0
7.369413 [opencl_init] 
7.369671 [opencl_init] could not find opencl runtime library 'libOpenCL'
7.369760 [opencl_init] could not find opencl runtime library 'libOpenCL.so'
7.370507 [opencl_init] found opencl runtime library 'libOpenCL.so.1'
7.370562 [opencl_init] opencl library 'libOpenCL.so.1' found on your system and loaded
7.370896 [opencl_init] could not get platforms: -1001
7.370903 [opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
7.370906 [opencl_init] initial status of opencl enabled flag is OFF.

What does “could not get platforms: -1001” mean?


(Martin Scharnke) #17

Hi Paolo,

did you download and install the amdgpu-pro package?
It is a bit of a “heart in your mouth” moment, because for me there is the fear of making my system unusable, and hard to return to its previous state.
However I have had success in Ubuntu and Manjaro with the -pro package. It is, unfortunately, not FOSS.
I’m thinking about trying the latest (update as at 20 March 2018) here because I do have some flicker issues that - once they start (sometimes after half an hour, sometimes never) - only a reboot solves.


(Martin Scharnke) #18

Sorry, an actual answer to your question:

What does “could not get platforms: -1001” mean?

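-1001 is an OpenCL error code (CL_PLATFORM_NOT_FOUND_KHR). It means that the OpenCL loader found no installed platform, i.e. no vendor driver that provides OpenCL, so opencl_init had no opencl-enabled GPU to work with :(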
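For reference, the numeric codes that appear in this thread correspond to symbolic names in the Khronos OpenCL headers. The small lookup below is only a convenience sketch covering those three codes; the helper name is made up and the table is far from complete.

```python
# Error codes seen in this thread, with their names from the Khronos headers.
OPENCL_ERRORS = {
    -1: "CL_DEVICE_NOT_FOUND",               # platform present, no usable device
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",  # on-GPU memory allocation failed
    -1001: "CL_PLATFORM_NOT_FOUND_KHR",      # loader found no vendor platform
}


def describe_opencl_error(code: int) -> str:
    """Return the symbolic name for a known code, or a generic fallback."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL error {code}")


for code in (-1001, -1, -4, -99):
    print(code, describe_opencl_error(code))
```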


(Paolo Benvenuto) #19

I’m thinking about trying the latest (update as at 20 March 2018) here

That’s the “Radeon Pro Software Enterprise Edition”. The support site isn’t clear about what that means: The page https://support.amd.com/en-us/download/workstation?os=Linux%20x86_64#release-notes says:

Radeon Pro Software Enterprise Edition 18.Q1.1 for Linux Highlights

Radeon™ Pro Software Enterprise Edition 18.Q1.1 for Linux delivers enterprise-level support for RHEL 7.4 and CentOS 7.4.

Download Full Release Notes

Radeon Pro Software Adrenalin Edition 17.12.1 for Linux Highlights

Radeon™ Pro Software Adrenalin Edition 17.12.1 for Linux delivers amdgpu-pro and amdgpu-open stack using the same packaging infrastructure and introduces support for RHEL 7.4 and CentOS 7.4.

Download Full Release Notes

I cannot work out whether I should use “Radeon Pro Software Enterprise Edition 18.Q1.1” or “Radeon Pro Software Adrenalin Edition 17.12.1”; they do not explain what each one is for. I actually tried installing both: the first makes my system unusable, the second works but doesn’t make opencl work…


(Andreas Schneider) #20