darktable uses CPU instead of GPU

kofa · September 12, 2023, 7:48pm

Regarding tuning OpenCL performance (memory size): I’m not sure any tuning is recommended nowadays.

Also, I don’t know if you have discovered this, but opencl_device_priority is only used if opencl_scheduling_profile=default.

multiple GPUs
[…] Users of systems with a variety of GPUs will need better control on their relative priority. They would be better off selecting the “default” profile and fine-tuning their system with the “opencl_device_priority” configuration parameter (see multiple devices).
(darktable 4.6 user manual - scheduling profile)

The “multiple devices” page of the darktable 4.6 user manual is, however, slightly outdated. It says:

There are four fields in the parameter string separated by a slash, each representing one type of pixelpipe. a,b,c... defines the devices that are allowed to process the center image (full) pixelpipe. Likewise devices k,l,m... can process the preview pixelpipe, devices o,p,q... the export pixelpipes and finally devices x,y,z... the thumbnail pixelpipes.

In reality, there are five such fields; the fifth one is for the preview on the second screen:

github.com: darktable-org/darktable/blob/master/src/common/opencl.c#L1884-L1897

      g_strfreev(tokens);

      // terminate priority list with -1
      while(count < devs + 1) priority_list[count++] = -1;

      // opencl use can only be mandatory if at least one opencl device is given
      *mandatory = (priority_list[0] != -1) ? mnd : 0;

      free(full);
    }

    // parse a complete priority string
    static void dt_opencl_priorities_parse(dt_opencl_t *cl, const char *configstr)
    {

I simply set them as mandatory on my GPU:
opencl_device_priority=+0/+0/+0/+0/+0
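
To make the five-field layout concrete, here is a rough Python sketch that splits such a string and labels each field. It is not darktable's parser: the function name `describe_priority` and the short pipe labels are made up for illustration, the order of the fields follows the manual quote plus the fifth field mentioned above, and only a leading `+` (mandatory) and comma-separated device numbers are handled. The real parser in opencl.c accepts more syntax than this.

```python
# Hypothetical labels for the five slash-separated fields, in the order
# described above (the fifth being the preview on the second screen).
PIPE_NAMES = ["full", "preview", "export", "thumbnail", "preview2"]


def describe_priority(configstr):
    """Split a priority string into per-pixelpipe settings (simplified)."""
    result = {}
    for name, field in zip(PIPE_NAMES, configstr.split("/")):
        field = field.strip()
        # Assumption: a leading '+' marks OpenCL as mandatory for this pipe.
        mandatory = field.startswith("+")
        devices = [d.strip() for d in field.lstrip("+").split(",") if d.strip()]
        result[name] = {"mandatory": mandatory, "devices": devices}
    return result


if __name__ == "__main__":
    for pipe, setting in describe_priority("+0/+0/+0/+0/+0").items():
        print(pipe, setting)
```

With the value above, every one of the five pipes ends up mandatory on device 0.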

One other item I found worth setting is:
opencl_mandatory_timeout=20000
When darktable wants to start a new operation, it checks whether a GPU is available. If the card is busy, darktable will not wait for it forever, even if mandatory GPU processing is set via the priorities: it takes the value of opencl_mandatory_timeout, multiplies it by 5, and interprets the result as the number of milliseconds to wait. If the card becomes available within that time, the processing uses it; otherwise, it falls back to the CPU. I found that with computationally intensive modules (like diffuse or sharpen), the default timeout (400, i.e. 2 seconds) was too short, causing:

a wait for 2 seconds for the card to become available;
then processing on the CPU, rather slowly. During that time, the GPU finishes its previous task and sits around, unutilised.
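
The conversion from the setting to an actual wait time can be checked with a few lines of Python. This is only the arithmetic described above (value times 5, read as milliseconds); the helper name `wait_limit_ms` is made up for illustration and is not part of darktable.

```python
def wait_limit_ms(opencl_mandatory_timeout):
    """Maximum wait for a busy GPU, per the description above: value * 5 ms."""
    return opencl_mandatory_timeout * 5


if __name__ == "__main__":
    for setting in (400, 20000):
        ms = wait_limit_ms(setting)
        print(f"opencl_mandatory_timeout={setting} -> {ms} ms ({ms / 1000:g} s)")
```

So the default of 400 gives 2000 ms (2 seconds), and 20000 gives 100000 ms (100 seconds).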

For me, it is worth waiting much longer (up to 100 seconds; yes, that is overkill) so that the GPU can always finish its work. It is so much faster than my CPU that a typical 5-10 second wait for the GPU, followed by, say, 10 seconds of GPU processing, still takes less time than the 20-60 seconds of processing on the CPU. The best value for this setting really depends on your GPU-to-CPU speed-up.
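
The trade-off can be written down as a simple comparison. The sketch below is only a back-of-the-envelope check, not anything darktable computes; the function name `worth_waiting` is hypothetical and the timings are illustrative values picked from the ranges mentioned above. You would need to measure your own.

```python
def worth_waiting(gpu_wait_s, gpu_process_s, cpu_process_s):
    """True if waiting for the GPU and then using it beats the CPU fallback."""
    return gpu_wait_s + gpu_process_s < cpu_process_s


if __name__ == "__main__":
    # Illustrative numbers: 10 s wait + 10 s on the GPU versus 40 s on the CPU.
    print(worth_waiting(gpu_wait_s=10, gpu_process_s=10, cpu_process_s=40))
    # With a much smaller speed-up, falling back to the CPU may be quicker.
    print(worth_waiting(gpu_wait_s=10, gpu_process_s=10, cpu_process_s=15))
```

If the first case is typical for your hardware, a long timeout makes sense; if the second is, a shorter one may serve you better.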