Understanding the bottleneck with drawn masks: CPU rendering vs. GPU blending

Hi everyone,

I’ve been thinking about the performance and usability of drawn masks (brushes, paths) in darktable. While darktable performs incredibly well overall, especially with OpenCL enabled, drawn masks often feel like a bottleneck.

Looking at the architecture, it seems there is a strict divide between how masks are generated and how they are applied, which explains the sluggishness on high-res images even with powerful GPUs. I wanted to share my observations and see if my understanding is correct, or if there are any future plans regarding this.

Here is how I understand the current CPU/GPU divide:

  1. Mask Generation & Rendering (The CPU Bottleneck)
    The actual creation of drawn masks (e.g., brushes, paths) is heavily CPU-bound. The complex vector math, smoothing, and feathering calculations (handled in files like masks.c, brush.c, path.c) are done on the CPU.
    Every time a brush stroke is drawn or a node is moved, the CPU has to re-rasterize the vector path into a pixel mask. With today’s high-resolution sensors (60+ MP) or complex paths with many nodes, this brings even modern CPUs to their limits, causing the brush to lag or stutter.

  2. Mask Application & Blending (The GPU Powerhouse)
    Once the mask is rendered into a bitmap, OpenCL takes over. The application of the mask to the image is fully GPU-accelerated.
    Blend Modes & Opacity: Operations like Multiply, etc., are simple pixel-by-pixel arithmetic. Thanks to OpenCL kernels (e.g., in src/iop/blendop.c), these run blazingly fast on the GPU.
    Parametric Masks: In stark contrast to drawn masks, parametric masks (based on lightness, hue, etc.) are calculated entirely on the GPU, which is why they react smoothly and in real time. (A rough sketch of both halves of this divide follows the list.)

  3. Usability Issues
    Aside from performance, the UI/UX for drawn masks sometimes feels a bit clunky. Combining masks in the Mask Manager is powerful, but the UI interaction can feel a bit rigid when dealing with complex groups.
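To make the contrast concrete, here is a rough, hypothetical sketch of both halves (none of this is darktable’s actual code; every function, kernel, and parameter name is invented). The CPU half rasterizes a polyline into a feathered mask by computing each pixel’s distance to the nearest segment; the GPU half applies the finished mask with one arithmetic step per work-item.

```c
#include <math.h>
#include <stddef.h>

typedef struct { float x, y; } point_t;

/* CPU half (hypothetical): rasterize a polyline into a soft mask by
 * measuring every pixel's distance to the nearest path segment.
 * Cost is O(width * height * segments) and has to be redone each time
 * a node moves -- this is the expensive part. */
void rasterize_feathered(const point_t *nodes, size_t n_nodes,
                         float *mask, int width, int height, float feather)
{
  for(int y = 0; y < height; y++)
    for(int x = 0; x < width; x++)
    {
      float d2 = INFINITY;
      for(size_t i = 0; i + 1 < n_nodes; i++)
      {
        // squared distance from (x,y) to segment nodes[i]..nodes[i+1]
        const float ax = nodes[i].x, ay = nodes[i].y;
        const float vx = nodes[i + 1].x - ax, vy = nodes[i + 1].y - ay;
        float t = ((x - ax) * vx + (y - ay) * vy) / (vx * vx + vy * vy + 1e-6f);
        t = fmaxf(0.0f, fminf(1.0f, t));
        const float dx = x - (ax + t * vx), dy = y - (ay + t * vy);
        d2 = fminf(d2, dx * dx + dy * dy);
      }
      // linear falloff over the feather radius gives the soft edge
      mask[y * width + x] = fmaxf(0.0f, 1.0f - sqrtf(d2) / feather);
    }
}
```

```c
/* GPU half (hypothetical OpenCL kernel): once the mask bitmap exists,
 * masked blending is one work-item per pixel with trivial arithmetic,
 * which is why this stage flies on any GPU. */
__kernel void blend_multiply(__global const float4 *in,    // module input
                             __global const float4 *proc,  // module output
                             __global const float *mask,   // drawn mask, 0..1
                             __global float4 *out,
                             const int width, const int height)
{
  const int x = get_global_id(0), y = get_global_id(1);
  if(x >= width || y >= height) return;
  const int i = y * width + x;
  // masked multiply blend: mix between input and input*output
  out[i] = mix(in[i], in[i] * proc[i], mask[i]);
}
```

The asymmetry is the point: the first loop scales with pixels × segments and repeats on every node edit, while the kernel does constant work per pixel and parallelizes trivially.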

Conclusion & Questions

It can be a bit frustrating to have a high-end NVIDIA or AMD GPU sitting idle while the CPU struggles to render a brush stroke.

  - Is my assessment of the masks.c (CPU) vs. blendop.c (GPU) architecture accurate?
  - Are there any long-term plans or ideas to move the vector-to-raster conversion (mask generation and feathering) to OpenCL?

I know rewriting a vector engine for the GPU is a massive undertaking, but I’d love to hear the devs’ thoughts on this bottleneck.

Thanks for all the amazing work on darktable!

Can you post a screenshot of a mask that “feels like a bottleneck”?

1 Like

Drawn masks typically feel like the drawn area lags a bit behind the cursor in a very unsmooth way. Once drawn, the mask works normally, but the drawing process just feels kind of clunky.

That has been my impression, maybe I am wrong.

What platform are you on? I never felt that even on my 5th gen i5 with no GPU on Linux.

2 Likes

They may be using the brush tool, not paths. The docs specifically warn against that:

Note: Rendering a complex brush shape can consume a significant number of CPU cycles. Consider using the circle, ellipse or path shapes instead where possible.
(darktable user manual - drawn masks)

3 Likes

As far as I understand, the masks are only rendered at preview resolution, not raw file resolution. Does the raw file size really make a difference?

2 Likes

Here is a quick example.

Yes.

AMD Ryzen AI Max+ 395 - Zen5, 16 Cores, 32 Threads

I saw the entry. But I’m probably not the only one who would like to be able to use the brush, especially with a stylus.

Good question. I just tested it again with a 24MP file, and it’s very slow there too. I don’t think there’s much difference.

It works for individual brush strokes, but that changes quickly when there are more of them.

Perhaps I’m not using it as intended?

Thanks for the video demonstration. In this specific case I guess a similar effect could be achieved with two gradient masks? Not a true fix of course.

2 Likes

I’ve recently bought a pen and tablet and I’m experimenting, slowly working out how best to use it.

Have you experimented with “smoothing” under Properties in Mask Manager? I haven’t looked closely yet but it may impact performance.

Using the pen to paint in saturation changes etc., I find it on the slow side. Try with a JPEG - much faster. I’m thinking of a two-stage edit if I want to do lots of painting: first get a good basic image and output a high-quality JPEG, then do a painting edit step and output the finished image. But of course it’s no longer non-destructive editing.

My tablet transmits pen angle as well as pressure. It would be nice if DT could use angle as well as pressure.

Two polygon shapes should do it easily.

1 Like

It only looks that way at first glance. Of course, this is just a quick example, but in the brush mask, many areas are weaker than they would be with gradient masks, for example.

I am in the same situation and was used to being able to use it in this way with some other programmes.

This is my main concern. Perhaps there are also alternatives to creating a raster mask directly with the brush.

Yes, I have the same experience with that on macOS. For me, drawing a brush mask is unworkable. Path masks work much better on Mac.

Also, adjusting and editing the mask is not really workable for me. There must be very good reasons why the brush is implemented as a vector rather than a rasterized mask?

1 Like

What they miss is some kind of smoothing, i.e., what vector graphics programs call “mass” or “inertia”. Every little jitter of the pointer makes a new node along the path, so even if what you want is a straight line you end up with hundreds of nodes.

Edit: thanks @RawConvert for teaching me (below) that this option already exists, although arguably not very discoverable. I have to try it!
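For what it’s worth, the “inertia” idea is easy to sketch. Here is a minimal, hypothetical C snippet (not darktable code; all names are invented) that low-pass filters the raw pointer samples and only emits a path node once the smoothed position has travelled a minimum distance, so jitter averages away instead of turning into nodes:

```c
#include <math.h>

typedef struct { float x, y; } point_t;

typedef struct
{
  point_t smooth;    // low-pass filtered pointer position
  point_t last_node; // last node actually added to the path
  float alpha;       // smoothing strength in (0,1]; lower = more inertia
  float min_dist;    // minimum travel before a new node is emitted
} stroke_filter_t;

/* Feed one raw pointer sample; returns 1 and fills *node when a new
 * path node should be created, 0 otherwise. */
int stroke_filter_feed(stroke_filter_t *f, point_t raw, point_t *node)
{
  // exponential moving average absorbs small jitters
  f->smooth.x += f->alpha * (raw.x - f->smooth.x);
  f->smooth.y += f->alpha * (raw.y - f->smooth.y);
  const float dx = f->smooth.x - f->last_node.x;
  const float dy = f->smooth.y - f->last_node.y;
  if(sqrtf(dx * dx + dy * dy) < f->min_dist) return 0;
  f->last_node = f->smooth;
  *node = f->smooth;
  return 1;
}
```

A straight line then costs a handful of nodes instead of hundreds.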

2 Likes

I’m trying that out today as well. I wanted to do some colour adjustment, so I exported a file as 16-bit integer TIFF and imported it into Krita. From there it’s normal raster-program usage. 16-bit PNG would probably work as well.

Perhaps because all drawn masks are implemented as vector paths? Keep in mind that the drawn masks can be re-used individually in other modules, which means each mask must be kept separate.

Also, masks must be stored with the edit instructions. Paths are much more compact to store than raster masks.
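A quick back-of-envelope makes the compactness point concrete (the per-node layout below is an assumption, just to get the scale right):

```c
#include <stdio.h>

int main(void)
{
  const long nodes = 50;                           // a fairly complex brush
  const long bytes_per_node = 6 * sizeof(float);   // anchor + 2 control points, x/y each
  const long w = 6000, h = 4000;                   // a 24 MP image
  const long raster = w * h * (long)sizeof(float); // one float per pixel

  printf("path:   %ld bytes\n", nodes * bytes_per_node); // ~1.2 KB
  printf("raster: %ld bytes\n", raster);                 // ~96 MB
  return 0;
}
```

Roughly a kilobyte versus tens of megabytes per mask, and a stored raster would be tied to one resolution, while a path can be rasterized at whatever scale the pipeline needs.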

2 Likes

Smoothing - low, then medium below, high at the bottom.

2 Likes

I stand corrected! I’d move a motion to make this option more discoverable and to make it possible to set it at the stroke level.

Do you find it effective?

1 Like

Ok, that helps to understand it better.

That definitely makes it easier and a bit clearer. But it’s still pretty slow.

Here is an example where I tried to create a relatively simple shape with few nodes.

Not only is moving the mask quite delayed, but the exposure module also becomes very slow.

Another general problem is that you can’t see the feathering, and as I understand it, there are no pressure levels with a stylus either.

If anyone says, “No one uses styluses anyway,” then I would have to say, “No wonder ;)”

Perhaps there is an opportunity to acknowledge the “problem” and work together on a solution? It doesn’t necessarily have to be a quick fix, but perhaps it’s worth thinking about.
Maybe there is also a developer who has more information about this design decision.

1 Like

This question arose again a few days ago; the Mask Manager should have a ‘pressure’ option, but it’s not there in your video. Pressure can control size, opacity or hardness.

It seems like the pen stroke isn’t laggy, but changing the mask location and mask/module properties afterwards is slow?

The first pen stroke is fast, but it shows no feathering.

OK, I’ll check if a setting is missing. Thank you.