A Port of Darktable to the GPU: A Dev Diary

I couldn’t help myself. I just had to try. I had to try and build my own raw developer.

Well, “my own” is perhaps over-stating it a little. It’s mostly Darktable, but pared down to the essentials, and running entirely on the GPU.

Let’s rewind a little. I love Darktable! But… I am a software developer by trade, a signal processing scientist by training, and an image processing engineer at work. I have opinions about stuff, especially about software stuff and image processing stuff. So I tried Capture One instead, for a good year. It looks nicer and it works well, but its tools are terribly restrictive compared to Darktable. So I tried Lightroom, for half a year. It looks worse, and while its AI stuff is really useful, it also didn’t tickle me like Darktable does. Time and time again, I came back to Darktable. I even contributed some minor changes and addons to better fit my vision. But all the while, there was this nagging feeling that it could be better, if only…

I watched with horror and fascination as Aurelien did his fork. I follow Hanatos’ vkdt with interest. I’m amazed by Glenn Butcher’s rawproc. And I knew, some day, I’d have to try and build my own.

So the other day I had some time on my hands, and decided to give it a go. I’d start with technologies I know, and see how far I’d get. Something simple to load images: the Python package rawpy did the trick. A bit of UI around it, with PySide and QML. But the meat of the program was always going to be running in a GPU shader, so I built a little bridge to pass the Python image data into a QML shader. And amazingly, I got this prototype to display a raw file within an hour or two.
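
In case anyone wants to try the same, the loading step is tiny. This is a sketch rather than my exact code, and the file name is just an example:

```python
# Load a raw file and demosaic it to linear 16-bit RGB with rawpy.
import rawpy

with rawpy.imread("DSCF0001.RAF") as raw:  # example file name
    rgb16 = raw.postprocess(
        output_bps=16,        # 16-bit output instead of the default 8
        gamma=(1, 1),         # no tone curve: keep the data linear
        no_auto_bright=True,  # no automatic exposure adjustment
    )

# rgb16 is a (height, width, 3) uint16 numpy array, ready to be
# wrapped in a texture and handed to the shader.
```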

A slight hiccup was the file’s bit depth. Cool as Qt/QML is, it is mainly built for sRGB surfaces. It took me a short while to figure out that I could use a QtQuick3D.QQuick3DTextureData provider to pass a 16-bit texture to a shader. Another challenge was getting the GPU image back to the CPU for saving, as QML is built for displaying things, not processing them. All things considered, QML was perhaps not the best choice overall. But at least it makes GUI prototyping easy.
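
Roughly, the bridge looks like this. A minimal sketch, assuming PySide6 with the QtQuick3D module; the class and module names are made up for illustration:

```python
# Expose a 16-bit texture to QML via QQuick3DTextureData.
import numpy as np
from PySide6.QtCore import QByteArray, QSize
from PySide6.QtQml import QmlElement
from PySide6.QtQuick3D import QQuick3DTextureData

QML_IMPORT_NAME = "RawDev"      # illustrative QML module name
QML_IMPORT_MAJOR_VERSION = 1

@QmlElement
class RawTextureData(QQuick3DTextureData):
    """Wraps a (h, w, 3) uint16 numpy array as an RGBA16F texture."""

    def load(self, rgb16):
        h, w, _ = rgb16.shape
        # Pad to RGBA half-floats, which QtQuick3D accepts as RGBA16F.
        rgba = np.ones((h, w, 4), dtype=np.float16)
        rgba[..., :3] = rgb16.astype(np.float32) / 65535.0
        self.setSize(QSize(w, h))
        self.setFormat(QQuick3DTextureData.Format.RGBA16F)
        self.setTextureData(QByteArray(rgba.tobytes()))
```

In QML, an instance of this goes into the textureData property of a Texture, which the shader then samples at full precision.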

With these basics out of the way, I set about porting a few of Darktable’s algorithms to my shader. Sigmoid was first, to get viewable colors from the raw data. The algorithm itself was quickly ported into my shader, with copious help from an LLM, somewhat to my shame. It was just very good at translating C/CL code to GLSL. Then I needed to hook up some sliders to the shader parameters, and presto, a prototype of a raw developer.
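
For the curious: this is the general shape of such a curve, though not Darktable’s exact math. A numpy sketch of a log-logistic sigmoid, with made-up parameter names:

```python
# A log-logistic sigmoid: compresses scene-linear values into 0..1.
import numpy as np

def sigmoid_tonemap(linear, contrast=1.5, middle_grey=0.1845):
    x = np.maximum(linear, 1e-9)       # avoid zero to a negative power
    r = (x / middle_grey) ** contrast  # pivot the curve around grey
    return r / (1.0 + r)               # maps grey to 0.5, infinity to 1
```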

Second came Color Calibration. This required a color picker, which was another challenge, as there was not yet a way of querying a GPU surface for pixel colors. So I essentially needed to render out a thumbnail screenshot whenever pixel colors were required. The same process is used for saving a rendered image as well.
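
In Qt terms, that read-back can be as simple as grabbing the window; a sketch, assuming the viewer lives in a QQuickWindow:

```python
# Render the scene into a QImage and sample it for the color picker.
# The same QImage can also be saved to disk for export.
from PySide6.QtQuick import QQuickWindow

def pick_color(window: QQuickWindow, x: int, y: int):
    image = window.grabWindow()    # synchronous GPU -> CPU copy
    return image.pixelColor(x, y)  # QColor at the requested pixel
```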

I built my processing pipeline at the photo’s native resolution. GPUs can run entire video games, so a couple hundred megabytes of flat pixels hopefully won’t tax them too much. So far, this seems to work perfectly fine, with the full image processing pipeline running easily at 60 FPS. That’s merely a 25 MP image on a fairly beefy GPU, though. I reckon I’ll have to do some performance profiling at some point in the future. Funny thing is, this is currently all interpreted code; no compilation necessary to run any of this. I think I needed that as a contrast to the endless compile cycles at work.
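
The back-of-the-envelope math for “a couple hundred megabytes”, assuming the RGBA half-float layout from above:

```python
pixels = 25_000_000                    # 25 MP image
bytes_per_pixel = 4 * 2                # RGBA x 16-bit floats
print(pixels * bytes_per_pixel / 1e6)  # 200.0 MB
```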

Initially, I was going to use the aforementioned screenshot thumbnail for calculating the waveform as well. But why involve slow Python code when you can run it on the GPU? And indeed that worked well, so now I had a smooth, real-time waveform display.
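
A waveform is just one histogram per image column. This is the reference math in numpy, not the shader itself:

```python
# luma: (h, w) array in 0..1 -> (bins, w) column histograms.
import numpy as np

def waveform(luma, bins=256):
    h, w = luma.shape
    out = np.zeros((bins, w), dtype=np.float32)
    idx = np.clip((luma * (bins - 1)).astype(int), 0, bins - 1)
    for x in range(w):                 # one histogram per column
        np.add.at(out[:, x], idx[:, x], 1.0)
    return out / h                     # normalize by column height
```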

Then came the tone equalizer. I was a bit worried about this one, as it requires blurring of the image to build its mask, which could be slow on the GPU. But that turned out to be easy to do, and the GPU didn’t even break a sweat. It was actually very interesting to get into the guts of this module and see how it works.
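
The trick that keeps such blurs cheap is separability: two 1-D passes instead of one 2-D kernel. A numpy sketch of the idea, not my shader:

```python
# Separable Gaussian blur: O(2r) taps per pixel instead of O(r^2).
import numpy as np

def gaussian_kernel(radius, sigma):
    x = np.arange(-radius, radius + 1, dtype=np.float32)
    k = np.exp(-(x * x) / (2 * sigma * sigma))
    return k / k.sum()  # normalize so brightness is preserved

def blur(image, radius=4, sigma=2.0):
    k = gaussian_kernel(radius, sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)
```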

Next came color balance. To my surprise, this was a bit of a headache. That’s actually understating it: it took me a good few evenings, and eventually meant single-stepping through Darktable’s source code with a debugger, side by side with my own code, a process complicated by the lack of a debugger on the GPU. But eventually, it, too, was ported to my shader.
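
At the core of it sits the classic slope/offset/power step; the real module wraps this in a perceptual color space and much more, so take this only as a sketch of the basic operation:

```python
# ASC CDL-style grading: gain, then lift, then gamma, per channel.
import numpy as np

def cdl(rgb, slope=1.0, offset=0.0, power=1.0):
    out = rgb * slope + offset
    return np.power(np.clip(out, 0.0, None), power)
```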

As you can see from the screenshot, this actually makes for a competent little raw editor. It does not yet have a name. I don’t yet know if it has a future. It still lacks myriad critical features, such as file system browsing, saving/loading editing parameters, export settings, just to name a few. To say nothing of denoising, sharpening, masking, the color equalizer.

But it’s been fun, and highly educational. It was really cool to step into Darktable’s image processing code and actually see how stuff works! I already learned a whole lot from that, and hopefully will continue to do so. While my code is clearly open source, I haven’t yet made the repository public, as it’s still way too messy to share.


You’ve made the jump, congratulations!


Very cool! On travel right now, will take a better look when I get home.

rawproc amazes me in how that mess of code actually works at all… :crazy_face:


welcome to the glamorous world of glsl then! :smiley: i guess 60 fps is because of vsync and actually you’re running much faster.


One productive evening later, the tone equalizer is now vertical, we have metadata reading (as evidenced by the new footer line), and editing parameters are now saved and restored.

Just a few more tweaks, and I’ll actually be able to use this for a few edits.

I’m amazed that I got this far in three weeks of on-and-off hacking in the evenings after the kids go to bed.


Your vkdt source code was a huge inspiration! And a marvel of well-organized modular design as well. Thank you so much for building and maintaining it!

And the Darktable code base itself is worthy of praise as well. Had it not been well-organized and cleanly written, none of this would have been possible.

The 60 FPS is probably just vsync, yes. But macOS reports 90% GPU usage when I wiggle the sliders, so I’m probably being terribly wasteful :face_with_peeking_eye:.


How fast is it to process/export an image?

As long as it takes to write a JPEG file to disk. A few milliseconds. Since all processing is done at full resolution anyway, there is no separate export process.


I really (selfishly) wish someone would port these modules to vkdt (tone eq, color balance rgb, filmic, sigmoid). :upside_down_face:

Once I clean up the repo, the code will be all there. It’s just one big old fragment shader that should translate easily to vkdt’s module structure.


What’s it written in?

GLSL, just like vkdt’s shaders. It’s a straight translation from Darktable’s code.


looking forward to exchanging some code then :slight_smile: vkdt is predominantly compute shaders, and using vulkan dialect so i can access subgroup/cooperative matrix features and such things. copy pasting your fragment → vkdt should be very easy, the other way around mostly so, too.

tone eq in vkdt: eq: local contrast equaliser

and vkdt’s filmcurv: compress scene referred linear for display is pretty much sigmoid, only using the weibull distribution. it even uses darktable UCS for colour preservation, if you want to.

i don’t remember what color balance rgb does, but the docs here (vkdt: how to find things if you know darktable) say that i could get a similar effect with some mask and blend modules.


Today I tried running the program on my Surface tablet. It ran abysmally slowly. But I quickly figured out that this was caused mostly by the big Gaussian blur in the tone equalizer. If I reduced the tap area from 9x9 to 5x5 (on a 4x downscaled mipmap!), it actually ran just fine.
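
The back-of-the-envelope numbers for that change, with both kernels running on the 4x-downscaled mipmap:

```python
taps_before = 9 * 9              # 81 taps per output pixel
taps_after = 5 * 5               # 25 taps per output pixel
print(taps_before / taps_after)  # 3.24, roughly 3x less blur work
```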

The 12" screen of course came with its own challenges, so now the tools panel on the right scrolls. And I cleaned up the tone equalizer layout a bit. In particular, I got rid of the chroma and brilliance sliders. Useful as they are, they are fairly redundant. I generally prefer the look of saturation over chroma, and brilliance is already pretty well-served by the tone equalizer and offset/gamma/gain lightness.

I’m not completely sure if it’ll stay that way.

Oh, and the image background is now white instead of grey. I noticed that I tend to introduce too strong a color cast on a non-white background, so I changed it. (In case you’re wondering about the black border around the image, that’s actually an error in my 3D setup. It does look kind of cool, though.)


Working on the distortion correction at the moment. That’s a tricky bit of code.

And I don’t think I have it quite right yet for Fuji X-Trans V. Something seems a bit off.

Since integrated GPUs are reasonably powerful these days, and AFAIK they share RAM with the CPU, I am curious whether they would offer a way of overcoming the memory limitation (at a significant drop in pure performance, though that may not be the limiting factor). I have dabbled with Intel’s OneAPI a bit, but I do not know enough to recode the kernels.

The algorithms in the DNG spec might be a good start. I’ve encoded the full distortion algo in rawproc, see:

https://github.com/butcherg/rawproc/blob/master/src/gimage.cpp#L5782

Note that Lensfun and Adobe use different normalized coordinate spaces, so their algorithms don’t easily align.
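
For reference, the radial polynomial at the heart of these models looks roughly like this; the exact coefficient conventions differ between Lensfun and Adobe, as noted above, so treat the names as illustrative:

```python
# Radial distortion: scale each normalized coordinate by a polynomial
# in r^2, with (0, 0) at the optical center.
def warp_radial(xn, yn, k1=0.0, k2=0.0, k3=0.0):
    r2 = xn * xn + yn * yn
    scale = 1.0 + r2 * (k1 + r2 * (k2 + r2 * k3))  # Horner form
    return xn * scale, yn * scale
```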


Funny thing, both my computers have “integrated” GPUs (a Snapdragon Surface and an M2 Mac). At some point I’ll have to see how it runs on a PC with separate CPU and GPU memory.


Thank you! I’ll look forward to looking into your implementation when the time comes!
