Should darktable remain AI-free?

Tools are never just tools. They always affect us in ways that are beyond our rational control. But besides that (since I assume most people don’t even care to have that discussion), the tool that @sillyxone pointed out for Kdenlive is a plugin that seems to be rather large: 148MB in the screenshot for just the model, and the plugin itself shows 5.3GB. But even if it’s just 200-300MB, I would hope that downloading such models would always be optional, so that those who download darktable and don’t care for AI won’t have to waste bandwidth and disk space storing models on their computer.

Because right now, the darktable download is about 80MB. Very nice for us on limited data connections, and I certainly wouldn’t like to waste that data on AI.

2 Likes

If the AI tool uses an AI that lives on the Web/in the Cloud and darktable has to “phone home” to use it, I’d say no. “Never trust anything that can think for itself if you can’t see where it keeps its brain.”

4 Likes

I think that a lot of the responses are attacking a straw man. To orient the discussion a bit, would many people have objections to an ML-based module that

  1. Is self-contained: no network connection required, everything is local, implementation is FOSS,
  2. Has small data requirements (comparable to, say, 10 raw files),
  3. Has an open training setup one can run if desired?
1 Like

I agree, and I would also split it by use case, such as:

  • Vision: For masks. Often public models with public training data. Computer vision is old and thus has many open data sets.
  • Generative for de-noising/sharpening/upscaling purposes: Same as vision. There exist open models with open training data.
  • Generative: Object removal or generation. These often use publicly available but ultimately black-box models, for which we have no idea what data was used to train them.

I believe most folks vehemently oppose the third option, but would be open to the other two.

3 Likes

What’s the basis for that belief?

It seems like each time, the proposal misses the point.

Let me rephrase/clarify: darktable is non-destructive, and the pipe is live, running continuously to display the edit. So the main point is that an ML-based module must run in less than 200ms. I bet no model can be run this fast currently! And until that is true, there won’t be any ML-based module.
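
For reference, checking whether a candidate model fits that budget could look something like the sketch below (purely illustrative: the model file, tile size and timing loop are placeholders, and it assumes an ONNX export with onnxruntime installed).

```python
# Hypothetical latency check: the model file, tile size and timing loop are
# placeholders, not an actual darktable module. Assumes an ONNX export.
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("candidate_denoise.onnx")    # placeholder model
input_name = session.get_inputs()[0].name
tile = np.random.rand(1, 1, 512, 512).astype(np.float32)    # dummy input tile

session.run(None, {input_name: tile})                       # warm-up run
timings_ms = []
for _ in range(10):
    start = time.perf_counter()
    session.run(None, {input_name: tile})
    timings_ms.append((time.perf_counter() - start) * 1000.0)
print(f"median: {sorted(timings_ms)[len(timings_ms) // 2]:.1f} ms (budget: 200 ms)")
```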

2 Likes

I’m pretty close to that already in some experiments with denoising on raw Bayer images. I haven’t made a proper effort to optimize yet - there’s probably an order of magnitude or two of speedup that can be achieved, but obviously I won’t know what’s possible and what the trade-offs are until I get a chance to try.

2 Likes

In my mind, AI tools could be integrated into darktable, but only to make the tools that already exist in traditional editing (or DAM management) easier to use. We already have external raster masks, for example, so an external tool that could make a difficult selection easier would be a good idea. Ideally, the AI mask would be vector-based and integrated into the XMP file, as PFM files are large and need to be managed. Maybe an external (FOSS) tool could do that for us: send the image to the tool and then save the selection to be used by dt?
For anything like generative content, darktable isn’t a pixel editor and I don’t think it should become one.
I sometimes use an old version of DxO PhotoLab to denoise before editing, and I don’t mind that. I believe that AI denoising will become widely available one day and may be added to dt in a future pipeline.
I don’t like the idea of AI tagging as I like to keep my list of tags quite small. I use tags to create virtual collections like smart folders in LR.
Just my two cents as they say.

There are both generative and non-generative approaches to each of these three categories, or, more broadly speaking, to “image restoration.” Notably, it seems like people have tried to use diffusion models (like Stable Diffusion) for pretty much everything. My point here is: let’s be careful about throwing terminology around. It’s a complex space with a lot of bad ideas, yes, but some of the good ideas are very good ideas.

I’ll remind folks that some of this stuff may be baked into your camera’s ISP in the near future. See Canon’s R5 mk. II in-camera upscaling.

3 Likes

If you drive up iterations, on some machines diffuse & sharpen can easily go above 200ms.

Also, if you are working on a later part of the pipeline, the earlier parts can be cached: e.g. if you are tweaking color balance rgb, the results of denoising (which is a prominent use case for ML) can be taken as given and don’t need to be recalculated every time the user moves a slider.
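
As a toy illustration of that caching idea (not darktable’s actual pixelpipe cache, just the general principle, with stand-in functions):

```python
# Toy sketch of pipeline caching: the expensive early module is memoized on its
# own parameters, so tweaking a later module does not re-run it.
import numpy as np

cache = {}

def expensive_denoise(image, strength):
    # stand-in for a slow early module (e.g. an ML denoiser)
    return image * (1.0 - strength)

def apply_color_balance(image, gain):
    # stand-in for a cheap later module the user is tweaking
    return image * gain

def run_pipeline(image_id, image, denoise_strength, gain):
    key = (image_id, denoise_strength)          # cache key: image + module params
    if key not in cache:
        cache[key] = expensive_denoise(image, denoise_strength)
    return apply_color_balance(cache[key], gain)

img = np.ones((4, 4), dtype=np.float32)
run_pipeline("raw_0001", img, 0.3, 1.0)         # denoiser runs once
run_pipeline("raw_0001", img, 0.3, 1.2)         # slider moved: cached result reused
```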

I get the impression that you have already decided to be against ML and are just making up reasons: first the size, then the speed, etc. This is unfortunate, because prominent DT devs being so much against ML discourages development using these algorithms: one could invest a lot of time and effort in making a PR, only to find that the goalposts have shifted simply because it uses ML.

I understand the resentment towards technology currently hyped as “AI”, but ML is a much older field and ML-based algorithms for images have been researched and incorporated into both software and cameras for decades now. They don’t have anything to do with LLMs or chatbots.

1 Like

Sometimes my OpenCL dies for one reason or another, and the 3 D&S instances I usually run with take more than 5-10 seconds on a 5950X in CPU-only mode :smiley:

Thankfully, with capture sharpening this is now no longer required.

1 Like

As others have mentioned, I would like to see it implemented for masks, content-aware fill, and perhaps as an aid for organizing photos: automating tags, categories, and the part that isn’t so much about editing and which, at least for me, is a bit tiresome.

3 Likes

I’m really fed up with this discussion. If you know better, please open a PR which properly integrates an AI module into the dt pipeline. If not, please use another editor if you’re not pleased with the decisions made for dt. Last but not least, I’m against nothing: I have even mentored a student at a Swiss college working on an AI module for darktable. So keep your rhetoric to yourself and read carefully the messages posted here before drawing conclusions.

3 Likes

To emphasize this: none of the current dt devs is against anything. We just all try to make sure, when implementing anything new - and we do at a pretty good pace - that (1) dt is usable on “normal” systems with a decent response, (2) edits can be re-used later without bad surprises and (3) the new algo is stable and even improves results. For these rules we have a pretty strict policy.
Lots of people doing playraws here do a lot of very specific masks. What would you hear if those masks just fucking fail because an AI gives a different result the next time the image is developed in the darkroom? Will you all shout “hurray - what a pleasant surprise”?

If someone really wants to jump on the dev wagon, one algorithm we are missing is: convert a width*height full-image bitmap (generated by something clever like an AI) to a standard darktable mask. That would very likely be a prerequisite for any AI segmentation stuff, either a depth or selection map.
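
Purely to illustrate the kind of conversion such an importer would need (this is not darktable’s internal mask format, the threshold and feathering values are arbitrary, and a real implementation would live in the C pipeline):

```python
# Hedged sketch only: convert a full-resolution 8-bit segmentation bitmap
# (e.g. 0..255 probabilities from some AI model) into a normalized, feathered
# float mask. This is NOT darktable's internal mask representation.
import numpy as np
from scipy.ndimage import gaussian_filter

def bitmap_to_mask(bitmap_u8, threshold=128, feather_sigma=4.0):
    mask = (bitmap_u8 >= threshold).astype(np.float32)   # hard selection
    mask = gaussian_filter(mask, sigma=feather_sigma)    # feather the edges
    return np.clip(mask, 0.0, 1.0)                       # values in [0, 1]

# fake full-image segmentation map, just to show the shape of the interface
seg = (np.random.rand(1080, 1920) * 255).astype(np.uint8)
mask = bitmap_to_mask(seg)
```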

BTW - I can’t count the times things for dt had to be implemented from the roots, and we all profit from such work. And - yes, it may very well be that something developed won’t appear in dt master. That’s the way the cookie crumbles. But - I haven’t seen anything good and ready to go not appearing.

Everything can be discussed / reiterated here for sure, but if someone wants something to be implemented and none of the current devs is working on it, just GO for it yourself and I am absolutely sure we devs will assist!

10 Likes

I’m working on an ML-based color correction tool for darktable. Its use case is a bit niche, as it is intended for correcting analog material with strong color shifts (bleached dyes) rather than digital photos. I will post details separately when it is ready for testing; here I just want to share some aspects of my implementation, as it avoids many of the concerns raised above.

  1. Pixel pipeline. It is not always necessary to add a new processing module; DT is already very rich in features and data structures (rasterfile, splines) that can be leveraged. My tool uses an instance of rgbcurve, with spline points predicted by the ML model. The model is not part of the pipeline and is called only once, when the user hits a button (a rough sketch of this idea follows this list).

  2. XMP sidecars. The history stack combined with xmp files that store the entire editing state in a compact file is an amazingly powerful interface that allows external programs to communicate with darktable and its non-destructive pipeline. My tool simply adds an item to the history stack, and users already know how to do before/after comparison etc.

  3. Lua scripts. Great testbed for new features and optional functionality. Recently, the Lua API has received functions for updating the database from XMP files. That allows my tool to be invoked in the lighttable on selected images from a neat widget. Shoutout to everyone involved!

  4. Model dependency. 10 years from now, if darktable keeps its backwards-compatibility promise, rgbcurve would still be available. There is no need to keep the model that predicted the spline points around. The XMP will still work and images can be exported with identical output. Users can even edit the spline points manually.

  5. Computational efficiency. There are many, many orders of magnitude of difference between a small (<100 MB) specialized ML model and foundational “AI” models like nano-banana, which are cloud-hosted. I hope users will get a better feel for this over time, but AI labs need to make this more transparent. My model, by the way, needs about a second per image on a single CPU thread (48-bit/8 MP scan).
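
To make point 1 concrete, here is a rough, purely hypothetical sketch of the idea (the feature extraction, the stand-in model and the number of control points are all made up; this is not the actual tool’s code):

```python
# Purely hypothetical sketch of "model predicts spline points": extract simple
# per-channel statistics, hand them to some regressor, and interpret the output
# as control points for a tone curve. Not the actual tool's code.
import numpy as np

def channel_features(image):                     # image: float32, HxWx3, 0..1
    return np.concatenate([np.percentile(image[..., c], [5, 50, 95])
                           for c in range(3)])   # 9 features

def predict_spline_points(features, model):
    # 'model' is any regressor mapping 9 features -> 8 numbers,
    # read here as four (x, y) control points
    return model(features).reshape(4, 2)

# stand-in model: returns a near-identity curve regardless of input
fake_model = lambda f: np.array([0.0, 0.0, 0.33, 0.30, 0.66, 0.70, 1.0, 1.0])

img = np.random.rand(256, 256, 3).astype(np.float32)
points = predict_spline_points(channel_features(img), fake_model)
print(points)    # these would become the spline points of an rgbcurve instance
```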

My two cents from a user perspective: I’m looking forward to smarter noise reduction tools, and mask-based dirt removal would be awesome. I’m sceptical about AI-based upscaling and sharpening: right now, these are expensive (foundational models) and I find SOTA results very disappointing (inconsistent, “zombie faces”, lack of control…). Diffusion-based generative AI is developing fast on new software platforms with new UX concepts; I don’t see darktable being part of this.

3 Likes

I think there are always shades of gray and what @PassionForSeeing is describing sounds very minimal. But I’d just like to point out that AI-based editing modules have some properties that other current DT modules simply don’t have:

  1. Any module based on an AI model will tend to grow over time. Even if you find a small one that is 50MB, which is still large in my opinion, it won’t be perfect and people will try to improve it as computational power becomes even cheaper.

Yes, one could reply and say that “users could simply ignore AI tools”. But what happens to the DT experience when new users with older computers click on multiple buttons here and there and the program becomes too slow just because the AI features are everywhere?

  2. Point #1 is made more relevant by the fact that once people start to use AI, they become more mechanical in their editing and develop a taste for AI-based editing. They crave more and demand more AI-based tools, thereby intensifying work to improve them and hence make them more complex.

  3. Unlike other modules, AI-based editing tends to encourage photography as a process of production rather than a process of artistic expression. Of course, right now, this seems irrelevant because one or two AI tools will hardly make a difference. But as more are developed, it will only reinforce this mechanical nature of photography.

Over time, if AI tools become numerous, then they reinforce the philosophy of mechanical production rather than a more personal touch in photography.

Nietzsche said that, “we must not purchase the alleviation of work at too high a price”, and I think that we should consider the price of using AI in this case.

Technology always starts this way: one or two seemingly optional tools. But then such tools create obligations (the obligation to maintain it) and dependencies (the obligation to make sure it works and is reasonably available), and expectations.

And then they slowly transform behaviour as well. The more AI tools are available, the more people will focus on editing a larger volume of images, and the more accepting they will be of flaws in their technique. They will cease to learn how to mask themselves, and cease to learn tools when automatic ones exist. That in turn will transform the community: currently, people come here to pixls.us to ask questions and learn from each other. The more automatic the program, the fewer questions will be asked and the less will be learned. But learning itself is at the core of artistic expression, because one needs to solve problems for oneself that are not of a completely mindless, rote nature. Once AI starts solving such problems, we move one more step away from our independence.

2 Likes

Apologies for initiating this controversial discussion. However, it seems to have struck a chord in these heated times of AI.
After all, the discussion is only theoretical, as there is no functional AI code available (yet) that could be incorporated into the main branch.
I have identified three fundamental trends in the posts. Please correct me if I am wrong or have forgotten something.

  1. Forum members who fundamentally reject AI tools in darktable and justify this rejection.
  2. Forum members who accept AI tools, either completely or partially, only for certain functions. I think I have read that the focus here is on the result of the image, true to the motto: the end justifies the means.
  3. Forum members who define the term AI as an umbrella term and divide it into several individual parts, such as ML, generative AI, etc. Some forum members tend to accept all of the above, while others only accept ML-based tools.

There is consensus that all tools, with or without AI, should remain reproducible, giving the same result both today and in a few years’ time.
For my part, I prefer a solution such as the one already presented for darktable, which remains open source in all cases, does not create dependencies on central servers and keeps the result traceable. Users should remain in control of their images and not have to put them into a magic box in exchange for a good image.

5 Likes

This showed up in my feed.

He made a Lua script to interface with ollama, using a few models to do some organizational tasks: tag a selection, rate a selection, pick the best image in a selection. It seems to work OK, though maybe a little slowly. In the comments he mentions one of the models requires 20GB of RAM. It seems like he made this just to see what’s possible and how it might work.

1 Like

Tagging using an LLM locally is a pretty good use of them. I have run some experiments doing initial splits on a few mobile camera dumps I have. In the prompt I ask it to pick the most appropriate tag out of a list, and then I place the image in a directory for that tag. Later I could go through those piles individually and pick which pics I wanted to keep. It’s nicer than other auto-taggers because you can impart some logic using your prompt. With Qwen 7B or Gemma3 12B it still takes 1-3 seconds per image, so not that fast.
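
Roughly, the experiment looks like this (a sketch only: the tag list, model name and paths are examples, and it assumes a local ollama server with the ollama Python client):

```python
# Sketch of the tagging experiment: ask a local multimodal model (via ollama)
# to pick one tag from a fixed list, then sort the file into a directory named
# after that tag. Tag list, model name and paths are examples only.
import shutil
from pathlib import Path
import ollama

TAGS = ["people", "landscape", "food", "documents", "screenshots"]
PROMPT = ("Pick the single most appropriate tag for this photo from this list "
          f"and answer with only that word: {', '.join(TAGS)}")

for img in Path("camera_dump").glob("*.jpg"):
    reply = ollama.chat(
        model="gemma3:12b",
        messages=[{"role": "user", "content": PROMPT, "images": [str(img)]}],
    )
    tag = reply["message"]["content"].strip().lower()
    if tag not in TAGS:
        tag = "unsorted"                          # model answered something else
    dest = Path("sorted") / tag
    dest.mkdir(parents=True, exist_ok=True)
    shutil.move(str(img), str(dest / img.name))
```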

That said, his other uses confuse me a bit. Why would you ever want AI to rate and select your best images? That’s throwing away part of your artistic process and IMO defeats the purpose altogether.

because people are worried about size/weight/bloat (me too), i have to rant about some recent tests i did:

  • just installing pytorch-cuda-something (that you’d likely need for inferencing too) cost me 18G additional packages on my system, before you even start.
  • i really want u-net denoising because the results are good, so i downloaded benoit’s great raw natural images dataset. besides the download taking all day (and preprocessing still hasn’t finished), this cost me 0.6TB (!), but at least it seems practical to do the full training on a desktop machine.
  • if you want to ship this to the end user, the network weights will be some kilobytes, as pointed out by someone else above (that’s fine)
  • but: the runtime would be ncnn, onnx, or nvidia something, or straight pytorch. in most cases a bit of an abomination to wire up (though ncnn could be near seamless in vulkan), but definitely gigabytes of extra stuff (compared to a handful of megabytes for the actual application code).
  • it’s possible to write just the lightweight inferencing code yourself, but it’s tedious: changes in network arch always take me ages to update in the code. a toy flavour of that is sketched below.
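
for flavour, a single hand-rolled 3x3 convolution + relu in plain numpy (toy sizes, random weights; a real u-net stacks many of these plus down/upsampling and skip connections, which is exactly the tedious part):

```python
# toy hand-rolled inference: one 3x3 convolution + relu in plain numpy.
# weights are random and sizes tiny; a real u-net stacks many of these plus
# down/upsampling and skip connections, which is exactly the tedious part.
import numpy as np

def conv3x3_relu(x, w, b):                       # x: HxWxCin, w: 3x3xCinxCout
    h, wd, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]), dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + wd, :] @ w[dy, dx]
    return np.maximum(out + b, 0.0)              # bias + relu

x = np.random.rand(64, 64, 1).astype(np.float32)             # one noisy channel
w = (np.random.rand(3, 3, 1, 8).astype(np.float32) - 0.5) * 0.1
b = np.zeros(8, dtype=np.float32)
features = conv3x3_relu(x, w, b)                 # 64x64x8 feature map
```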

really not happy with all of this, but especially the noisy 10-bit raw video from telephones would really profit from u-net denoising, so i hope this will happen at some point.

2 Likes