First shot at this:
Does this concern the AI code, or the model data (which can be a large blob)?
If only the code, who will be responsible for the model data: will dt provide (links to) models, or is that the responsibility of the user?
(The policy mentions "AI Model Integration" with no distinction between the code and data involved.)
Hi Pascal,
thanks for this reasonable suggestion for integrating AI into darktable. AI is something we can't avoid, but it needs to be harnessed to help users of DT. Sadly the community is divided at the moment, and I hope this is resolved soon.
I think it's very reasonable. I have two questions, not meant as criticism but to sharpen and improve the document (in the hope of avoiding future discussion).
The questions are mostly about the definition of what we consider "AI".
- The definition of AI tooling is a bit unclear. What is an AI-assisted tool? As an example, GIMPressionist existed long before the AI hype, and could be seen as a generative transformation, but it is not machine learning or the like. Would that be considered one of the Unsupported Use Cases?
- The current retouch module can be used to "remove unwanted elements from your image by cloning, healing, blurring and filling using drawn shapes" (quote from the manual), so technically it now falls under the "Unsupported Use Cases". The original Resynthesizer plugin in GIMP is another example. So someone could argue that the retouch module does not conform to this document.
BTW, I think that the Guiding Principle and the ending of your document are very well phrased and very clear!
That can be decided if there's a pull request on such stuff.
panta rhei - so a policy on AI is no fixed law …
This looks like a good framework. I know that some contests typically have prohibitions on generative fill and adding new pixels that didn't exist in the original file. That might already be covered under scene modification, but I wonder if more specific language would help.
wrt adding pixels: as far as I can tell, that is currently acceptable (see Super-Resolution being OK) so long as the model does not generate details which are incongruent with the image… the word "hallucinate" seems to do a lot of work here, but is very ambiguous. Or, at least, it is acceptable in some circumstances but not others, depending on the preferences of the developers.
For those comps, you will probably need to refrain from using AI upscaling, just like you do with other AI-enabled programs.
Yes given the last paragraph of the document.
It all depends. I'm using Retouch only to remove scratches or sensor dust, which is OK. I don't think we can easily remove main elements from a picture using Retouch.
Yeah, twigs and power lines are also easy when the background is more or less flat, but anything more than that is really difficult (not a complaint) compared to using a raster editor like GIMP.
How would you rephrase this?
What I'm really talking about is using generative fill to expand the scene beyond the original canvas, such as adding more foliage or water to the composition because the original scene was framed too tightly. The generative fill would be congruent with the scene, but it would be rejected under the rules set by Audubon or Bird Photographer of the Year.
That's a concern of others when considering appropriate use of AI, which might make sense for DT. But if the developers feel differently, then perhaps that should be made clear as well.
Good point! I'm not clear on this. It indeed adds some content, but only to complement/fix a wrongly framed image.
Clearly I have no strong opinion on this. Let's see what arguments we can find for or against.
Is there any compelling reason why darktable should do the generative fill? Other programs are able to do this, and it might fit better in their design. On the other hand, it has nothing to do with raw development…
Good first draft. This clearly mentions the intent, and thatās most important.
Right now, removing a passerby in a street photo is listed both as Supported (in structure based inpainting) and Unsupported (in object removal).
I would add something about AI being optional, and entirely disabled if the user so chooses. It should also mention that models are not part of the main distribution, and can be downloaded by the user in the app.
Is the feature intended to allow for bring-your-own models, or is it restricted to a release-time selection of models?
Gotcha. I don't know the Audubon/Bird Photographer rules. Some standards (e.g. World Photo) do not allow Super-Resolution because it (from a functional point of view) creates spaces in between the captured pixels and manufactures entirely new pixels to fill the gaps. They do not allow the type of generative content you describe either… or at least that is what I remember. It has been a year or so since I pored over the fine details.
I have no idea atm. It seems like everyone in the community is still struggling to articulate what they are trying to accomplish in principled ways. Partly because there is no agreement about what goes too far / is unacceptable, and partly because, at a fundamental level, the things we don't want to give up are hard to distinguish from the things we don't like.
My reflexive response with regard to the super-resolution case would be to ask what specific cases you were trying to prevent with the word "hallucinate". In some sense of the word, all super-resolution tools "make shit up" when filling in the new pixels. If you did not have anything in mind, then there is no need to justify its existence. If the current upscale capabilities differ in some meaningful way from other upscaling models, maybe just add the appropriate verbiage?
EDIT: Btw, thanks for putting forth the time to create a written policy on all this. Since so many people have no idea what dt is (much less its capabilities) and the fact that many people have no ability to look at the models/code and determine what modules might be counter to their rules, having a policy doc I can send them is helpful!
Fixed; it was an editing issue. I have removed it from the OK part.
Although there may be things beyond what was mentioned, the guiding principle gives enough context for us to easily categorize almost everything.
I totally agree that we should appreciate the ability to quickly remove minor distractions or flaws that are removable anyway (and a lot of people do this on a daily basis), in contrast to the boring and time-consuming manual methods. Also, the inability to easily change the narrative of the image is emphasized enough.
I might add a small list of things that may require some thinking in terms of "do we want to see it implemented if someone wants to work on it" or "no, we are good without it":
I saw that Darktable AI Rating & Tagging - Lua-Script and embed-openclip-vitb32.dtmodel are already there, which I appreciate a lot.
I am curious about your thoughts on these:

- Facial recognition for sorting or tagging (even with names?). I know a few people who do a ton of sports photography, or who work full time as a photographer for a team, and similar cases. Although they would benefit greatly from this, it is not limited to just them, and I can see others finding it a very useful tool.
- Another similar idea, for culling: finding the sharpest photo in a burst or a selection of images.
I know someone would need to either make a model or find one that checks all the boxes for these but I would like to know how the devs and community receive this idea.
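For the sharpest-photo idea, the classic baseline doesn't even need a trained model: rank the frames by a focus measure such as the variance of a Laplacian response. A minimal sketch, assuming NumPy and grayscale float images (function names here are illustrative, not any darktable API):

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Focus measure: variance of a 4-neighbour Laplacian response.
    Sharp images have strong edges, so the response varies a lot;
    blurred or flat images give a response close to zero everywhere."""
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def sharpest(images: list[np.ndarray]) -> int:
    """Return the index of the image with the highest focus measure."""
    return int(np.argmax([laplacian_variance(img) for img in images]))
```

A learned model would do better across differing subjects and noise levels, but for culling near-identical frames from one burst, a simple metric like this is often good enough.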
Things like GitHub - sharif-apu/BJDD_CVPR21 (the official implementation of "Beyond Joint Demosaicking and Denoising" from CVPRW21) and GitHub - zhaoyuzhi/QRNet ("Modeling Dual-Exposure Quad-Bayer Patterns for Joint Denoising and Deblurring", IEEE TIP, 2024) have existed for a while, but we would need open-source-licensed models for any hope of integration in darktable. Quad-Bayer demosaicing is something I have hoped for for a while, but I don't think there's enough demand or interest, so I will leave it at that.
I hope I can still add some cats to my photos; guess I'll just have to wait till a cat chooses me.
The use cases demonstrating whatās in and out of scope are clear and easy to understand. This is very helpful. Bravo.
I agree - auto-tagging would be very helpful, as long as it doesn't hallucinate cats.
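For anyone curious how that kind of auto-tagging works in principle: a CLIP-style model (like the embed-openclip-vitb32.dtmodel mentioned above) embeds the image and each candidate tag into the same vector space, and keeps the tags whose similarity to the image clears a threshold. A toy sketch with made-up vectors and hypothetical names, not the actual Lua script or .dtmodel interface:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def auto_tags(image_emb: np.ndarray,
              tag_embs: dict[str, np.ndarray],
              threshold: float = 0.25) -> list[str]:
    """Keep tags whose text embedding is close enough to the image
    embedding, ordered from best to worst match."""
    scores = {tag: cosine(image_emb, emb) for tag, emb in tag_embs.items()}
    return sorted((t for t, s in scores.items() if s >= threshold),
                  key=lambda t: -scores[t])
```

The threshold is where the "hallucinated cats" question lives: set it too low and weakly matching tags get attached anyway, so making it user-adjustable (or per-tag) seems sensible.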