Darktable AI Edge Detection

That is what I was getting at. For someone who knows what to do, manual selection is much faster and more accurate than any AI. Using AI still requires fine-tuning, editing, layering, and blending.

The benefit of AI is getting a second opinion. In the forums, #processing:playraw and #critique are excellent places to get that. :wink:

IMO darktable should also be for quick and good-enough results, as some sorts of photography require this approach. E.g., some of my sports or event photography, where I can only spend less than a minute per picture. Why should it suit only one type of photography?

Maybe I don't know what to do, but here's a comparison (not darktable in particular; GIMP, and rembg as the AI tool, for a quick assessment):

Here's a comparison from two different shoots (one in 2022, one in 2019) where I had to separate soccer players (children; no full picture, because these images are not public) from a background that should have been uniform, but the light conditions were awful and the background was not uniform at all in either shoot. The first image shows what I was able to get with GIMP foreground selection (Levin matting) in about 10 minutes:

Here is what rembg gives me in 3 seconds:

Is the AI approach better than what I could get with GIMP? Not much, but in my case it was at least good enough, while the GIMP approach failed for my application. Here is another crop from the GIMP result:


and the same from rembg:

As I did not try darktable this time, here's what I was able to get in darktable in 2019:


While the result is OK, it is not a real mask in this case: I was fiddling with curves after the base curve, in combination with masks (this was my pre-filmic time), to separate the hair from the background and to get the background white, but I was not able to entirely remove the yellow in between the hairs. I spent hours on each picture back then.

Of course, AI is not magic, and if no dev is interested in going in this direction, that is perfectly OK. But one should not dismiss these methods from the outset.


How much masking do you do in this case, even if you used an editor more geared for speed?

I am not arguing that AI is not superior, but that blind use is often slower. In your example, the matting technique from GIMP is much older than the model the AI was trained on, which contributes to the lower-quality result.

PS - dt makes extensive use of guided filtering. A combination of a novel GF and matting would give masking a head start.
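To make the GF idea concrete, here is a minimal single-channel guided filter sketch in Python (NumPy/SciPy), following He et al.'s published formulation; darktable's actual implementation is in C and considerably more elaborate, so treat this purely as an illustration of the algorithm:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, r=8, eps=1e-4):
    """Smooth p while preserving the edges of the guide image I.

    I, p: 2-D float arrays in [0, 1]; r: box radius; eps: regularization.
    """
    box = lambda x: uniform_filter(x, size=2 * r + 1)
    mean_I, mean_p = box(I), box(p)
    cov_Ip = box(I * p) - mean_I * mean_p   # covariance of guide and input
    var_I = box(I * I) - mean_I ** 2
    a = cov_Ip / (var_I + eps)              # per-window linear coefficient
    b = mean_p - a * mean_I
    return box(a) * I + box(b)              # q = mean(a) * I + mean(b)
```

Running a rough mask through this with the image as the guide snaps the mask to nearby image edges, which is the masking use case at issue here.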

I generate various types of starting-point masks, e.g., from channels, norms, G'MIC processing, etc. Then, as you suggested, I modify them using curves, etc., to get the desired object borders.
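A toy stand-in for that workflow in Python, just to show the shape of it (the norm choice, thresholds, and gamma here are arbitrary illustrations, not anyone's actual recipe):

```python
import numpy as np

def starting_mask(rgb, lo=0.15, hi=0.85, gamma=2.0):
    """Rough starting-point mask from a pixel norm, refined curve-style."""
    norm = rgb.max(axis=-1)                        # max-RGB norm as base channel
    mask = np.clip((norm - lo) / (hi - lo), 0, 1)  # levels-style black/white points
    return mask ** gamma                           # gamma "curve" to move the borders
```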

I guess it comes easily because I have coding experience and I regularly read about algorithms, so I know what to look for and how to approach problems.


Of course, in the extreme cases, none. This was referring more to the general statement, not the masking in particular. However,

  1. There's some grey between black and white: masking a subject to separate it better from the background would get used if it were only one click and three seconds away, even if I don't do it yet.
  2. If there is a faster method available to achieve the desired results, it makes sense to use it and save the time. In 2019, I spent more than 20 hours on the player portraits, one portrait per child; in 2022, with rembg, it was only one and a half evenings, maybe 5 hours, and in that time frame I got two portraits per child (different poses) in better quality despite worse light, plus a soccer trading card for each player.
  3. With darktable, I am already able to get excellent results very quickly in many cases, but that is not the question. It is about possible directions for further improvement.

Plus, I am also concerned about mask quality when, e.g., separating hair. But this is more about GIMP vs. rembg and not so much about darktable, as I would typically not do the background-removal mask in darktable these days. Still, if such a feature were available, I would use it.

But speaking more generally, I am not too concerned about darktable's masking itself; it is already excellent, even if I have some ideas (e.g., loading external masks as raster masks with a dedicated module). It is more about the "fundamentally hostile stance" (I hope that is the correct phrase, I am not a native speaker; "grundsätzlich ablehnende Haltung" in German, roughly a fundamentally dismissive attitude) that I observe in this thread. If a method is good but too much effort to implement, it is still a good method. And IMO the method is good, judging from my experiments with rembg (a F/L/OSS AI tool for subject detection and background removal).


OK, I understood your post differently and therefore wanted to demonstrate my findings; sorry if I misunderstood. However, in the case of my example it was blind use for background removal: I batch-processed about 50 images, and the AI failed on only one image, where a background island between arm and body was left over, but that was cropped away anyway, IIRC.
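For reference, that batch run boils down to very little code with rembg's Python API; the folder names below are placeholders, and rembg also ships a CLI that can process whole folders, if I recall its README correctly:

```python
from pathlib import Path

from PIL import Image
from rembg import remove  # pip install rembg

src_dir, dst_dir = Path("players_in"), Path("players_out")
dst_dir.mkdir(exist_ok=True)

for src in sorted(src_dir.glob("*.jpg")):
    cutout = remove(Image.open(src))            # RGBA image, background transparent
    cutout.save(dst_dir / (src.stem + ".png"))  # PNG keeps the alpha channel
```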

Of course, the task was relatively simple (for the AI, not for me, as I demonstrated): isolate the subject from a busy (but uniformly busy) background. For other tasks it fails miserably, but in those cases one can always use the traditional techniques. For example, it did not work for the group photograph, where I tried it just for fun and it cut off arms and legs.

The thing is, isolating a single subject from a background (or removing the background) is extremely important for many photographic use cases. It is a common task, as a glance at the local supermarket's ads (or at baseball cards, to stick with my use case) easily shows. Does this particular, common task justify its own AI masking tool in darktable? I don't know. I can only say that I would use it.

Edit: BTW, rembg uses pymatting (GitHub - pymatting/pymatting: A Python library for alpha matting) for the matting, which may make the difference. It implements several matting techniques; I am not sure whether they count as "AI".
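For completeness, pymatting's closed-form estimator is the Levin et al. method mentioned earlier. A minimal use looks roughly like this (file names are placeholders; the trimap, a gray image marking known foreground, known background, and unknown pixels, has to be supplied):

```python
from pymatting import load_image, save_image, estimate_alpha_cf

image = load_image("player.png", "RGB")    # input image, values in [0, 1]
trimap = load_image("trimap.png", "GRAY")  # white = fg, black = bg, gray = unknown
alpha = estimate_alpha_cf(image, trimap)   # closed-form (Levin) alpha matting
save_image("alpha.png", alpha)
```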

Where is that hostility you're referring to? When I re-read this thread, it seems like it starts here:

And it more or less ends there.

rembg seems very specialized, and yes, from what you've shown it looks to do a nice job. I don't know that it has a place in darktable, except perhaps integrated via some Lua scripts so that it is run at export time. Otherwise, it would seem weird to have inside darktable.

Otherwise, some more general AI masking is:

  • time- and labor-intensive
  • in need of trained models
  • seemingly heavily dependent on lots of Python, which doesn't integrate nicely with darktable's C core
  • changing rapidly, which ties into point #1 above

You seem to be mixing two slightly different things: rembg (which looks good) and generalized AI masking.

I am not a native English speaker, so maybe I am wrong about that; it just felt that way. And yes, on both "sides", to me. But "hostile" is maybe a bit too strong. From the dictionary, I thought "hostile stance" would be the correct phrase, but maybe it is not.

Please see my answer to @afre, the comment about pymatting. However, I talked about rembg because my experience with it allowed a real comparison, which at least shows that AI results are not bad and can be much better than "manual" masking done in the same time interval. Of course it does not fit the OP's question 100%, but at least it is related.

I thought that, for a trained model, handling the model itself should not be too complicated in a language other than Python. But I am not an expert, and this may be wrong.

Regarding the other three points: more and more models are becoming available as free/libre software. Yes, some legal questions are not yet figured out, but at least there's something going on. Other than that, you are correct: this needs somebody to do the implementation, and as long as nobody wants to do it, it's just daydreams. However, I personally find a lot of inspiration in daydreaming :smile:.


I didn't know where to put this comment, so I didn't talk about it before.

My thinking goes: why not create the mask outside of dt and then import it? I am tired from work: can we import masks into dt? It takes time, but it could be a stop-gap solution.

Not currently. There are raster mask facilities, and you can use masks from other modules, so you're not too far from loading an external mask, I suppose.

People coming from commercial software value the export-to-other-apps-to-edit feature (or simply sharing rasters or vectors). If they could import and export material mid-workflow, that would be attractive to them. However, that requires knowledge, metadata, and proper insertion, and it may break things or cause confusion if done willy-nilly.

I think the issue is: if someone developing DT had the interest, the time, and the skill to implement this sort of thing, I suspect it would happen. But we can't commission people to include the features we want in a FOSS product. We can of course make requests or suggestions, but in the end this is why commercial software exists: we pay people to create or do things we cannot do or do not have the time for. It's not the same with FOSS projects. Also, many things suggested for DT ignore the way DT processes an image and the pixel pipeline, and so, depending on the request, it is often not a simple matter of just adding something and having it work without breaking something else…

So I don't always think it's necessarily resistance to the concept or the request, but rather to an insistence that DT would be better if only x, y, or z were added… It's more about the insinuation that somebody should really get on this, or that the current thing is inadequate, etc. …

The response most would likely give back is to say the code is all there, have at it…

I think I have heard this before, and it seems quite silly to me; I suppose it's hyperbole. The camera makes a JPG for exactly this reason, because making anything much better from the raw file is usually going to take more than a minute, if for nothing more than thinking about what you might want to do with the image…

There is nothing stopping you from doing that except your own pocketbook.

I was being generous and inaccurate… and even then, the offer of money would still not matter to some, who might work on a project for a variety of reasons and not want to be held to a deadline or an outcome… but you are right… I could have said "request", although often it is not even a request but rather first a description of what is lacking, then what should be done, or "this is how product x does it, why doesn't DT?"… all flavors are there… luckily, this is usually more than compensated for by grateful people very happy to benefit from the collective efforts of the team… myself included…

For edge detection, classical algorithms can do the job. AI is necessary to learn from the user what is relevant. I expect Lightroom uses its customers' online RAW → JPG data, the saved development steps from all of them, to build a huge database for training. Maybe they only consider pro photographers, by (manual) selection / sponsorship.
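As a reminder of what the classical, non-AI toolbox looks like, here is a plain Sobel gradient-magnitude edge detector in Python; it finds edges just fine but knows nothing about which edges belong to the subject, which is exactly where the learning part would come in:

```python
import numpy as np
from scipy import ndimage

def sobel_edges(gray):
    """Gradient magnitude of a 2-D grayscale array: large where edges are."""
    gx = ndimage.sobel(gray, axis=1)  # horizontal derivative
    gy = ndimage.sobel(gray, axis=0)  # vertical derivative
    return np.hypot(gx, gy)
```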

I thought about the following: would it be feasible to implement a "Lua iop module" that executes a Lua script at its position in the pipe? The Lua script would be called with the image and mask data at its position in the pipe and would return image data as well. Optionally, it would also be able to emit a raster mask. The module would not show up by default and would only be available if some option in the preferences is set, to prevent its use by regular users. The purpose would be prototyping of iops rather than implementing something for production use, but it could also cover use cases needed by a very low number of users. In particular, exporting the current pipe state as an image and re-importing an externally processed file would become easy that way, even at different positions in the pipe with two of these modules and respective scripts. An option to define whether the module should be skipped in thumbnail rendering would probably be needed. A sketch of what such a script contract could look like follows below.
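Purely as an illustration of the proposed contract (nothing like this exists in darktable today; the names and types are invented, and it is sketched in Python rather than Lua only for brevity):

```python
import numpy as np

def process(pixels: np.ndarray, masks: dict):
    """Hypothetical script entry point, called at the module's pipe position.

    pixels: the image buffer at this point in the pipe (H x W x channels, float);
    masks:  raster masks visible at this position, keyed by name.
    Returns the modified buffer plus an optional raster mask to emit.
    """
    out = pixels.copy()    # e.g., dump to disk, run an external tool, reload
    emitted_mask = None    # or an H x W float array in [0, 1]
    return out, emitted_mask
```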

I know that the ROI concept is something to deal with, and I lack the knowledge of which options are possible there; maybe the recent work on parallel pipes eases this topic, but that is for a later discussion anyway.

E.g., for me it is feasible to write a bit of Lua code if time permits (for the last 9 years, at least one of my children was not autonomous enough to leave me much time to work on such things, but these days I am regaining some free time), but my C knowledge is basically nonexistent, which makes it hard to prototype something (I tried but did not succeed). Such a module could help people try things out with a much gentler learning curve, and even the solutions that never make it into C code may be helpful for some people.

What do you think: is this worth starting a feature request, or is it even possible already?

For me it's the exact opposite. If I pay for software such as Lightroom, I still have almost no chance of getting a feature request implemented unless thousands of people are crying for it; there's no direct interaction with the devs. With F/L/OSS I always have two options: I can implement it myself (or directly pay somebody to do it), or I can file a feature request. In that regard, darktable devs have been very generous; many of my feature requests made it into actual features, many of them back in the Redmine era. That in particular is one reason I use F/L/OSS software: the possibility to contribute one way or another, not only money but also ideas, and the chance that they become reality. To give one example, Lightroom would never have gotten a vertical waveform just because I had a need for it.

That's IMO not true. First, when I shoot raw only, there is no JPEG. But even if it is there, the editing capabilities of the camera are very limited.

When I shoot a soccer game, the light conditions do not change dramatically, and I can come up with a reasonable color edit in a couple of minutes (I even have a preset for the soccer photos as a starting point). This is then copied to all images of the match that I want to edit (culling comes first), typically more than 50 pictures, as every player should be on at least a couple of photos. The per-picture edit is then only cropping (sometimes very heavy cropping, as 200 mm is my maximum focal length) and straightening (this in particular is not possible with the JPEG without massive quality loss), maybe masking out some distraction in the background to make it a bit darker (e.g., a sun reflection; typically only a very low number of images per session require masking, though sometimes I also use it for vignetting), plus some exposure, contrast, and vibrance correction due to changing light or shooting direction (sun position relative to the shooting direction).

All of this does not require much time, and I am thankful to have such a great tool for it :smile:. Using the camera JPEG would be a no-go for me. I do this in my spare time to give something back to the team, as my son plays in it, and so far people like it. But typically I want the images out on Sunday if the match was on Saturday, so I cannot spend hours on a single photograph; getting one hour to edit the whole set of pictures is already a luxury.

As I said, many of my feature requests made it into code and …

I am very grateful for this, and I hope that is clear. However, from your posts I get the impression that feature requests are in general a bad thing. I don't think that is what you intend, but it is what I read on my side. I think I understand which attitude you are arguing against, but we are not all native speakers, and sometimes things may sound harsher than intended, in feature requests too.

Question to the experts: here it is stated that a good mask is not good enough, and that foreground detection is mandatory for a reasonable matting result. How does this relate to the current GIMP implementation of foreground selection?
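For context on why detection matters: matting implementations generally want a trimap, and a trimap is usually derived from a rough foreground detection, e.g. by eroding and dilating a binary mask. A sketch (the band width is arbitrary):

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def trimap_from_mask(fg_mask, band=10):
    """Turn a rough binary (bool) foreground mask into a trimap for matting."""
    sure_fg = binary_erosion(fg_mask, iterations=band)   # shrink: certain fg
    unknown = binary_dilation(fg_mask, iterations=band) & ~sure_fg
    trimap = np.where(sure_fg, 1.0, 0.0)
    trimap[unknown] = 0.5                                # uncertain border band
    return trimap
```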

I have already looked once or twice inside the darktable code, and I wouldn't say that it is technically impossible to do.
But it comes with a lot of constraints and questions. If some people already find the current processing pipeline a bit slow, what would it become with a Lua script in the middle?
Let's imagine a simple pipeline like: exposure → Lua iop → filmic.
What would happen when you change the exposure step by step (e.g., rolling your mouse wheel)? Do you expect the Lua script to be called for each step? If so, it could take a lot of time to process.
Another solution could be to invoke the Lua script only on demand, once, and have it cache the result. But that means changes to anything before the module would be ignored.
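To illustrate that trade-off (hypothetical sketch, not darktable code): keying the cache on the actual input buffer avoids stale results, but then every upstream change re-triggers the expensive call, which is exactly the slowness described above; keying it on anything less means ignoring upstream edits.

```python
import hashlib

_cache = {}

def run_script_cached(pixels, script):
    """Invoke the (slow) script only when its input buffer actually changed."""
    key = hashlib.sha1(pixels.tobytes()).hexdigest()
    if key not in _cache:
        _cache[key] = script(pixels)  # expensive call, once per distinct input
    return _cache[key]
```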

For the mask problem, it is probably simpler to add a way to edit an existing mask from the mask manager, because in that case there is no update loop involved.


Sorry, I didn't mean that at all. I meant that you suck it up and pay for a commercial product that already does what you want, say AI masks or denoising or whatever… For sure those projects/products are not nearly as responsive to feature requests and input unless it impacts the bottom line…

Not at all, and my apologies if it came across as such… I think if it did, it was in response to the use of the word "should"… and, as you say, that can be a language thing. I don't think we have the right to tell the developers of a project what they "should" do, but rather to ask whether something could be done, or whether it is reasonably possible with the resources at hand… And you are correct again: there can be what resembles pushback at times, but perhaps not quite at the level of hostile…


I think that would be reasonable; those who use such a feature will know its limitation. Therefore I would hide it behind an option in the settings, so that one cannot stumble upon it by accident.
