NN and depthmap developments

I recently found out that ‘we’ (some very clever researchers in China) can now make very good depthmaps from 2D images using neural networks.
And not just from photos, but even from drawings.
And it does well even with the small version of the model.
Link:
depth-anything-v2 on GitHub

So I wonder if anybody can do something with that in G’MIC.
And apart from hoping the depthmap-creation part gets added, I’m also thinking of filters to do more with those depthmaps.

There are utilities to make 3D images from 2D+depthmap, and that is nice, but I was also thinking about a way to insert images with alpha at a certain depth.
For instance, say there is a depth at 50%, a grey value of 128,128,128, and I want to insert an image or text at that depth, with occlusion. I would need to check the inserted element’s greyscale mask against the depthmap: where the depthmap value is higher, remove the mask; where it is lower, keep it (and duplicate that erasure on the 2D side). The element would then appear to sit at that depth and be occluded by everything in front of it.
A filter for that would of course be very handy.

I should add that the so-called RGBD images consist of a normal 2D image with a depthmap version to its right. So I’d need to take the normal 2D clip with alpha and insert it, offset the coordinates by half the width, then create and insert a greyscale mask of that 2D clip, and then do the occlusion step.
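
Splitting such a side-by-side file into its two halves should at least be easy; an untested sketch, with rgbd.png standing in for the actual file:

gmic -input rgbd.png -split x,2 -name[0] color -name[1] depth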

So I’m now trying to figure out how to do that ‘manually’ with scripting, but that requires some searching through all the functions available in G’MIC, since I don’t have them readily in mind; I assume it’s possible. Can you erase individual pixels on a line-by-line basis?

Apart from that depthmap idea there are of course other things, and at this point there isn’t much available in G’MIC; the existing stereo-image and depthmap creation tools are all pretty dated and can’t hope to compete with 2025 NN stuff.

So I hope you coders get inspired after looking into this, and maybe we see some new depthmap-related stuff?
But then again, we all have things that we find interesting and things we don’t care much about, so I can only hope a capable person gets interested.

I was musing on my element-placement idea while doing other things, and now I’m thinking that instead of comparing the mask pixel by pixel, I could split the depthmap into a single combined bitplane mask of all pixels higher than the desired depth (that is, of a higher value), and then subtract the whole thing at once from the mask of the element I’m trying to place at that depth.
That might be quicker and less hassle, and there might be a ready-made function in G’MIC for such splitting. Actually… it’s just a simple threshold filter I need, right? Duh :slight_smile:
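
Something like this rough, untested sketch is what I have in mind, written as a small G’MIC command file (the file names, the 128 threshold, and the assumption that scene, depthmap, and element all share the same dimensions are my own placeholders):

# place_at_depth.gmic : rough sketch of the idea (untested; all names are mine)
# Usage: gmic place_at_depth.gmic scene.png depth.png element.png place_at_depth -output composited.png
place_at_depth :
  name[0] scene name[1] depth name[2] element
  lt[depth] 128                        # 1 where the scene is behind the insertion depth (assumes brighter = closer)
  split_opacity[element]               # -> element colour part + element alpha part
  name[-2] rgb name[-1] alpha
  div[alpha] 255                       # 8-bit alpha -> [0,1], the range the 'image' mask expects
  mul[alpha] [depth]                   # erase alpha wherever the scene is in front (the occlusion step)
  image[scene] [rgb],0,0,0,0,1,[alpha] # paste the element onto the scene through the combined mask
  keep[scene]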

Always pays to muse on such things before starting.

Addendum: I guess I could also use the same trick in a video editor: use an offscreen thresholded depthmap as a source for occluding an animated element like text. Maybe soften the edges a smidgen first.
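
For example (untested; the 60% threshold and the blur amount are just guesses):

gmic -input depth.png -ge 60% -mul 255 -blur 1% -output occluder_mask.png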

Note that there is a smartphone app for making those depthmaps, and on iOS I gather it does video too.

Come to think of it, you can use such a thresholded depthmap as a roundabout way to extract things from the background.
And then of course you can replace the background.
It’s an old technique but the availability of a decent depthmap (that you can threshold visually in an editor) makes it much simpler.
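
In G’MIC that could be as simple as thresholding the depthmap and appending it to the photo as an alpha channel; an untested sketch, assuming photo and depthmap share the same dimensions and a 50% cut:

gmic -input photo.png -input depth.png -ge[1] 50% -mul[1] 255 -append c -output foreground.png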

Write some detailed fake code (pseudocode), and then I’ll try to see what can be done.

Erase as in setting value to 0 or nan? Yes.
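
For example, the math parser can read one image while filling another; a minimal sketch, assuming [0] is the element’s alpha mask, [1] is the depthmap, and 128 is the target depth:

gmic -input alpha.png -input depth.png -fill[0] "i(#1,x,y)>=128?0:i" -keep[0] -output erased_alpha.png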

Excellent paper. Kudos.

Perhaps a quick cheatsheet on G’MIC compositing can serve many of your image-blending purposes, such as making a foreground knock-out mask to modify a so-called middle-ground image, one that already has a transparency channel so that it, in turn, composites onto a background image. The general scheme here is: “select from two sources based on a third selection function, implemented by a grayscale mask.”

For G’MIC, blending between two images at ratios taken from a third, masking, image is the province of image (shortcut j) and blend.

  1. image is a so-called “built-in” (implemented in C++ within the interpreter) and fast, but a very basic compositor: it places a positional “sprite” (think of it as a “floating image”) over a base image stuck on the ground, with a grayscale “mask” that punches holes in the sprite so that the base image shows through. More formally, the mask serves as a basic compositing function that “selects A, B, or a ratio of the two”.
  2. blend is a more full-featured compositing engine operating on image pairs blended through a wide variety of compositing functions, many familiar (Porter-Duff) and some less so. blend started life as an emulator of GIMP blending functions (in turn based on Photoshop composition functions); it still follows that model but has some blending functions that are not part of GIMP. Here is an enumeration of the G’MIC Blending modes; see the short example just after this list.
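
A minimal blend call might look like this (untested sketch; a.png and b.png stand in for any two same-sized images, and multiply with 0.75 opacity is an arbitrary choice):

gmic -input a.png -input b.png -blend multiply,0.75 -output blended.png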

This is a (really long!) one-liner masking demo that can be middle-mouse-button-swiped (or, on Windows, cut-and-pasted) into a shell (but beware the shell!):

gmic                                                          \
    -input 512,512,1,3                                        \
    [0]x1                                                     \
    -name[-2] pattern                                         \
    -name[-1] gradient                                        \
    -input 100%,100%,100%,1                                   \
    -name. mask                                               \
    -fill[pattern] "xor(x,y)%17<9?[114,159,207]:[250,230,80]" \
    -fill[gradient] "lerp([5,20,50],[10,50,150],y/(w-1))"     \
    -polygon[mask] 5,140,150,400,175,310,440,150,420,105,350,1,1,1,1 \
    -blur[mask] 5%                                            \
    -image[gradient] [pattern],0,0,0,0,1,[mask]               \
    -keep[gradient]                                           \
    -output[gradient] /dev/shm/composite.jpg,80

If the swipe into a command shell works, you should get a log print-out of what G’MIC is doing:

[gmic]./ Start G'MIC interpreter (v.3.5.3).
[gmic]./ Input black image at position 0 (1 image 512x512x1x3).
[gmic]./ Input copy of image [0] at position 1 (1 image 512x512x1x3).
[gmic]./ Set name of image [0] to 'pattern'.
[gmic]./ Set name of image [1] to 'gradient'.
[gmic]./ Input black image at position 2 (1 image 512x512x1x1).
[gmic]./ Set name of image [2] to 'mask'.
[gmic]./ Fill image [0] with expression 'xor(x,y)%17<9?[114,159,207]:[250,230,80]'.
[gmic]./ Fill image [1] with expression 'lerp([5,20,50],[10,50,150],y/(w-1))'.
[gmic]./ Draw 5-vertices filled polygon on image [2], with opacity 1 and color (1,1,1).
[gmic]./ Blur image [2] with standard deviation 5%, neumann boundary conditions and gaussian kernel.
[gmic]./ Draw image [0] at (0,0,0,0) on image [1], with opacity 1 and mask [2].
[gmic]./ Keep image [1] (1 image left).
[gmic]./ Output image [0] as jpg file '/dev/shm/composite.jpg', with quality 80% (1 image 512x512x1x3).
[gmic]./ End G'MIC interpreter.

Every command up to -image lays the table for a three-image pipeline. At the end of their run, the image list looks like this:


[Image: G’MIC image list]

The -image command selects the base image: “gradient”; the first argument, “pattern”, references the sprite image. Four coordinates (0,0,0,0) position the upper left corner of the sprite. Four? Yes: G’MIC images are four-dimensional, with width, height, slice, and spectral coordinates. An entire animation (or CAT scan) can be represented by a single G’MIC image with lots of slices (animation frames), and each slice can have an arbitrary spectrum of channels, each channel open to wide interpretation. See Image. The next argument, 1, sets the opacity of the sprite image, taken from the closed interval [0,…,1]: fully transparent to fully opaque. The last argument, [mask], is a single-channel, single-slice image with pixels also in the closed interval [0,…,1]. The mask image argument is optional; in its absence, the opacity argument provides a constant-value function to render the sprite (partially) opaque. At the end of the day, the “gradient” image should look like this, demonstrating blending of the sprite onto the base per the values in mask:


[Image: results of the image command]

Your scope of activity lies entirely in the realm of generating masks. At this juncture, I don’t think it is too hard a reach for you to generate a knockout mask from your depth data sets. If I’m wrong, and your attempts at mask generation through G’MIC all crash-and-burn, post what you tried to do here and we’ll try to help.

Aye, there’s the rub: you all can easily imagine what you want to do, but how do those imagined steps map to G’MIC commands? Start Here tries to move people forward from that initial quandary. At any point you can type gmic -h <some command> and get a terse overview of what the command does. Try it on the commands listed in the swipe.

Hope this is enough to get you into serious trouble.

I was playing with depthmaps and GIMP/G’MIC

I also made a short blog post about depth-anything-v2 recently:

Haha, yeah, it really got me in trouble trying to figure out the weird result I get from ‘image’, where for some reason the base image becomes negative when I don’t expect it.
But your example worked as described and I experimented with it. I had never tried naming layers, and although that is interesting, it’s a bit annoying that the log output uses the layer number and not the name, so you still have to keep track of the numbers.
I also don’t think the ‘keep’ command is needed in your example. And I find using JPEG output not a good idea either: the blur causes horrible artifacts, especially at quality 80 IMHO; you need to go all the way to 98 to avoid them. I much prefer experimenting with PNG files, and with this example the PNG is actually close to the same size as the JPEG.

But I’ll keep experimenting. The ‘imagealpha’ command is much less problematic, btw, but I would have to pre-mask to use it.
But maybe I’ll take yet another approach. I’ll ponder on it a bit.

Interesting use cases there, jonathanBieler, some of which I had not thought of yet.
Incidentally, I tried the depth-anything-v2 output with an Android app; the whole app is only 48 MB and it works pretty well, so you don’t need a huge setup to try it and can stay well under the 300 MB you mention.
I like that you can turn 2D images into convincing 3D ones with the depthmaps, and that works well with AI-generated images too.

Thanks for the offer to help if need be, Reptorian; very kind of you.
When trying the ‘image’ command to place things I got weird results, and I’m a bit unsure what the definition of ‘sprite’ is in G’MIC; I thought it was just a picture with alpha, but maybe I’m wrong?
Also, I’m never sure how the various commands react to single-bit images, greyscale, and alpha or no alpha; I find I need to do experiments aplenty to get some sense of it.
Anyway right now I’m just messing about with commandline and display outputs to see what’s what.
And I got tripped up because I forgot to normalize the images along the way; I feel really stupid for forgetting that you need to stay on top of that.

You nicely illustrate how depthmaps can be fun Tobias.
Maybe it’ll inspire some other people to explore this stuff too.

Little general addendum to my original post:
I think that making depthmaps and other such neural-network uses are a nice application of the current trend in computing, while at the same time avoiding the controversial question of AI-generated ‘fake’ content.
I think this falls outside that scope and nobody has to question him/herself about using it.
What do you guys think?
I mean, there is AI use I do frown upon (removing and inserting stuff in real pictures without us being able to tell) and there is use I’m slightly annoyed by (the too-smooth and somehow boring images that get produced).
But normal enlargement (within reason, without excessive smoothing and altering), noise reduction, depthmaps, and similar NN uses seem OK to me.