Inpainting a video

Good day. I am new to using the gmic CLI, and while I have read some of the basics I have hit a stumbling block which is hopefully simple to overcome.

Context

I am trying to remove an overlay that says ‘BRB’, which I left active by mistake, from a video. It might be more fitting to leave it in place as an incentive to my future self not to be careless, but I am interested in how well different removal methods work†.

Inpainting a single image

I have previously used gmic’s patch-based inpaint via GIMP, so I thought I would try that. It works fine for a single image:

gmic logotest.png logotest-redmask.png +inpaint[0] [1] remove[0,1] -output testgmic.jpg
[gmic]-0./ Start G'MIC interpreter.
[gmic]-0./ Input file 'logotest.png' at position 0 (1 image 1920x1080x1x3).
[gmic]-1./ Input file 'logotest-redmask.png' at position 1 (1 image 1920x1080x1x4).
[gmic]-2./ Inpaint image [0] masked by image [1], with high-connectivity average algorithm.
[gmic]-3./ Remove images [0,1] (1 image left).
[gmic]-1./ Output image [0] as jpg file 'testgmic.jpg', with quality 100% (1 image 1920x1080x1x3).
[gmic]-1./ End G'MIC interpreter.

I have yet to try the parameters in the command reference.

Inpainting a video

It seemed to me this would work since gmic works with videos, and indeed the output suggests it should:

$ gmic delogotest.mkv logotest-redmask.png +inpaint[0,-2] [-1] remove[-1] -output testgmic.mkv,60,h264
[gmic]-0./ Start G'MIC interpreter.
[gmic]-0./ Input all frames of file 'delogotest.mkv' at position 0 (837 images [0] = 1920x1080x1x3, (...),[836] = 1920x1080x1x3).
[gmic]-837./ Input file 'logotest-redmask.png' at position 837 (1 image 1920x1080x1x4).
[gmic]-838./ Inpaint images [0,836] masked by image [837], with high-connectivity average algorithm.
[gmic]-840./ Remove image [839] (839 images left).
[gmic]-839./ Output images [0,1,2,(...),836,837,838] as mkv file 'testgmic.mkv', with 60 fps and h264 codec.
[gmic]-839./ End G'MIC interpreter.

However, at a glance the output seems to be the input (though I think it is not).

Inspecting further without removal and without file output:

$ gmic delogotest.mkv logotest-redmask.png +inpaint[0,-2] [-1]                  
[gmic]-0./ Start G'MIC interpreter.                                                                                 
[gmic]-0./ Input all frames of file 'delogotest.mkv' at position 0 (837 images [0] = 1920x1080x1x3, (...),[836] = 1920x1080x1x3).
[gmic]-837./ Input file '/home/robert/downloads/logotest-redmask.png' at position 837 (1 image 1920x1080x1x4).      
[gmic]-838./ Inpaint images [0,836] masked by image [837], with high-connectivity average algorithm.
[gmic]-840./ Display images [0,1,2,(...),837,838,839] = 'delogotest.mkv, (...), delogotest_c837.mkv'.

<snip>

[837] = 'logotest-redmask.png':                                                              
  size = (1920,1080,1,4) [31 Mio of float32].                                                                       
  data = (0,0,0,0,0,0,0,0,0,0,0,0,(...),0,0,0,0,0,0,0,0,0,0,0,0).
  min = 0, max = 255, mean = 7.10836, std = 41.9774, coords_min = (0,0,0,0), coords_max = (1154,743,0,0).
[838] = 'delogotest_c1.mkv':                                                                                        
  size = (1920,1080,1,3) [23 Mio of float32].                                                                       
  data = (39,118,131,127,120,117,105,109,122,141,146,144,(...),13,8,4,6,3,11,18,24,8,23,19,11).
  min = 0, max = 255, mean = 65.443, std = 41.7933, coords_min = (958,7,0,0), coords_max = (1012,0,0,0).
[839] = 'delogotest_c837.mkv':                                                                                      
  size = (1920,1080,1,3) [23 Mio of float32].                                                                       
  data = (39,118,131,127,120,117,105,109,122,141,146,144,(...),11,10,7,9,4,10,13,22,8,24,17,8).
  min = 0, max = 255, mean = 70.2562, std = 43.1253, coords_min = (871,38,0,0), coords_max = (436,2,0,0).

The last two frames seem to have inpainting applied, and the third last is the mask. The others seem to be the original unchanged frames.

Q: Where have I made a mistake with video input / applying inpaint / remove-ing frames?

Follow-up questions: While I am testing on a short section of video (~13s), I would like to see if I can apply this to a much longer segment (~30 min!). 1) Is there a way to tell gmic to operate sequentially on a single frame at a time, rather than reading all frames in? 2) If not, is there a way to make gmic operate on a named pipe? I was hoping to do <video input> → ffmpeg → <image frames> → named pipe → gmic → <image frames> → second ffmpeg process → <video output>

I’ve tried 2) but not yet gotten it to work, and I haven’t found any references to anyone doing this when I searched!

Thanks in advance for advice. The documentation and tutorial are great – my favourite part so far is the fake depth of field on the Moai heads – and probably answer my question, but I haven’t quite gotten there yet!


†: the video is named delogotest.mkv because I first tried ffmpeg’s delogo filter. Even with complex geometry (read: a bunch of rectangles covering the letter lines) the results were not brilliant; though obviously it wasn’t designed for cases like this!

‘Simple to overcome’, indeed

After having a walk and re-reading the command items and selections part of the docs, I have spotted two basic issues:

  • use of +: I should call inpaint without the + so that it operates in place, I think
  • [0,-2] means two inputs: the first and the penultimate. I should use [0--2] to specify the range instead (a toy illustration follows this list)
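A toy check of the difference, on four blank 8x8 images rather than the real frames; the means printed at the end should come out as 128, 64, 128 and 0 respectively:

# [0--2] selects the range from the first to the penultimate image (here 0,1,2),
# [0,-2] selects exactly two images: the first and the penultimate (here 0 and 2).
$ gmic 8,8 8,8 8,8 8,8 fill[0--2] 64 fill[0,-2] 128 print

Likewise, inpaint[...] modifies the selected images in place, while +inpaint[...] leaves them untouched and appends the inpainted copies at the end of the list.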

Inpainting a video does indeed work

$ gmic delogotest.mkv,0,120 logotest-redmask.png inpaint[0--2] [-1] remove. -output testgmic3.webm,60,vp09
[gmic]-0./ Start G'MIC interpreter.
[gmic]-0./ Input frames 0...120:1 of file 'delogotest.mkv' at position 0 (121 images [0] = 1920x1080x1x3, (...),[120] = 1920x1080x1x3).
[gmic]-121./ Input file 'logotest-redmask.png' at position 121 (1 image 1920x1080x1x4).
[gmic]-122./ Inpaint images [0,1,2,(...),118,119,120] masked by image [121], with high-connectivity average algorithm.
[gmic]-122./ Remove image [121] (121 images left).
[gmic]-121./ Output images [0,1,2,(...),118,119,120] as webm file 'testgmic3.webm', with 60 fps and vp09 codec.
OpenCV: FFMPEG: tag 0x39305056/'VP09' is not supported with codec id 167 and format 'webm / WebM'

[gmic]-121./ End G'MIC interpreter.

This produces a matroska VP9 file, despite the complaint. It is rather large at 20MB (!), so I instead encoded to MP4/h264 (which also elicited a complaint).

However, I still think this reads everything in before operating. Perhaps I would do well to read and understand -repeat ... -done.
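For what it’s worth, a frame-at-a-time version might look something like the sketch below, written as a custom command in a .gmic file (untested; the frame count 837, the filenames and the output pattern are placeholders). It trades speed for flat memory use, since the video file is re-opened once per frame:

perframe_inpaint :
  repeat 837
    # read only frame number $> of the video, plus the mask
    -input delogotest.mkv,$>,$>
    -input logotest-redmask.png
    # same inpaint call as the single-image test, then write the frame out
    -inpaint[0] [1] -remove[1]
    -output[0] frame_$>.png
    # empty the image list before the next iteration
    -remove
  done

The numbered PNGs could then be re-encoded with something like
ffmpeg -framerate 60 -start_number 0 -i frame_%d.png -c:v libx264 delogoed.mp4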

This may not be a direct answer to your needs, but it happens to furnish a great deal of insight for those transitioning to working with Inpaint from the CLI:

Will try to give you a more directed answer later in the (North American East Coast) morning.


Thank you for that resource :)

At the moment I have inpaint running with:

Inpaint images [0,1,2,(...),118,119,120] masked by image [121], with patch size 7, lookup size 16, lookup factor 0.12, lookup_increment 1, blend size 2, blend threshold 0, blend decay 0.05, 10 blend scales and outer blending enabled

as those values produced reasonable results in the tests I ran on one frame. I have read the inestimable and esteemed @patdavid blog post so oft referred to; but while I know the individual words, I don’t grok the meaning intuitively as yet. I reckon the linked thread will help that though.
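(For reference, judging by the parameter names in that log line, the underlying call was presumably something along the lines of

inpaint[0--2] [-1],7,16,0.12,1,2,0,0.05,10,1

with the arguments in the order the log prints them; that ordering is my inference, not something I have verified.)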

That said, practical considerations mean that I think I will have to use one of the fast variants, as in the time it’s taken me to type and retype this reply, the command has still not finished running on the two-second clip; I shudder to think how long it would take to process 30 minutes’ worth, or circa 108k individual frames!

Edit: Did I say “reasonable results”? I meant to say “much worse results”.

By ‘worse’ I mean no criticism of inpaint; that invocation was slow and troublesome, but that’s down to me, not the command.

Patching animations forces one quickly to the KISS principle: “Keep It Simple…” (various final names apply). That’s because you must patch in the depth direction as well. Think in terms of patching one volume in a consistent fashion, rather than 837+ separate surfaces, where it is quite a bit harder to maintain consistency from image frame to image frame.

If you adopt that point of view, then you will probably decide pretty quickly that Inpaint isn’t animation friendly. You may discover a set of Inpaint parameters that magically patch one frame; half a dozen frames later the solution breaks down. That’s because Inpaint is solving 837 separate 2D infilling problems; there is no frame-to-frame coherency in this approach. You have 837 frame-specific solutions that don’t match up. You’ve seen that.

I would first suggest considering your patch problem in a volumetric way.

  1. Load your .mkv as you have done. This puts 830+ discrete, 2D frames on your image stack.
  2. Then append your animation frames in the depth direction:
…append z…

Your 830+ discrete frames become one image volume.

Many of the more basic, general-purpose G’MIC commands operate volumetrically. See Cauldron, Anyone? for a tangential discussion on blurring in three dimensions. Many other commands also operate in three dimensions, or have depth-specific modes.

A sketch of the solution now may just entail filling the cube that snaps around the entire BRB graphic — in three dimensions — with a color that roughly matches the map, then volumetrically blurring the cube so that the cross-border variance isn’t so sharp. Then:

…split z …

back to 830 or so discrete frames and output as before.
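A very rough, untested sketch of that idea on the first 121 frames: append z stacks the frames into one 1920x1080x121 volume, a flat-colour block is stamped over the BRB box across every slice, then a slightly larger sub-volume around it is blurred (in x, y and z) and pasted back so the edges of the patch soften, before split z turns it back into frames. The box coordinates, the margin, the fill colour 110,105,100 and the output name are all placeholders you would read off your own footage:

$ gmic delogotest.mkv,0,120 append z \
       1,1,1,3 fill. 110,105,100 resize. 300,150,121,3,1 \
       image[0] [1],1400,900 remove[1] \
       +crop[0] 1350,850,0,1750,1100,120 blur[1] 4 \
       image[0] [1],1350,850 remove[1] \
       split z output testgmic-vol.mp4,60,h264

Whether a flat colour plus blur looks acceptable will depend on how busy that corner of the frame is; the point is only that every step here treats the animation as a single 3D object.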

Sorry I don’t have a specific script — it’s a working day and I have meetings coming up. Perhaps this evening (8+ hours from now) when the dust settles for me. I just wanted to pull volumetric thinking to center stage because, roughly, that seems to me where your solution lies. You’ll have to be a bit creative; you’re probably good at that, and other people around here may chime in.


Thank you for taking the time to think and reply again.

Keeping it simple

I quite agree! I’ve been limiting my efforts to short clips because doing anything with animations or videos incurs a time cost for encoding, never mind any resource-intensive calculations commands might perform.

The actual duration of video, were I to attempt removal, is about 30 minutes, which is quite beyond what gmic can fit in memory if my tests are any indication: the process has been OOM-killed even when working on clips only a handful of seconds long.

So I’ve been keeping it to the simple and fast (delogo, fast variants of inpaint) and the results are interesting, at least to me:

I am still trying to sprint before I can amble though. I’d like to figure out how to do a few mundane things:

  • fast/simple blur limited to a mask, as I think the fast inpaint variants would look decent with a light blur applied over the masked area (see the sketch after this list)
  • operate only on a part of the image, as the inpaint invocation only needs to concern itself with the lower-right quadrant.
  • work on a single frame at a time, e.g. via a named pipe or stdout. I realise this is the exact antithesis of your suggestion, but for speed…
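One way the first item might work (untested, and assuming a single-channel mask where white marks the region to blur; the filenames are placeholders): blur a copy of the frame, then paste the blurred copy back over the original using the mask as an opacity mask, so only the masked area is affected.

$ gmic frame.png mask.png \
       +blur[0] 4 \
       image[0] [2],0,0,0,0,1,[1],255 \
       keep[0] output frame_blurred.png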

Videos are ‘just’ 3D images!

That is interesting and quite worth consideration!

Once I wrap my head around the concept, I may try what you suggest - combining, colouring, blurring, uncombining - and see if there’s something that looks good. Food for thought!

… quite beyond what gmic can fit in memory if my tests are any indication, having had the process OOM killed even when working on clips in the handfuls of seconds.

You might crop the video to just the area that contains the overlay to be removed. Similarly, instead of considering the volume of 100000 frames, you might break this into overlapping batches of, say, 1000 frames each, and then do a blended merge of batches so you don’t get sudden transitions at the boundaries.
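A hedged bash sketch of the batching half of that suggestion (untested; the frame total, batch size, overlap and filenames are assumptions). The overlap only matters if the per-batch processing looks across frames, as the volumetric route does; a purely per-frame inpaint makes the batches independent. The cross-fade of the overlapping frames when the segments are concatenated is not shown:

#!/bin/bash
TOTAL=108000    # total frame count (placeholder)
BATCH=1000      # frames per batch
OVERLAP=50      # extra frames read on each side, for blending later
for (( start=0; start<TOTAL; start+=BATCH )); do
  s=$(( start > OVERLAP ? start - OVERLAP : 0 ))
  e=$(( start + BATCH - 1 + OVERLAP ))
  (( e >= TOTAL )) && e=$(( TOTAL - 1 ))
  gmic delogotest.mkv,$s,$e logotest-redmask.png \
       inpaint[0--2] [-1] remove. \
       output seg_$start.mp4,60,h264
done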

Good idea!

My plan, if I can’t find a convenient way to operate on a subset / region / range of an image, is to crop→process→overlay; hopefully the speed gain from inpainting on a smaller area outweighs the slowdown from crop+overlay.
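As a single-frame sanity check of that plan, something along these lines might work (untested; the quadrant origin 960,540 is a guess, and the crop is applied to the mask as well so the two stay aligned):

$ gmic logotest.png logotest-redmask.png \
       +crop 960,540,1919,1079 \
       inpaint[2] [3] remove[3] \
       image[0] [2],960,540 \
       keep[0] output testgmic-crop.png

Extending it to every frame would mean pasting each patched crop back at the same offset inside a repeat … done loop, or doing the same thing on the z-appended volume.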

Good point!

Here goes:
Before:

After:

Implementations:

Add & Remove Watermarks
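# mkanim : build a watermark-polluted test animation: expand the input still
# along z and animate it with Cauldron-style band-passed noise, then stamp the
# watermark image $1 onto every frame. Only needed to make test footage; not
# part of the removal itself.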
mkanim : -check ${"is_image_arg\ $1"} -skip ${2=30},${3=0.005},${4=0.05}
   fc,lb,hb=${2-4}
   -expand_z. $fc
   -noise. 1%,2
   -bandpass. $lb,$hb
   -normalize. 0,255
   -split. z
   -pass$1 1
   -name. watermark
   -foreach[^watermark] {
       -pass[-1] 1
       -blend[-2] [-1],alpha,1,0
       -rm.
       }
   -remove[watermark]

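# rmwmrk : remove the watermark given by mask $1: permute the (already
# z-appended) animation so frames become channels, split into one image per
# colour plane, inpaint each with inpaint_morpho, then reassemble.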
rmwmrk : -check ${"is_image_arg\ $1"}
    -permute. xycz
    -split. z
    -name[^] anim
    -pass$1 1
    -name. msk
    -normalize[msk] 0,1
    -foreach[^msk] {
       -pass[-1]
       -inpaint_morpho[0] [-1]
       -remove[-1]
    }
    -append z
    -permute xycz

# gmic patchwmk.gmic -input curdle.mp4 append z -input ampersand_mask.png v + rmwmrk.. [-1] remove. normalize. 0,255  split. z o curdle_patched.mp4,24,h264

# gmic patchwmk.gmic 256,256,1,3 name. anim -i /dev/shm/ampersand.png name. watermark v + mkanim[anim] [-1],30,0.01,0.1 -remove[watermark] o curdle.mp4,24,h264

Some auxiliary images:
ampersand
This is a watermark for screwing up all your frames. Of course. Thank me profusely. Everybody who works with me has to take possession of a 1911 Goudy Bookletter Ampersand. That’s just the way it is around here.
ampersand_mask
This is the mask to remove the watermark.

Howzit works:

  1. mkanim — Not necessary for the solution: it just makes a watermarked animation for testing purposes. It is based on Cauldron with some modification to stamp the ampersand on every frame. Nothing special. It just brings me up to speed with you; I now have a watermark-polluted animation of my very own. How nice.

Continued next post. Phone is ringing and I got to answer…

  2. rmwmrk — This is what you want. The game is to turn the animation into a volume, and — secret sauce is coming — permute the volume, switching the slices with channels, and channels with slices. inpaint_morpho doesn’t much care how many channels a pixel has, so it happily operates on pixels that now carry one value per frame, and the inpaint-fill it devises automatically handles frame-to-frame coherency, because what used to be frames are now channels (a toy check of the permutation follows this list).

  3. I found it necessary to split the permuted animation along the z-axis; inpaint_morpho seems to have trouble operating on multi-slice, many-channel images. The foreach[^msk] {…} loop (iterate over every image on the stack except the mask) processes the red data animation first, then the green, and finally the blue. Then we back out: append along z and permute backwards.
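A quick way to see what the permutation does, on a toy volume standing in for 121 appended frames (no video needed):

# 64x64x121x3 (121 frames as slices) becomes 64x64x3x121 (121 frames as channels)
$ gmic 64,64,121,3 permute xycz print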

This pipeline generates a test animation:

gmic patchwmk.gmic 256,256,1,3 name. anim -i ampersand.png name. watermark mkanim[anim] [-1],30,0.01,0.1 -remove[watermark] o curdle.mp4,24,h264

This pipeline eliminates the watermark (almost all of it):

 gmic patchwmk.gmic -input curdle.mp4 append z -input ampersand_mask.png rmwmrk.. [-1] remove. normalize. 0,255  split. z o curdle_patched.mp4,24,h264

@snibgo’s advice on carving out just the corner that needs patching is excellent. However, inpaint_morpho needs a little live area around the watermark so it has regions to swipe from. I surprised myself by using inpaint_morpho after bad-mouthing inpaint, but it does such a nice job once you trick it up to see depth as channels. And it doesn’t require a lot of set-up. Slow, yes. Try not to feed it too much x,y area.

Not sure how much you know about G’MIC, but if there are pieces of code that seem especially obscure, just ask. Otherwise, I think you’ve got a base to hack around with.

Have fun!