Yes, I would look at the channels separately, and likely in HSV/HSL color space (using RGBToHSL nodes).
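(Outside Natron, the same per-pixel conversion can be sanity-checked with Python's colorsys module - note it uses HLS ordering, not HSL:)

import colorsys

# colorsys works per pixel and returns (h, l, s); floats in [0, 1]
r, g, b = 0.8, 0.4, 0.2          # example pixel
h, l, s = colorsys.rgb_to_hls(r, g, b)
print(f"H={h:.3f} L={l:.3f} S={s:.3f}")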
While machine learning is all the rage, I’m not that thrilled. The linked sample does look significantly better, but I’d love to see a comparison against a version with all the non-AI adjustments applied, rather than against the untouched base. Color correction alone would have a large subjective impact. I don’t have enough experience to guess whether the fluid movement and better deinterlacing could not have been achieved just as easily by other means.
Admittedly, this might just be my bias, since I’m used to sneering at/turning off most of the 'AI something' features manufacturers promote their products with nowadays.
I don’t know exactly what my filter is supposed to look like in the time-based stage. Since I need to put my thoughts in order anyway, here goes:
The basic idea is to have a matrix m[source_index, frame_index, x, y, channel] = value (with x and y running over frame_width and frame_height)
and then compute something like median_lum(n, x, y) = median(m(:, [n-1, n, n+1], x, y, hsl.L)).
Choosing the set of applicable frames (i.e. [n-1, n], [n-1, n, n+1], [n, n+1], or even just [n]) requires playing around to see what works for my material.
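To make that concrete, here's a minimal numpy sketch of the same idea; HSL_L is a placeholder for whatever index the lightness channel ends up at, and the toy data at the end is just to show the shapes:

import numpy as np

HSL_L = 1  # hypothetical index of the lightness channel in the last axis

def median_lum(m: np.ndarray, n: int) -> np.ndarray:
    """Median lightness at frame n over all sources and frames n-1..n+1.
    m has the layout described above: (source, frame, x, y, channel)."""
    window = m[:, n - 1 : n + 2, :, :, HSL_L]  # (sources, 3, W, H)
    # collapse the source and time axes into one sample axis per pixel
    return np.median(window, axis=(0, 1))      # (W, H)

# toy example: 4 captures, 10 frames, 64x48 pixels, 3 channels
m = np.random.rand(4, 10, 64, 48, 3)
print(median_lum(m, 5).shape)  # (64, 48)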
If I had two Natron node parameters, TIME_THRESHOLD (which decides applicability by looking at the difference between the frame differences) and a helper variable COMPARISON_BLUR_STRENGTH, I would start by doing it like this:
- defining 3 sets of frames, grouped by frame time: n-1, n, n+1
- blurring each frame in a set (using COMPARISON_BLUR_STRENGTH)
- summing up the values of each frame per channel and normalizing with respect to pixel_maximum * frame_width * frame_height
- taking the set-median over all normalized sums in the set
- computing differences between set-medians:
diff_prev = absdiff(set-median(n-1), set-median(n))
diff_next = absdiff(set-median(n), set-median(n+1))
diff_diff = diff_prev - diff_next
- now there are 3 cases:
abs(diff_diff) < TIME_THRESHOLD: take all 3 sets as input for the next step
otherwise diff_diff > 0: take the current and next set (the previous set is the outlier)
otherwise diff_diff <= 0: take the current and previous set (the next set is the outlier)
returning, for instance, median_filtering_set = [previous, current]
All of this is just to determine which frames are applicable to be added to the median filtering set.
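To pin that selection step down, here is a rough single-channel Python/numpy sketch, with scipy's gaussian_filter standing in for whatever blur node Natron would actually use; set_median, select_sets and pixel_max are made-up names for illustration:

import numpy as np
from scipy.ndimage import gaussian_filter

def set_median(frames: np.ndarray, blur: float, pixel_max: float = 1.0) -> float:
    """frames: (captures, H, W), one channel, one time step.
    Blur each capture, sum it, normalize by pixel_max * frame_width *
    frame_height, then take the median over the captures in the set."""
    h, w = frames.shape[1:3]
    norm = pixel_max * w * h
    sums = [gaussian_filter(f, sigma=blur).sum() / norm for f in frames]
    return float(np.median(sums))

def select_sets(prev, curr, nxt, time_threshold, blur):
    """Return the list of frame sets to feed into the median filter."""
    diff_prev = abs(set_median(prev, blur) - set_median(curr, blur))
    diff_next = abs(set_median(curr, blur) - set_median(nxt, blur))
    diff_diff = diff_prev - diff_next
    if abs(diff_diff) < time_threshold:
        return [prev, curr, nxt]   # neighbours agree: use all three sets
    if diff_diff > 0:
        return [curr, nxt]         # previous set is the outlier
    return [prev, curr]            # next set is the outlier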
From now on I’d look at the original (non-blurred) frames:
// the : denotes the complete set of applicable frames, instead of just a single one
output_frame(n, x, y, channel_index) = median(median_filtering_set(:, x, y, channel_index))
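In numpy terms, assuming the applicable sets get stacked along a leading sample axis, that last step would be a one-liner (the shapes below are just hypothetical):

import numpy as np

# hypothetical shapes: each set is (captures, H, W, channels); here two
# applicable sets of 3 captures each, stacked into 6 samples per pixel
previous_set = np.random.rand(3, 480, 640, 3)
current_set = np.random.rand(3, 480, 640, 3)
median_filtering_set = [previous_set, current_set]

stacked = np.concatenate(median_filtering_set, axis=0)  # (6, 480, 640, 3)
output_frame = np.median(stacked, axis=0)               # per-pixel, per-channel median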
Doing it with separate node-groups for each channel might be prudent in case there are memory concerns with that much data? That’s likely not an issue for my use-case, though.
There are several potential pitfalls I can see so far. As johnmeyer (forum.videohelp.com/members/13415-johnmeyer) pointed out, if frames are not well aligned between different captures, then I’d have to deal with this on a frame-by-frame basis rather than considering whole sets.
Another issue could occur if my blur-sum-compare approach is too coarse. Maybe I’d need to do this on smaller frame chunks.
If the scene is too dynamic, I might have to drop both the previous and next sets and/or start tracking the scene and work with partial overlaps. No sense in starting with that, however.
As a lesser concern - feeding a median function an even number of inputs might be trouble, since the median is then typically the average of the two middle values, i.e. a pixel value that never occurred in any capture (dropping one frame from the next-set would be the quick+dirty solution I can think of).
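A quick numpy illustration of why even counts are awkward:

import numpy as np

# with an even sample count the median is the mean of the two middle
# values, so the output can be a value no capture ever contained:
print(np.median([0.2, 0.3, 0.7, 0.9]))  # 0.5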
Hopefully the whole thing turns out robust, but since any given frame is looked at up to three times, the weight might be spread out too much, resulting in a time-averaged smear of some kind.
Very likely I’ll need to fiddle. Probably quite a lot ^^