A very cool project/paper that I’ve found so I thought I’d share it here.
Video frame interpolation aims to synthesize non-existent frames in-between the original frames. While significant advances have been made from the deep convolutional neural networks, the quality of interpolation is often reduced due to large object motion or occlusion. In this work, we propose to explicitly detect the occlusion by exploring the depth cue in frame interpolation. Specifically, we develop a depth-aware flow projection layer to synthesize intermediate flows that preferably sample closer objects than farther ones. In addition, we learn hierarchical features as the contextual information. The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels for synthesizing the output frame. Our model is compact, efficient, and fully differentiable to optimize all the components. We conduct extensive experiments to analyze the effect of the depth-aware flow projection layer and hierarchical contextual features. Quantitative and qualitative results demonstrate that the proposed model performs favorably against state-of-the-art frame interpolation methods on a wide variety of datasets.