The 36th British Machine Vision Conference 2025

AFB: Learning Event-guided Exposure-agnostic Video Frame Interpolation via Adaptive Feature Blending

Junsik Jung, Yoonki Cho, Woo Jae Kim, Lin Wang, and Sung-Eui Yoon

Korea Advanced Institute of Science and Technology (KAIST), Nanyang Technological University (NTU)


[Paper] [Code] [Poster] [Video]

Abstract

Exposure-agnostic video frame interpolation (VFI) is a challenging task that aims to recover sharp, high-frame-rate videos from blurry, low-frame-rate inputs captured under unknown and dynamic exposure conditions. Event cameras are sensors with high temporal resolution, making them especially advantageous for this task. However, existing event-guided methods struggle to produce satisfactory results on severely low-frame-rate blurry videos due to the lack of temporal constraints. In this paper, we introduce a novel event-guided framework for exposure-agnostic VFI that addresses this limitation through two key components: Target-adaptive Event Sampling (TES) and Target-adaptive Importance Mapping (TIM). Specifically, TES samples events around the target timestamp and the unknown exposure time to better align them with the corresponding blurry frames. TIM then generates an importance map that considers the temporal proximity and spatial relevance of consecutive features to the target. Guided by this map, our framework adaptively blends consecutive features, allowing temporally aligned features to serve as the primary cues while spatially relevant ones offer complementary support. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our approach in exposure-agnostic VFI scenarios.
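For intuition, the sketch below shows one way target-adaptive event sampling could be realized: keeping only events near the target timestamp and within an (estimated) exposure window before they are stacked. The event format, window definition, and function name are assumptions for illustration and are not taken from the released implementation.

```python
import numpy as np

def sample_events(events: np.ndarray, tau: float, exposure: tuple[float, float],
                  half_width: float = 0.05) -> np.ndarray:
    """Illustrative TES-style sampling: keep events relevant to the target
    timestamp tau and to the (estimated) exposure window of a blurry frame.

    events:   (N, 4) array of (t, x, y, polarity), timestamps normalized to [0, 1].
    exposure: (start, end) of the estimated exposure interval of the blurry frame.
    """
    t = events[:, 0]
    near_target = np.abs(t - tau) <= half_width             # events around the target time
    in_exposure = (t >= exposure[0]) & (t <= exposure[1])   # events emitted during the blur
    return events[near_target | in_exposure]
```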


Motivation

(a) Recent methods restore sharp frames from a single blurry input via event modulation, but their quality degrades as the target timestamp deviates from the input, due to the lack of temporal constraints.

(b) Our method addresses this by effectively leveraging temporal constraints via adaptive feature blending.

(c) A PSNR comparison on the GoPro dataset shows that our method maintains more stable performance across varying timestamps.


Method

Overview of our framework. Given blurry frames (I0, I1) with unknown exposures (Te0, Te1), stacked events EN over 2T, and a target timestamp τ, the model reconstructs the target frame Îτ. TES samples events around τ and the unknown exposure, and the sampled events are fused with the frames. TIM generates an importance map ωτ to adaptively blend the fused features (F0, F1) into Fτ, which is then decoded into Îτ. Shared encoders are color-coded.
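As a rough illustration of the adaptive blending step, the sketch below shows how an importance map ωτ could weight the two fused feature maps, Fτ = ωτ·F0 + (1−ωτ)·F1. The module name, layer layout, and the choice of feeding τ as an extra channel are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ImportanceMapNet(nn.Module):
    """Hypothetical TIM-style module: predicts a per-pixel importance map
    w_tau in [0, 1] from the two fused features and the target timestamp."""
    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels + 1, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
            nn.Sigmoid(),  # keeps the blending weight in [0, 1]
        )

    def forward(self, f0: torch.Tensor, f1: torch.Tensor, tau: float) -> torch.Tensor:
        b, _, h, w = f0.shape
        # Broadcast the target timestamp as an extra channel (temporal-proximity cue).
        t = torch.full((b, 1, h, w), tau, device=f0.device, dtype=f0.dtype)
        return self.net(torch.cat([f0, f1, t], dim=1))

def blend_features(f0: torch.Tensor, f1: torch.Tensor, w_tau: torch.Tensor) -> torch.Tensor:
    """Adaptive feature blending: F_tau = w_tau * F0 + (1 - w_tau) * F1."""
    return w_tau * f0 + (1.0 - w_tau) * f1
```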


Qualitative Results

Our method can upsample input blurry frames to arbitrary temporal scales, reconstructing sharp frames at any target timestamp (see the usage sketch below).
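As a usage illustration only, arbitrary-scale interpolation amounts to querying the model at a series of target timestamps between the two blurry inputs; the model call signature below is hypothetical.

```python
import torch

def interpolate_sequence(model, blurry0, blurry1, events, num_frames: int = 8):
    """Query a (hypothetical) exposure-agnostic VFI model at evenly spaced
    target timestamps tau in [0, 1) to upsample two blurry frames, e.g. 8x."""
    with torch.no_grad():
        return [model(blurry0, blurry1, events, tau=i / num_frames)
                for i in range(num_frames)]
```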

Quantitative Results