In an article recently published in the journal Scientific Reports, researchers demonstrated a new approach based on learned dynamic phase coding for video reconstruction from a single motion-blurred image.
Background
Modern cameras must become lighter and smaller while maintaining exceptional imaging performance. Innovative design methods therefore attempt to turn fundamental imaging limitations, such as motion blur, into design advantages to address these conflicting requirements. Motion blur, caused by the movement of objects during exposure, is a major limitation in dynamic scene photography. The exposure duration is set according to the noise requirements and lighting conditions.
The sensor accumulates the light from moving objects in dynamic scenes over multiple consecutive pixels along the objects' trajectories, leading to image blur. Motion deblurring methods are often used to reconstruct sharp images by removing this undesirable motion blur effect.
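To make this accumulation concrete, the following minimal NumPy sketch (not from the paper; the scene and function name are illustrative) averages sharp sub-frames over the exposure, smearing a moving dot along its trajectory. It also illustrates why direction information is lost: reversing the frame order produces the identical blurred image.

```python
import numpy as np

def simulate_motion_blur(frames):
    """Average a burst of sharp sub-frames, mimicking how a sensor
    integrates light from a moving object over the exposure time."""
    # frames: array of shape (T, H, W); each entry is the scene at one
    # instant within the exposure. Averaging spreads a moving object's
    # energy across the pixels along its trajectory.
    return np.mean(np.asarray(frames, dtype=np.float64), axis=0)

# Toy example: a bright dot moving 8 pixels to the right during exposure.
T, H, W = 8, 32, 32
frames = np.zeros((T, H, W))
for t in range(T):
    frames[t, 16, 8 + t] = 1.0

blurred = simulate_motion_blur(frames)
# The dot's energy is smeared over 8 pixels, and reversing the frame
# order yields the identical blurred image -- the direction is lost.
assert np.allclose(blurred, simulate_motion_blur(frames[::-1]))
```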
The researchers in this paper utilized this effect to generate video from a single motion-blurred image. The motion blur effect can be exploited in video generation to reconstruct a sharp burst of video frames representing the scene at various times during acquisition. However, video generation by exploiting motion blur is extremely difficult, as the signal averaging during the acquisition process erases the motion direction from the captured image.
Although a pairwise frame order-invariant loss, a cascaded generator, or a recurrent video autoencoder network can be used to mitigate this challenge, the processing stage can only assume a motion direction for video reconstruction and cannot effectively resolve the global direction ambiguity, since the global motion direction is lost during acquisition.
Different solutions, such as capturing multiple frames with different exposures during acquisition, replacing conventional sensors with coded two-bucket sensors, or exploiting the rolling shutter effect and lens-less imaging, can be implemented to address this issue. However, these solutions either require capturing several images or are not suitable for standard optical systems.
Video-to-video processing, which converts a low-frame-rate, blurred video into a high-frame-rate, sharp video, can be realized using frame deblurring followed by frame interpolation to obtain sharp video frames. Earlier end-to-end methods utilized either computational imaging techniques, such as event cameras, coded exposure, and flutter shutter, or processed conventional camera videos. However, non-conventional image sensors are expensive and rare, which hinders their applicability at scale.
Thus, the limitations of digital methods such as deep learning and of conventional imaging, including noise sensitivity and direction ambiguity, together with the high cost of non-conventional image sensors, have necessitated new approaches for reconstructing video from a single motion-blurred image.
Novel approach for video reconstruction from a single motion-blurred image
In this paper, the researchers proposed a computational coded-imaging approach that overcomes the dynamic scene acquisition challenges of conventional cameras and can be integrated easily into many conventional cameras with a focusing mechanism by adding a phase mask to the lens.
Thus, the proposed hybrid optical-digital video reconstruction method only requires simple modifications to current optical systems. The coding and reconstruction approach was based on a convolutional neural network (CNN) and a learnable imaging layer, which were jointly optimized end-to-end.
The physical image acquisition process was simulated by the learnable imaging layer using a coded spatiotemporal point spread function (PSF), while the CNN reconstructed the sharp frames from the coded image. During exposure, the joint operation of the learnable focus variation and the phase mask generated dynamic phase coding, which encoded the scene motion information in the intermediate image as chromatic cues.
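The sketch below illustrates the idea of such an acquisition layer under stated assumptions: each sub-frame is blurred with a time-varying, per-color-channel kernel and the results are integrated over the exposure. The Gaussian PSF parameterization, the function names, and the specific sigma schedules are illustrative assumptions, not the paper's learned optical model, in which the time variation would come from the optimized phase mask and focus sweep.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def coded_acquisition(frames, sigmas):
    """Toy spatiotemporal-PSF acquisition: blur each sub-frame with a
    time-varying kernel per color channel, then integrate over time.

    frames: (T, H, W) grayscale sub-frames of the dynamic scene.
    sigmas: (T, 3) PSF widths per time step and RGB channel; in the real
            system this time variation is produced optically and would be
            optimized end-to-end together with the reconstruction CNN.
    """
    T, H, W = frames.shape
    coded = np.zeros((H, W, 3))
    for t in range(T):
        for c in range(3):  # 0 = red, 1 = green, 2 = blue
            coded[..., c] += gaussian_filter(frames[t], sigmas[t, c])
    return coded / T

# Illustrative coding: blue is sharpest early and red sharpest late, so the
# start of the motion leaves crisp blue cues and the end crisp red cues.
T = 8
sigmas = np.stack([np.linspace(2.0, 0.5, T),   # red: sharp at the end
                   np.full(T, 1.2),            # green: roughly constant
                   np.linspace(0.5, 2.0, T)],  # blue: sharp at the start
                  axis=1)

frames = np.zeros((T, 32, 32))
for t in range(T):
    frames[t, 16, 8 + 2 * t] = 1.0  # a dot moving right during exposure
coded = coded_acquisition(frames, sigmas)  # (32, 32, 3) coded blurred image
```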
The cues arose from the PSF of the proposed solution, which encoded the beginning of the object's movement in blue and the end of the movement in red. These cues served as guidance for dynamic scene video reconstruction by post-processing of the captured coded image.
The researchers used learned dynamic phase coding in the lens aperture during image acquisition to encode the motion trajectory, producing a motion-coded blur that served as the prior information for reconstructing the video. The color motion cues encoded in the acquired image during the coding process, along with a relative time parameter (t), were fed to a CNN trained to reconstruct the sharp frame at time t within the exposure. When a sequence of t values was selected, the proposed computational camera reconstructed a sharp burst of scene frames at different frame rates from a single coded motion-blurred image using an image-to-video CNN.
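A minimal PyTorch sketch of this time-conditioned reconstruction follows. The tiny architecture and the class name are assumptions for illustration only, not the paper's network; the point is the mechanism of feeding the coded image together with a relative time t (broadcast as an extra input channel), then sweeping t over a grid to produce a frame burst.

```python
import torch
import torch.nn as nn

class FrameAtTime(nn.Module):
    """Minimal image-to-frame CNN: takes the coded RGB image plus a
    relative time t in [0, 1] (broadcast as an extra input channel) and
    predicts the sharp scene frame at that instant of the exposure."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, coded, t):
        # coded: (B, 3, H, W) coded blurred image; t: (B,) relative times.
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *coded.shape[-2:])
        return self.net(torch.cat([coded, t_map], dim=1))

model = FrameAtTime()
coded = torch.rand(1, 3, 64, 64)  # stand-in for a captured coded image
# Sweeping t over a grid turns one coded image into a frame burst; the
# density of the grid sets the output frame rate.
burst = [model(coded, t.view(1)) for t in torch.linspace(0, 1, 9)]
```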
Significance of the study
The proposed method demonstrated the ability to generate a sharp frame at any user-controlled time within the exposure interval, indicating its effectiveness in producing a video burst at user-desired frame rates from a single coded image.
The novel neural architecture used in the method results in modular and flexible video reconstruction from a single image, which can easily be adjusted to any video frame rate simply by changing the time values fed to the network, with no re-training required. Additionally, the end-to-end optimization framework for the digital and optical processing parameters for dynamic scene acquisition accurately modeled the spatio-temporal acquisition process.
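Continuing the illustrative sketch above (reusing the hypothetical model and coded image defined there), changing the output frame rate amounts to choosing a denser or sparser grid of t values for the same trained network:

```python
# Same (hypothetical) trained model, same coded image: a denser grid of
# t values yields a higher output frame rate with no re-training.
burst_15 = [model(coded, t.view(1)) for t in torch.linspace(0, 1, 15)]
burst_30 = [model(coded, t.view(1)) for t in torch.linspace(0, 1, 30)]
```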
In both real-world and simulation results, the proposed method outperformed conventional imaging-based methods in handling the inherent direction ambiguity, reconstructing high-quality video, reducing noise sensitivity, and maintaining high accuracy.