CVPR 2020 Oral

Single Image Reflection Removal with Physically-Based Training Images

by Soomin Kim, Yuchi Huo, and Sung-Eui Yoon

Korea Advanced Institute of Science and Technology (KAIST)

This figure shows the overview of our method. First, using the rendering system, we render the image pairs for training, as shown in the figure below. From a given reflection-overlaid image I, our separation network first separates I into the predicted front scene (transmission), T, and the predicted back scene reflection with glass effects. An a posteriori loss is calculated between each of the predicted values and its ground truth. Our trained backtrack network (BT-net) then removes the glass and lens effects from the predicted reflection, backtracking it to a clean reflection. Since the backtracked reflection is released from the complicated glass/lens effects, we can better capture various image information, resulting in clearer error matching between the predicted image and its ground truth. To utilize this information, we use a new loss, an a priori loss, between the backtracked reflection and its ground truth. The entire separation network is trained with a combination of the a posteriori and a priori losses.
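For concreteness, the training objective can be sketched as follows. This is a minimal PyTorch-style sketch, not the released implementation: the names sep_net and bt_net and the weight w are placeholders, and the L1 distance stands in for whichever per-term losses are actually combined.

    import torch.nn.functional as F

    def separation_loss(sep_net, bt_net, I, T_gt, Rg_gt, R_gt, w=1.0):
        """One training step's loss for the separation network.

        bt_net is assumed to be pretrained and frozen
        (bt_net.requires_grad_(False)), so gradients flow through it
        only to update the separation network.
        """
        # The separation network splits the reflection-overlaid input
        # into a transmission layer and a reflection-with-glass-effects layer.
        T_pred, Rg_pred = sep_net(I)

        # A posteriori loss: each prediction against its own ground truth.
        loss_post = F.l1_loss(T_pred, T_gt) + F.l1_loss(Rg_pred, Rg_gt)

        # BT-net backtracks the predicted reflection, undoing glass/lens effects.
        R_back = bt_net(Rg_pred)

        # A priori loss: backtracked reflection against the clean
        # (glass-effect-free) ground-truth reflection.
        loss_prior = F.l1_loss(R_back, R_gt)

        # The separation network trains on the weighted combination.
        return loss_post + w * loss_prior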



We render the reflection training image pairs as shown in this figure. Suppose that we look at the front scene with a camera placed behind a glass pane. (1) is the input image with reflection. (2) is the front scene transmission. (3) is the reflected back scene (reflection) image with lens/glass effects; it is computed by physically simulating the real-world attenuation and glass effects, i.e., multiple bounces within the glass. (4) is the back scene (reflection) image without any glass effects.
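The "multiple bounces within the glass" in (3) have a standard closed-form idealization that helps illustrate what the path tracer simulates. Assuming unpolarized light, an absorption-free pane, and no interference, summing the geometric series of internal bounces gives a total pane reflectance of 2r/(1+r), where r is the single-interface Fresnel reflectance. The sketch below is a standalone Python illustration, not part of our rendering framework.

    import math

    def fresnel_unpolarized(theta_i, n1=1.0, n2=1.5):
        """Exact Fresnel reflectance for unpolarized light at a
        dielectric interface (air -> glass by default)."""
        theta_t = math.asin(n1 / n2 * math.sin(theta_i))  # Snell's law
        cos_i, cos_t = math.cos(theta_i), math.cos(theta_t)
        r_s = ((n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)) ** 2
        r_p = ((n1 * cos_t - n2 * cos_i) / (n1 * cos_t + n2 * cos_i)) ** 2
        return 0.5 * (r_s + r_p)

    def pane_reflectance(theta_i, n=1.5):
        """Total reflectance of an absorption-free glass pane, summing
        the geometric series of internal bounces:
            R = r + r(1-r)^2 * (1 + r^2 + r^4 + ...) = 2r / (1+r)."""
        r = fresnel_unpolarized(theta_i, 1.0, n)
        return 2.0 * r / (1.0 + r)

    # Reflectance grows quickly toward grazing angles, which is one reason
    # reflections vary spatially across a single photograph.
    for deg in (0, 30, 60, 75):
        print(deg, round(pane_reflectance(math.radians(deg)), 3))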



Visual examples of reflection removal results on the SIR^2 wild dataset (rows 1-3) and our real 100 test set (rows 4-6).


Abstract

Recently, deep learning-based single image reflection separation methods have been widely exploited. To benefit the learning approach, a large number of training image pairs (i.e., with and without reflections) were synthesized in various ways, yet they are far from a physically-based direction. In this paper, physically based rendering is used for faithfully synthesizing the required training images, and a corresponding network structure and loss term are proposed. We utilize existing RGBD/RGB images to estimate meshes, then physically simulate the light transport between meshes, glass, and lens with path tracing to synthesize training data, which successfully reproduces the spatially variant, anisotropic visual effects of glass reflection. To better guide the separation, we additionally propose a module, the backtrack network (BT-net), for backtracking the reflections, which removes the complicated ghosting, attenuation, and blurring/defocus effects of the glass/lens. This enables obtaining a priori information before the distortion occurs. The proposed method, considering additional a priori information together with physically simulated training data, is validated on various real reflection images and shows visual and numerical advantages over state-of-the-art techniques.
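To make the glass/lens effects listed above concrete, the sketch below applies a simplified image-space version of them to a clean reflection layer: an attenuated primary copy, a dimmer spatially shifted ghost from the second internal bounce, and a Gaussian blur standing in for lens defocus. Note that this 2D approximation is only an illustration of what the BT-net must invert; our training data is instead synthesized physically with path tracing, and all names and coefficients here are hypothetical.

    import numpy as np
    from scipy.ndimage import gaussian_filter, shift

    def apply_glass_lens_effects(R, a1=0.2, a2=0.06,
                                 ghost_shift=(4, 2), sigma=2.0):
        """Degrade a clean reflection layer R (H x W x 3, float in [0, 1])
        with simplified glass/lens effects:
          - attenuation of the primary front-surface reflection (a1),
          - a dimmer, spatially shifted ghost from the second bounce (a2),
          - Gaussian blur approximating lens defocus (sigma).
        """
        primary = a1 * R
        ghost = a2 * shift(R, ghost_shift + (0,), order=1, mode="nearest")
        Rg = gaussian_filter(primary + ghost, sigma=(sigma, sigma, 0))
        return np.clip(Rg, 0.0, 1.0)

    # A synthetic training input would then be the transmission plus the
    # degraded reflection: I = T + apply_glass_lens_effects(R).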

Contents

Paper (author preprint)
Source code: network part (GitHub)
Data generation framework with rendering (zip file; see the README file for further information)