Pattern Recognition Letters

Multi-Resolution Distillation for Self-Supervised Monocular Depth Estimation

by Sebin Lee, Woobin Im, Sung-Eui Yoon

Abstract

Obtaining dense depth ground-truth is not trivial, which leads to the introduction of self-supervised monocular depth estimation. Most self-supervised methods utilize the photometric loss as the primary supervisory signal to optimize a depth network. However, such self-supervised training often falls into an undesirable local minimum due to the ambiguity of the photometric loss. In this paper, we propose a novel self-distillation training scheme that provides a new self-supervision signal, depth consistency among different input resolutions, to the depth network. We further introduce a gradient masking strategy that adjusts the self-supervision signal of the depth consistency during back-propagation to boost the effectiveness of our depth consistency. Experiments demonstrate that our method brings meaningful performance improvements when applied to various depth network architectures. Furthermore, our method outperforms the existing self-supervised methods on KITTI, Cityscapes, and DrivingStereo datasets by a noteworthy margin.
