Diffraction- and Reflection-Aware Multiple Sound Source Localization

by Inkyu An, Youngsun Kwon, and Sung-Eui Yoon
IEEE Transactions On Robotics (T-RO) 2021

A robot equipped with a cube-shaped microphone array localizes a source position in 3-D space. Our formulation accounts for both direct and indirect sound propagation by using acoustic rays. The acoustic rays are initialized and propagated by our backward acoustic ray tracing algorithm, which considers reflection and diffraction; primary, reflection, and diffraction acoustic rays are shown as white, blue, and red lines, respectively. The yellow disk, which is very close to the ground truth, represents the 95% confidence ellipse of the estimated sound source, as computed by our approach.
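As an illustration of how a 95% confidence ellipse can be derived from a set of position estimates, here is a minimal sketch. It assumes the estimates are roughly Gaussian in the horizontal plane and uses the closed-form chi-square quantile for two degrees of freedom, -2 ln(1 - level). The function name `confidence_ellipse_2d` and its parameters are hypothetical; this page does not say how the ellipse in the figure was actually computed.

```python
import numpy as np

def confidence_ellipse_2d(points, weights=None, level=0.95):
    """Return (mean, semi_axes, angle) of a 2-D confidence ellipse.

    Assumption: the points are approximately Gaussian, so the ellipse is
    the covariance ellipse scaled by the chi-square quantile (2 dof).
    """
    points = np.asarray(points, dtype=float)
    mean = np.average(points, axis=0, weights=weights)
    cov = np.cov(points.T, aweights=weights)
    # Chi-square quantile with 2 degrees of freedom has a closed form.
    scale = -2.0 * np.log(1.0 - level)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    semi_axes = np.sqrt(scale * eigvals[::-1])  # (major, minor)
    major_dir = eigvecs[:, -1]
    angle = np.arctan2(major_dir[1], major_dir[0])  # orientation of major axis
    return mean, semi_axes, angle
```

For particles from a particle filter, the particle weights could be passed as `weights`; the ellipse then summarizes where the estimate is concentrated.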

Run-time computations using acoustic ray tracing for sound source localization. Acoustic ray tracing is performed from DoAs, a mesh map containing wedges, and a robot position where a DoA estimator works on a cube-shaped eight-microphone array. The robot position is estimated by 2-D SLAM from a 2-D Lidar sensor, and the mesh map and wedges are generated during the precomputation phase. Source position estimation is performed by identifying ray convergence from the generated acoustic ray paths.
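The first step of this pipeline, turning an estimated DoA into a primary acoustic ray in the map frame, can be sketched as follows. This is only an illustration under stated assumptions: the DoA is given as azimuth and elevation in the robot frame, the 2-D SLAM pose is (x, y, yaw), and the array sits at a fixed height `mic_height`. The function name and all parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def doa_to_world_ray(azimuth, elevation, robot_xy, robot_yaw, mic_height):
    """Convert a DoA in the robot frame to a ray (origin, direction) in the map frame.

    Assumptions: angles are in radians, azimuth is measured about the
    vertical axis, and the robot pose is planar (x, y, yaw).
    """
    # Unit direction in the robot frame.
    d_robot = np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])
    # Rotate about the vertical axis by the robot heading.
    c, s = np.cos(robot_yaw), np.sin(robot_yaw)
    rotation = np.array([[c, -s, 0.0],
                         [s,  c, 0.0],
                         [0.0, 0.0, 1.0]])
    origin = np.array([robot_xy[0], robot_xy[1], mic_height], dtype=float)
    return origin, rotation @ d_robot
```

The returned origin and direction would then be handed to the ray tracer, which propagates the ray backward through the mesh map.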

Examples of acoustic ray tracing handling reflection (left) and diffraction (right).
Left: Example of propagating reflection acoustic rays. The acoustic ray path, containing the direct and reflection acoustic rays from r^0_n to r^k_n, is propagated from the origin \dot{o} of the microphone array to the red point corresponding to r^k_n(l). The ray lengths l of the acoustic rays from r^0_n to r^k_n should sum to l_max.
Right: Our acoustic ray tracing method, devised to handle the diffraction effect. Suppose that an acoustic ray r^{k-1}_n satisfies the diffraction condition, i.e., it hits or passes near the edge of a wedge. We then generate N_d diffraction rays covering the possible incoming directions (especially in the shadow region) of the rays that cause the diffraction.
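The two cases described in this caption can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: the room is assumed to be an axis-aligned box rather than a reconstructed mesh, reflection is purely specular, and the diffraction rays are placed on the cone around the wedge edge that preserves the ray's angle to the edge (the standard edge-diffraction law used in the uniform theory of diffraction). The function names and the angular range `phi_min`..`phi_max` are hypothetical; how the shadow region is chosen is not specified here.

```python
import numpy as np

def trace_reflection_path(origin, direction, room_min, room_max, l_max, max_bounces=10):
    """Specularly reflect a ray inside a box room until its total length is l_max."""
    p = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    room_min = np.asarray(room_min, dtype=float)
    room_max = np.asarray(room_max, dtype=float)
    remaining = float(l_max)
    segments = []
    for _ in range(max_bounces + 1):
        # Distance along the ray to the wall it is heading toward on each axis.
        with np.errstate(divide="ignore", invalid="ignore"):
            t = np.where(d > 0, (room_max - p) / d,
                         np.where(d < 0, (room_min - p) / d, np.inf))
        axis = int(np.argmin(t))
        t_hit = float(t[axis])
        if t_hit >= remaining:
            # The length budget runs out before the next wall.
            segments.append((p.copy(), p + remaining * d))
            break
        hit = p + t_hit * d
        segments.append((p.copy(), hit))
        remaining -= t_hit
        # Reflection off an axis-aligned wall flips one direction component.
        d = d.copy()
        d[axis] = -d[axis]
        p = hit
    return segments

def diffraction_directions(incoming, edge_dir, n_d, phi_min, phi_max):
    """Generate n_d directions on the diffraction cone around a wedge edge.

    phi = 0 reproduces the incoming direction; other angles rotate the ray
    about the edge while keeping its angle to the edge unchanged.
    """
    d = np.asarray(incoming, dtype=float)
    d = d / np.linalg.norm(d)
    e = np.asarray(edge_dir, dtype=float)
    e = e / np.linalg.norm(e)
    cos_beta = float(d @ e)
    perp = d - cos_beta * e
    sin_beta = float(np.linalg.norm(perp))
    if sin_beta < 1e-12:
        raise ValueError("ray is parallel to the edge; the cone is degenerate")
    u = perp / sin_beta
    v = np.cross(e, u)
    phis = np.linspace(phi_min, phi_max, n_d)
    ring = np.cos(phis)[:, None] * u + np.sin(phis)[:, None] * v
    return cos_beta * e + sin_beta * ring
```

In a backward tracer, each generated diffraction direction would start a new ray at the wedge edge, with the remaining length budget carried over from the ray that reached the edge.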


Example of running the pth particle filter at the first and second iterations, i.e., t = 0 and t = 1. At the beginning of our approach, i.e., t = 0, particles are initialized from a uniform distribution in (a). In the weight computation part (b), the weights of the particles are computed from the acoustic ray paths; particles have higher weights when they are located near the convergence region of the ray paths. In the resampling part (c), particles with low weights are resampled close to particles with high weights, which moves the particles toward the convergence region of the ray paths. After the ray path allocation part (Section IV-D) is executed, the first iteration of our approach is finished. At the second iteration, i.e., t = 1, the Monte Carlo localization starts with the sampling part, in which particles are regenerated from a Gaussian distribution in (d).
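The four parts named in this caption (uniform initialization, weight computation, resampling, Gaussian sampling) can be sketched as a single particle-filter iteration. This is a hedged illustration only: the weight function below, a sum of Gaussians of the distance from each particle to each ray segment, is an assumption standing in for the paper's weighting, the ray path allocation part is omitted, and all names and parameter values (`sigma`, room size, particle count) are hypothetical.

```python
import numpy as np

def point_segment_distance(points, a, b):
    """Distance from each point (N x 3) to the segment from a to b."""
    ab = b - a
    t = np.clip((points - a) @ ab / (ab @ ab), 0.0, 1.0)
    closest = a + t[:, None] * ab
    return np.linalg.norm(points - closest, axis=1)

def compute_weights(particles, segments, sigma=0.3):
    """Weight particles by closeness to the ray segments (assumed model)."""
    w = np.zeros(len(particles))
    for a, b in segments:
        dist = point_segment_distance(particles, a, b)
        w += np.exp(-0.5 * (dist / sigma) ** 2)
    total = w.sum()
    if total == 0.0:
        # No particle is near any ray: fall back to uniform weights.
        return np.full(len(particles), 1.0 / len(particles))
    return w / total

def resample(particles, weights, rng):
    """Draw particles in proportion to their weights."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

def sample_gaussian(particles, rng, sigma=0.1):
    """Regenerate particles by adding Gaussian noise (the sampling part)."""
    return particles + rng.normal(0.0, sigma, size=particles.shape)

rng = np.random.default_rng(0)
# (a) Uniform initialization inside a 7 m x 7 m x 3 m room.
particles = rng.uniform([0.0, 0.0, 0.0], [7.0, 7.0, 3.0], size=(2000, 3))
# Two ray segments that cross at (5, 5, 1.5), standing in for traced ray paths.
segments = [
    (np.array([0.0, 5.0, 1.5]), np.array([7.0, 5.0, 1.5])),
    (np.array([5.0, 0.0, 1.5]), np.array([5.0, 7.0, 1.5])),
]
weights = compute_weights(particles, segments)   # (b) weight computation
resampled = resample(particles, weights, rng)    # (c) resampling
moved = sample_gaussian(resampled, rng)          # (d) sampling at t = 1
```

After resampling, the particles cluster around the ray segments, with the highest density where the segments cross, since a particle there collects weight from both rays.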

This multimedia material shows experimental videos of our approach, Diffraction- and Reflection-Aware Multiple Sound Source Localization, in real environments. The four scenarios from our paper shown in the videos are challenging environments in which to localize source positions. These scenarios contain multiple sources and obstacles that block direct sound propagation paths while the sound sources move around the obstacles; the moving sources become non-line-of-sight sources when they are located behind the obstacles. In the last experimental video, the robot equipped with the microphone array performs the task of navigating to the non-line-of-sight source. This multimedia material shows that our approach can handle these difficulties.

Abstract

In this article, we present a novel localization method for multiple sources in indoor environments. Our approach can estimate different propagation paths, including the reflection and diffraction paths of sound waves, based on a backward ray tracing technique. To estimate diffraction propagation paths, we combine a ray tracing algorithm with a uniform theory of diffraction model by exploiting the diffraction properties as propagation paths bend around the wedges of obstacles. We reconstruct the 3-D environments and wedges of obstacles in the precomputation phase and utilize these outcomes to generate primary, reflection, and diffraction acoustic rays in the runtime phase. We localize multiple sources by identifying the convergence regions of these acoustic rays based on Monte Carlo localization (MCL). Our approach supports not only stationary but also moving sources of human speech and clapping sounds. Our approach can also handle non-line-of-sight (NLOS) sources and distinguish between active and inactive source states. We evaluated and analyzed our algorithm in multiple scenarios containing obstacles and NLOS sources. Our approach can localize moving sources with average distance errors of 0.65 and 0.74 m in single and multiple source cases, respectively, in rooms 7 m by 7 m in size with a height of 3 m; errors are measured as the L2 distance between the estimated and actual source positions. We observed a 130% improvement in localization accuracy over the prior work (J.-M. Valin et al.).
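The error metric named in the abstract, the average L2 distance between estimated and actual source positions, can be written as a short sketch. The function name `mean_l2_error` and the array layout (one row per time step, columns x, y, z in meters) are assumptions for illustration.

```python
import numpy as np

def mean_l2_error(estimated, actual):
    """Average Euclidean distance between matching rows of two (T x 3) arrays."""
    estimated = np.asarray(estimated, dtype=float)
    actual = np.asarray(actual, dtype=float)
    if estimated.shape != actual.shape:
        raise ValueError("estimated and actual positions must have the same shape")
    return float(np.mean(np.linalg.norm(estimated - actual, axis=1)))
```

For a moving source, each row would pair the estimate at one time step with the ground-truth position at the same time step.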

Paper: PDF, early access version
Related work: Reflection-Aware Sound Source Localization, ICRA 2018 and Diffraction-Aware Sound Localization for a Non-Line-of-Sight Source, ICRA 2019

Dept. of Computer Science
KAIST
373-1 Guseong-dong, Yuseong-gu, Daejeon, 305-701
South Korea
sglabkaist dot gmail dot com
