Our approach generates direct and indirect acoustic ray paths and localizes the sound source while considering back-propagation signals on generated acoustic ray paths.
The back-propagation signals are virtually computed signals that could be heard at particular locations; they are computed by using impulse responses.
When two back-propagation signals of acoustic ray paths are highly correlated, we treat them as originating from the same source.
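As a rough illustration of the correlation test described above, the sketch below computes a zero-lag normalized correlation between two signals and compares it to a threshold. The function names, the choice of zero-lag Pearson correlation, and the threshold of 0.8 are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def normalized_correlation(a, b):
    """Zero-lag Pearson correlation of two equal-length signals, in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # A constant (or silent) signal carries no information to correlate.
        return 0.0
    return float(np.dot(a, b) / denom)

def same_source(a, b, threshold=0.8):
    """Hypothetical decision rule: the threshold value is an assumption."""
    return normalized_correlation(a, b) >= threshold
```

Because the correlation is normalized, two copies of the same signal that differ only in amplitude (for example, because one path was attenuated by a reflection) still score close to 1.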
Our sound source localization algorithm utilizes signals, called back-propagation signals, that are back-propagated to particular locations on sound propagation paths from signals measured at the microphone.
Our method represents the surrounding environment in the form of a mesh map, which is reconstructed from the point cloud collected by the depth sensor.
At runtime, audio streams are collected by a 32-channel microphone array.
After localizing the incoming directions of sound using an EB-MVDR (eigenbeam minimum variance distortionless response) based beamformer algorithm, we estimate the acoustic signals observed from the major incoming directions by applying beam patterns.
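For readers unfamiliar with MVDR, the sketch below shows the generic narrowband MVDR weight formula, w = R^{-1} a / (a^H R^{-1} a), which passes a signal from the steering direction without distortion while minimizing output power. It is not the eigenbeam (spherical-harmonic domain) variant used in our method; the function name, the diagonal loading term, and its default value are assumptions for illustration.

```python
import numpy as np

def mvdr_weights(R, a, diagonal_loading=1e-6):
    """Generic narrowband MVDR weights (not the eigenbeam variant).

    R: (n, n) Hermitian spatial covariance matrix of the array signals.
    a: (n,) steering vector for the look direction.
    diagonal_loading: assumed regularization, relative to the mean power.
    """
    R = np.asarray(R, dtype=complex)
    a = np.asarray(a, dtype=complex)
    n = R.shape[0]
    # Diagonal loading keeps the solve well conditioned.
    R_loaded = R + diagonal_loading * np.trace(R).real / n * np.eye(n)
    Ri_a = np.linalg.solve(R_loaded, a)   # R^{-1} a
    return Ri_a / (a.conj() @ Ri_a)       # divide by a^H R^{-1} a
```

The beamformer output for one snapshot x is then `np.vdot(w, x)`, that is, w^H x. By construction w^H a = 1, which is the distortionless constraint.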
To simulate acoustic paths, we generate direct and reflection acoustic rays by applying ray tracing in a backward manner [Related work 1].
Specifically, we generate direct acoustic rays in the opposite directions to those of incoming sounds.
Once these direct acoustic rays intersect with the surrounding environment, we generate reflection acoustic rays to simulate the reflection effect in reverse.
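The reflection step can be sketched as follows, assuming purely specular reflection at the hit surface (an assumption made for this example). Given the direction d of the ray that hit the surface and the surface normal n, the reflected direction is r = d - 2 (d . n) n. The function name is hypothetical.

```python
import numpy as np

def reflect(direction, normal):
    """Specular reflection of a ray direction about a surface normal.

    Both inputs are 3D vectors; they are normalized internally, and the
    returned direction has unit length.
    """
    d = np.asarray(direction, dtype=float)
    n = np.asarray(normal, dtype=float)
    d = d / np.linalg.norm(d)
    n = n / np.linalg.norm(n)
    return d - 2.0 * np.dot(d, n) * n
```

For a ray travelling along (1, -1, 0) that hits a floor with normal (0, 1, 0), the reflected ray travels along (1, 1, 0) after normalization. The same formula applies whether the ray is traced forward from a source or backward from the microphone.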
Finally, we perform the Monte Carlo localization algorithm for identifying a source position from the generated acoustic paths.
If these acoustic rays are actually coming from the same sound source, back-propagation signals at a candidate location should be similar to each other.
We therefore use the back-propagation signals of acoustic rays at a candidate location as an important factor in identifying the sound source location.
This back-propagation signal of an acoustic path is computed by the impulse response that is initialized with the separation signal estimated for each direct acoustic ray.
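To make the idea of a back-propagation signal concrete, the sketch below uses a heavily idealized model: a free-field path whose impulse response is a single impulse delayed by d / c and scaled by 1 / d. Under that assumption, back-propagating a signal by a path length d amounts to advancing it by the propagation delay and undoing the spreading loss. The actual impulse response computation in our method is not reproduced here; the function name, the speed of sound constant, and the attenuation model are all assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def back_propagate(signal, path_length_m, sample_rate_hz):
    """Idealized back-propagation of a signal along a path of given length.

    Assumes the forward path only delays the signal by d / c and scales it
    by 1 / d, so the inverse advances it in time and multiplies it by d.
    """
    signal = np.asarray(signal, dtype=float)
    delay = int(round(path_length_m / SPEED_OF_SOUND * sample_rate_hz))
    out = np.zeros_like(signal)
    if delay < len(signal):
        # Shift earlier in time; samples shifted past the start are dropped.
        out[:len(signal) - delay] = signal[delay:]
    gain = max(path_length_m, 1e-6)  # avoid a zero gain for a zero-length path
    return gain * out
```

If two acoustic paths really lead back to the same source, back-propagating each path's separated signal by its own path length to the true source location should produce time-aligned, similar signals, which is the property the localization step relies on.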
The test environments with and without an obstacle that makes the sound source non-line-of-sight. We use a clapping sound as the sound source.
We add white noise (67 dB and 77 dB) as a distractor at the back of the test environments.
Please refer to the paper and the working video for the results.