Person Tracking Using Audio and Depth Cues

Qingju Liu, Teofilo de Campos, Wenwu Wang, Philip Jackson, Adrian Hilton; Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2015, pp. 22-30


In this paper, a novel probabilistic Bayesian tracking scheme is proposed and applied to bimodal measurements consisting of tracking results from a depth sensor and audio recordings collected using binaural microphones. We use random finite sets to cope with a varying number of tracking targets. A measurement-driven birth process is integrated to quickly localize any emerging person. A new bimodal fusion method that prioritizes the most confident modality is employed. The approach was tested on real room recordings, and experimental results show that the proposed combination of audio and depth outperforms either modality alone, particularly when multiple people talk simultaneously and when occlusions are frequent.
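To make the idea of prioritizing the most confident modality concrete, the following is a minimal sketch and not the authors' method: the abstract does not specify how confidence is computed or how the modalities are combined. The `Estimate` class, its `position` and `confidence` fields, and the `fuse` function are all hypothetical names introduced for illustration. The sketch simply selects whichever available modality reports the higher confidence for a given person.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Estimate:
    """Hypothetical per-modality estimate for one person (not from the paper)."""
    position: Tuple[float, float]  # assumed (x, y) location in metres
    confidence: float              # assumed score in [0, 1], higher is better


def fuse(audio: Optional[Estimate], depth: Optional[Estimate]) -> Optional[Estimate]:
    """Return the estimate from the more confident modality.

    A modality passed as None is treated as unavailable, for example a
    silent speaker (no audio cue) or an occluded person (no depth cue).
    """
    candidates = [e for e in (audio, depth) if e is not None]
    if not candidates:
        return None  # neither modality observed the person
    return max(candidates, key=lambda e: e.confidence)


# Usage: depth is more confident here, so its position is selected.
audio_est = Estimate(position=(1.0, 2.0), confidence=0.3)
depth_est = Estimate(position=(1.1, 2.1), confidence=0.9)
selected = fuse(audio_est, depth_est)
```

A hard selection like this is the simplest reading of "prioritizes"; a Bayesian tracker could instead weight each modality's likelihood by its confidence, which is a design choice the abstract leaves open.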

Related Material

author = {Liu, Qingju and de Campos, Teofilo and Wang, Wenwu and Jackson, Philip and Hilton, Adrian},
title = {Person Tracking Using Audio and Depth Cues},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {December},
year = {2015}