Pedestrian Detection Aided by Deep Learning Semantic Tasks

Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5079-5087


Deep learning methods have achieved great success in pedestrian detection, owing to its ability to learn features from raw pixels. However, they can mainly capture middlelevel representations, such as pose of pedestrian, but confuses positive with hard negative samples (Fig.1 (a)), which have large ambiguity and can only be distinguished by highlevel representation. To address this ambiguity, this work jointly optimize pedestrian detection with semantic tasks, including pedestrian attributes (e.g. 'carrying backpack') and scene attributes (e.g. 'vehicle', 'tree', and 'horizontal'). Rather than expensively annotating scene attributes, we transfer attributes information from existing scene segmentation datasets to the pedestrian dataset, by proposing a novel deep model to learn high-level features from multiple tasks and multiple data sources. Since distinct tasks have distinct convergence rates and data from different datasets have different distributions, a multi-task objective function is carefully designed to coordinate tasks and reduce discrepancies among datasets. The importance coefficients of tasks and network parameters in this objective function can be iteratively estimated. Extensive evaluations show that the proposed approach outperforms the state-of-the-art on the challenging Caltech [10] and ETH [11] datasets where it reduces the miss rates of previous deep models by 17 and 5.5 percent, respectively.

Related Material

author = {Tian, Yonglong and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
title = {Pedestrian Detection Aided by Deep Learning Semantic Tasks},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2015}