Learning by Tracking: Siamese CNN for Robust Target Association

Laura Leal-Taixe, Cristian Canton-Ferrer, Konrad Schindler; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2016, pp. 33-40

Abstract


This paper introduces a novel approach to data association in pedestrian tracking, based on a two-stage learning scheme to match pairs of detections. First, a Siamese convolutional neural network (CNN) is trained to learn descriptors encoding local spatio-temporal structures between the two input image patches, aggregating pixel values and optical flow information. Second, a set of contextual features derived from the position and size of the compared input patches is combined with the CNN output by means of a gradient boosting classifier to generate the final matching probability. The learning approach is validated with a linear-programming-based multi-person tracker, showing that even a simple and efficient tracker may outperform much more complex models when fed with our learned matching probabilities. Results on publicly available sequences show that our method meets state-of-the-art standards in multiple people tracking.
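
As a rough illustration of the second stage described in the abstract, the sketch below builds the input vector that a gradient boosting classifier could consume: a CNN matching score concatenated with contextual features computed from the position and size of two detection boxes. The abstract does not list the exact features, so the `Box` type, the four features, and the function names are all assumptions made for illustration, and `cnn_score` is a placeholder for the Siamese network output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned detection box (hypothetical container, not from the paper)."""
    x: float  # top-left x, in pixels
    y: float  # top-left y, in pixels
    w: float  # width, in pixels
    h: float  # height, in pixels


def contextual_features(a: Box, b: Box) -> List[float]:
    """Illustrative position/size features for a pair of detections.

    Assumption: the paper's actual feature set may differ from these four.
    """
    # Centre displacement, normalised by the mean box height so that the
    # feature does not depend on how large the pedestrian appears.
    mean_h = 0.5 * (a.h + b.h)
    dx = ((b.x + 0.5 * b.w) - (a.x + 0.5 * a.w)) / mean_h
    dy = ((b.y + 0.5 * b.h) - (a.y + 0.5 * a.h)) / mean_h
    # Relative change in width and height, each bounded in (-1, 1).
    dw = (b.w - a.w) / (a.w + b.w)
    dh = (b.h - a.h) / (a.h + b.h)
    return [dx, dy, dw, dh]


def stage_two_input(cnn_score: float, a: Box, b: Box) -> List[float]:
    """Concatenate the (placeholder) CNN score with the contextual features."""
    return [cnn_score] + contextual_features(a, b)
```

For example, two equally sized boxes shifted horizontally by half a box width yield a nonzero `dx` and zeros for the remaining contextual features. A trained classifier would map such vectors to the final matching probability, which the tracker then uses as an association cost.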

Related Material


@InProceedings{Leal-Taixe_2016_CVPR_Workshops,
author = {Leal-Taixe, Laura and Canton-Ferrer, Cristian and Schindler, Konrad},
title = {Learning by Tracking: Siamese CNN for Robust Target Association},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2016},
pages = {33--40}
}