Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition

Sibo Song, Vijay Chandrasekhar, Bappaditya Mandal, Liyuan Li, Joo-Hwee Lim, Giduthuri Sateesh Babu, Phyo Phyo San, Ngai-Man Cheung; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2016, pp. 24-31

Abstract


In this paper, we propose a multimodal multi-stream deep learning framework to tackle the egocentric activity recognition problem, using both video and sensor data. First, we experiment with and extend a multi-stream Convolutional Neural Network to learn the spatial and temporal features from egocentric videos. Second, we propose a multi-stream Long Short-Term Memory architecture to learn features from multiple sensor streams (accelerometer, gyroscope, etc.). Third, we propose a two-level fusion technique and experiment with different pooling techniques to compute the prediction results. Experimental results on a multimodal egocentric dataset show that our proposed method achieves very encouraging performance, despite the constraint that the scale of existing egocentric datasets is still quite limited.
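The two-level fusion described above, pooling per-stream class scores first within each modality and then across modalities, can be sketched as follows. This is a minimal illustration with hypothetical 4-class score vectors and stream names, not the authors' code; the paper's actual pooling choices and score dimensions may differ.

```python
import numpy as np

def pool_scores(stream_scores, method="avg"):
    """Fuse a list of per-stream class-score vectors into one vector."""
    scores = np.vstack(stream_scores)
    if method == "avg":
        return scores.mean(axis=0)
    if method == "max":
        return scores.max(axis=0)
    raise ValueError(f"unknown pooling method: {method}")

# Level 1: fuse streams within each modality (hypothetical 4-class scores).
video_streams = [
    np.array([0.1, 0.6, 0.2, 0.1]),   # e.g. spatial CNN stream
    np.array([0.2, 0.5, 0.2, 0.1]),   # e.g. temporal CNN stream
]
sensor_streams = [
    np.array([0.3, 0.4, 0.2, 0.1]),   # e.g. accelerometer LSTM stream
    np.array([0.2, 0.5, 0.1, 0.2]),   # e.g. gyroscope LSTM stream
]
video_pred = pool_scores(video_streams, "avg")
sensor_pred = pool_scores(sensor_streams, "avg")

# Level 2: fuse across modalities, then pick the activity class.
final = pool_scores([video_pred, sensor_pred], "avg")
predicted_class = int(final.argmax())  # → 1 for these example scores
```

Swapping `"avg"` for `"max"` at either level corresponds to experimenting with different pooling techniques, as the abstract mentions.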

Related Material


[bibtex]
@InProceedings{Song_2016_CVPR_Workshops,
author = {Song, Sibo and Chandrasekhar, Vijay and Mandal, Bappaditya and Li, Liyuan and Lim, Joo-Hwee and Sateesh Babu, Giduthuri and Phyo San, Phyo and Cheung, Ngai-Man},
title = {Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2016},
pages = {24-31}
}