Bilinear Heterogeneous Information Machine for RGB-D Action Recognition

Yu Kong, Yun Fu; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1054-1062

Abstract


This paper proposes a novel approach to action recognition from RGB-D cameras, in which depth features and RGB visual features are jointly used. Rich heterogeneous RGB and depth data are effectively compressed and projected to a learned shared space, in order to reduce noise and capture useful information for recognition. Knowledge from the different sources can then be shared in the learned space to learn cross-modal features, which guides the discovery of valuable information for recognition. To capture complex spatiotemporal structural relationships in visual and depth features, we represent both RGB and depth data in matrix form. We formulate the recognition task as a low-rank bilinear model composed of row and column parameter matrices. The rank of the model parameter is minimized to build a low-rank classifier, which is beneficial for improving the generalization power. The proposed method is extensively evaluated on two public RGB-D action datasets, and achieves state-of-the-art results. It also shows promising results when RGB or depth data are missing during training or testing.
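The abstract does not state the model's equations, so the following is only a minimal sketch of what a generic low-rank bilinear score over a matrix-form feature can look like, not the paper's actual formulation. It assumes a feature matrix `X` of size m x n, a row parameter matrix `U` (m x r) and a column parameter matrix `V` (n x r), and the score trace(Uᵀ X V), which equals the inner product of `X` with the rank-at-most-r matrix U Vᵀ. All names and sizes are hypothetical, and the rank is fixed here through the factorization rather than minimized as the abstract describes.

```python
import random

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def bilinear_score(X, U, V):
    """Score trace(U^T X V) without ever forming the full m x n weight matrix."""
    M = matmul(matmul(transpose(U), X), V)  # r x r
    return sum(M[i][i] for i in range(len(M)))

def dense_score(X, U, V):
    """Same score computed as <W, X> with the explicit weight W = U V^T."""
    W = matmul(U, transpose(V))  # m x n, rank at most r
    return sum(w * x for w_row, x_row in zip(W, X)
               for w, x in zip(w_row, x_row))

def random_matrix(rows, cols, rng):
    """Illustrative random data; stands in for real features or parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: a 4 x 5 feature matrix and a rank-2 parameterization.
rng = random.Random(0)
m, n, r = 4, 5, 2
X = random_matrix(m, n, rng)
U = random_matrix(m, r, rng)
V = random_matrix(n, r, rng)

factored = bilinear_score(X, U, V)
dense = dense_score(X, U, V)
```

Both functions compute the same quantity. The factored form needs only (m + n) * r parameters instead of m * n, which is one common reason a low-rank classifier can generalize better on small datasets.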

Related Material


[bibtex]
@InProceedings{Kong_2015_CVPR,
author = {Kong, Yu and Fu, Yun},
title = {Bilinear Heterogeneous Information Machine for RGB-D Action Recognition},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2015}
}