Predicting Eye Fixations Using Convolutional Neural Networks

Nian Liu, Junwei Han, Dingwen Zhang, Shifeng Wen, Tianming Liu; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 362-370


It is believed that eye movements in free-viewing of natural scenes are directed by both bottom-up visual saliency and top-down visual factors. In this paper, we propose a novel computational framework to simultaneously learn these two types of visual features from raw image data using a multiresolution convolutional neural network (Mr-CNN) for predicting eye fixations. The Mr-CNN is directly trained from image regions centered on fixation and non-fixation locations over multiple resolutions, using raw image pixels as inputs and eye fixation attributes as labels. Diverse top-down visual features can be learned in higher layers. Meanwhile, bottom-up visual saliency can also be inferred by combining information over multiple resolutions. Finally, optimal integration of bottom-up and top-down cues can be learned in the last logistic regression layer to predict eye fixations. The proposed approach achieves state-of-the-art results on four publicly available benchmark datasets, demonstrating the superiority of our work.
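The abstract describes training on image regions centered on fixation and non-fixation locations at several resolutions. The following is a minimal sketch of how such multiresolution inputs might be cut out for a single location. It is not the authors' implementation: the function names, the downsampling factors (1, 2, 4), the 8-pixel patch size, block-average downsampling, and zero padding at image borders are all assumptions chosen for illustration.

```python
import numpy as np


def downsample(image, factor):
    """Block-average an H x W x C image by an integer factor (assumed scheme)."""
    h, w, c = image.shape
    h2, w2 = h // factor, w // factor
    cropped = image[:h2 * factor, :w2 * factor]
    return cropped.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))


def extract_patch(image, row, col, size):
    """Cut a size x size patch around (row, col), zero-padding at the borders."""
    half = size // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="constant")
    # After padding, original pixel (row, col) sits at (row + half, col + half),
    # so this slice places it at index (half, half) of the patch.
    return padded[row:row + size, col:col + size]


def multires_patches(image, row, col, factors=(1, 2, 4), size=8):
    """Return one fixed-size patch per resolution, all centered on one location.

    The factors and patch size are hypothetical values, not the paper's settings.
    """
    patches = []
    for f in factors:
        small = downsample(image, f)
        patches.append(extract_patch(small, row // f, col // f, size))
    return patches


# Usage: one location yields a stack of same-sized patches, each covering a
# progressively wider context. A label (1 = fixation, 0 = non-fixation) would
# be attached to the whole stack before it is fed to a classifier.
image = np.zeros((32, 32, 3))
image[16, 16] = 1.0
patches = multires_patches(image, 16, 16)
```

Because every patch has the same pixel size, coarser resolutions cover a larger part of the scene around the same center, which is one plausible way to expose both local detail and wider context to a single model.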

Related Material

@InProceedings{Liu_2015_CVPR,
author = {Liu, Nian and Han, Junwei and Zhang, Dingwen and Wen, Shifeng and Liu, Tianming},
title = {Predicting Eye Fixations Using Convolutional Neural Networks},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2015}
}