Fast Zero-Shot Image Tagging

Yang Zhang, Boqing Gong, Mubarak Shah; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 5985-5994


The well-known word analogy experiments show that recent word vectors capture fine-grained linguistic regularities in words through linear vector offsets, but it is unclear how well such simple offsets can encode visual regularities over words. In this paper we study one particular image-word relevance relation. Our results show that, given an image, the word vectors of its relevant tags rank ahead of those of its irrelevant tags along a principal direction in the word vector space. Inspired by this observation, we propose to solve image tagging by estimating the principal direction for an image. In particular, we use linear mappings and nonlinear deep neural networks to approximate the principal direction from an input image. The result is a versatile tagging model. It runs fast on a test image, in constant time w.r.t. the training set size. It not only achieves superior performance on the conventional tagging task on the NUS-WIDE dataset, but also outperforms competitive baselines on annotating images with previously unseen tags. Accordingly, we name our approach fast zero-shot image tagging (Fast0Tag), recognizing that it combines the advantages of FastTag (Chen et al. 2013) and zero-shot learning.
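
The ranking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the matrix `A`, the feature vector `x`, the tag vectors, and the function names are hypothetical toy values chosen for illustration. The sketch assumes the linear-mapping variant, in which a direction `w = A x` is predicted from image features and candidate tags are ordered by the inner product of their word vectors with `w`.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def principal_direction(A: Sequence[Vector], x: Vector) -> List[float]:
    """Map image features x to a direction in word-vector space: w = A x.

    A is a hypothetical learned matrix with one row per word-vector dimension.
    """
    return [dot(row, x) for row in A]


def rank_tags(w: Vector, tag_vectors: Dict[str, Vector]) -> List[str]:
    """Order tags by the projection of their word vectors onto w, largest first."""
    return sorted(tag_vectors, key=lambda tag: dot(w, tag_vectors[tag]), reverse=True)


# Toy example: 3-d image features, 2-d word vectors (all numbers are made up).
A = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
x = [0.9, 0.1, 0.5]
tag_vectors = {
    "beach": [1.0, 0.2],
    "sea": [0.8, 0.1],
    "car": [-0.5, 0.9],
    "snow": [-0.7, -0.3],
}

w = principal_direction(A, x)
ranking = rank_tags(w, tag_vectors)
print(ranking)
```

In this sketch the cost of tagging one image depends only on the feature dimension, the word-vector dimension, and the number of candidate tags, not on how many training images were used to fit `A`. A tag that never appeared in training can still be ranked as long as a word vector is available for it, which is the zero-shot setting the abstract refers to.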

Related Material

author = {Zhang, Yang and Gong, Boqing and Shah, Mubarak},
title = {Fast Zero-Shot Image Tagging},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}