Sense Discovery via Co-Clustering on Images and Text

Xinlei Chen, Alan Ritter, Abhinav Gupta, Tom Mitchell; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5298-5306

Abstract


We present a co-clustering framework that can be used to discover multiple semantic and visual senses of a given Noun Phrase (NP). Unlike traditional clustering approaches, which assume a one-to-one mapping between the clusters in the text-based feature space and the visual space, we adopt a one-to-many mapping between the two spaces. This is primarily because each semantic sense (concept) can correspond to several visual senses due to viewpoint and appearance variations. Our structure-EM-style optimization not only extracts the multiple senses in both the semantic and visual feature spaces, but also discovers the mapping between the senses. We introduce a challenging dataset (CMU Polysemy-30) for this problem consisting of 30 NPs ($\sim$5600 labeled instances out of $\sim$22K total instances). We have also conducted a large-scale experiment that performs sense disambiguation for $\sim$2000 NPs.
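To make the one-to-many mapping concrete, here is a minimal toy sketch. It is not the structure-EM optimization from the paper. It assumes hard cluster assignments are already available for each instance and attaches every visual cluster to the semantic cluster it co-occurs with most often. The function name `map_visual_to_semantic`, the majority-vote rule, and the toy labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def map_visual_to_semantic(semantic_labels, visual_labels):
    """Toy one-to-many mapping: semantic cluster -> list of visual clusters.

    Each visual cluster is attached to exactly one semantic cluster (the one
    it co-occurs with most often), while a semantic cluster may own several
    visual clusters. This majority-vote rule is an illustrative assumption,
    not the optimization used in the paper.
    """
    # Count, per visual cluster, how often each semantic cluster co-occurs.
    counts = defaultdict(Counter)
    for s, v in zip(semantic_labels, visual_labels):
        counts[v][s] += 1

    # Each visual cluster picks its most frequent semantic cluster.
    visual_to_semantic = {v: c.most_common(1)[0][0] for v, c in counts.items()}

    # Invert the assignment to obtain the one-to-many view.
    semantic_to_visual = defaultdict(list)
    for v, s in sorted(visual_to_semantic.items()):
        semantic_to_visual[s].append(v)
    return dict(semantic_to_visual)


# Hypothetical instances of one NP: two semantic senses (0 and 1),
# where sense 0 appears under two distinct visual appearances.
semantic = [0, 0, 0, 0, 1, 1, 1]
visual = [0, 0, 1, 1, 2, 2, 2]
mapping = map_visual_to_semantic(semantic, visual)
print(mapping)  # {0: [0, 1], 1: [2]}
```

In this toy data, semantic sense 0 maps to two visual clusters while sense 1 maps to one, which is the situation a one-to-one clustering assumption cannot represent.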

Related Material


[bibtex]
@InProceedings{Chen_2015_CVPR,
author = {Chen, Xinlei and Ritter, Alan and Gupta, Abhinav and Mitchell, Tom},
title = {Sense Discovery via Co-Clustering on Images and Text},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2015},
pages = {5298--5306}
}