Weakly Supervised Learning for Attribute Localization in Outdoor Scenes

Shuo Wang, Jungseock Joo, Yizhou Wang, Song-Chun Zhu; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 3111-3118

Abstract


In this paper, we propose a weakly supervised method for simultaneously learning scene parts and attributes from a collection of images associated with attributes in text, where the precise localization of each attribute is left unknown. Our method includes three aspects. (i) Compositional scene configuration. We learn the spatial layouts of the scene with a Hierarchical Space Tiling (HST) representation, which can generate a very large number of scene configurations through the hierarchical composition of a relatively small number of parts. (ii) Attribute association. The scene attributes contain nouns and adjectives, corresponding to the objects and their appearance descriptions respectively. We assign the nouns to the nodes (parts) in the HST using non-maximum suppression of their correlation, then train an appearance model for each noun+adjective attribute pair. (iii) Joint inference and learning. For an image, we compute the most probable parse tree with the attributes as an instantiation of the HST by dynamic programming. We then update the HST and the attribute association based on the inferred parse trees. We evaluate the proposed method by (i) showing the improvement in attribute recognition accuracy; and (ii) comparing the average precision of localizing attributes to the scene parts.
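The inference step in (iii) can be illustrated with a minimal sketch. The abstract does not give the HST scoring function or the graph layout, so everything below is an assumption made for illustration: a toy AND-OR hierarchy with hypothetical node names and made-up log-scores, where OR nodes select one child, AND nodes compose all of their children, and memoized recursion plays the role of dynamic programming. It is not the authors' implementation.

```python
from functools import lru_cache

# Toy AND-OR hierarchy. All node names and scores are hypothetical.
# "or" nodes choose one child, "and" nodes compose all children,
# "leaf" nodes carry a log-score for a tile given the image.
GRAPH = {
    "scene":      ("or",   ["split_h", "whole"]),
    "split_h":    ("and",  ["top", "bottom"]),
    "top":        ("or",   ["sky_tile", "tree_tile"]),
    "bottom":     ("or",   ["road_tile", "grass_tile"]),
    "whole":      ("leaf", -3.0),
    "sky_tile":   ("leaf", -0.5),
    "tree_tile":  ("leaf", -1.2),
    "road_tile":  ("leaf", -0.9),
    "grass_tile": ("leaf", -0.4),
}


@lru_cache(maxsize=None)
def best_parse(node):
    """Return (score, parse_tree) for the highest-scoring parse rooted at node.

    Memoization ensures each node is solved once, even if it is shared
    by several parents in the hierarchy.
    """
    kind, payload = GRAPH[node]
    if kind == "leaf":
        return payload, node
    results = [best_parse(child) for child in payload]
    if kind == "and":
        # An AND node keeps every child and sums their log-scores.
        total = sum(score for score, _ in results)
        return total, (node, tuple(tree for _, tree in results))
    # An OR node keeps only its best-scoring child.
    score, tree = max(results, key=lambda r: r[0])
    return score, (node, (tree,))


score, tree = best_parse("scene")
```

In this toy example the root selects the horizontal split, whose top and bottom parts select the sky and grass tiles. A full system would replace the fixed leaf scores with appearance-model scores for each noun+adjective pair.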

Related Material


[bibtex]
@InProceedings{Wang_2013_CVPR,
author = {Wang, Shuo and Joo, Jungseock and Wang, Yizhou and Zhu, Song-Chun},
title = {Weakly Supervised Learning for Attribute Localization in Outdoor Scenes},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2013}
}