Efficient Label Collection for Unlabeled Image Datasets

Maggie Wigness, Bruce A. Draper, J. Ross Beveridge; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 4594-4602


Visual classifiers are part of many applications, including surveillance, autonomous navigation and scene understanding. The raw data used to train these classifiers is abundant and easy to collect but lacks labels. Labels are necessary for training supervised classifiers, but the labeling process requires significant human effort. Techniques like active learning and group-based labeling have emerged to help reduce the labeling workload. However, the risk of collecting noisy labels hurts either the efficiency of these systems or the performance of the trained classifiers. Further, many of these techniques introduce latency by iteratively re-training classifiers or re-clustering data. We introduce a technique that searches for structural change in hierarchically clustered data to identify a set of clusters that span a spectrum of visual concept granularities. This allows us to efficiently label clusters with less label noise and produce high-performing classifiers. The data is hierarchically clustered only once, eliminating latency during the labeling process. Using benchmark data, we show that collecting labels with our approach is more efficient than existing labeling techniques and achieves higher classification accuracy. Finally, we demonstrate the speed and efficiency of our system using real-world data collected for an autonomous navigation task.
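The general idea of clustering once and then labeling whole clusters can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the linkage, the features, or the structural-change test, so the average linkage, the height-ratio rule, and all names and parameters here (`build_hierarchy`, `select_clusters`, `label_clusters`, `ratio`, `min_scale`) are assumptions chosen only for illustration.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def average_linkage(points, a, b):
    """Mean pairwise distance between two index groups (assumed linkage)."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def build_hierarchy(points):
    """Naive agglomerative clustering, run a single time up front."""
    active = [{"members": [i], "height": 0.0, "children": []}
              for i in range(len(points))]
    while len(active) > 1:
        best = None
        for a in range(len(active)):
            for b in range(a + 1, len(active)):
                d = average_linkage(points, active[a]["members"],
                                    active[b]["members"])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = {"members": active[a]["members"] + active[b]["members"],
                  "height": d,
                  "children": [active[a], active[b]]}
        active = [n for k, n in enumerate(active) if k not in (a, b)]
        active.append(merged)
    return active[0]


def select_clusters(node, ratio=3.0, min_scale=1.0):
    """Walk the tree top-down and split a node only where the merge height
    jumps sharply relative to its children (a hypothetical stand-in for a
    structural-change test). min_scale is an assumed floor in data units."""
    if not node["children"]:
        return [node["members"]]
    child_scale = max([c["height"] for c in node["children"]] + [min_scale])
    if node["height"] <= ratio * child_scale:
        return [node["members"]]  # no sharp jump: label this group as one unit
    selected = []
    for child in node["children"]:
        selected.extend(select_clusters(child, ratio, min_scale))
    return selected


def label_clusters(clusters, ask):
    """Ask the annotator once per cluster and propagate the answer."""
    labels = {}
    for members in clusters:
        label = ask(members)
        for i in members:
            labels[i] = label
    return labels


# Toy 2-D "features": three tight groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 0), (10, 1), (11, 0),
          (50, 50), (50, 51), (51, 50)]

root = build_hierarchy(points)  # clustered once, never re-clustered
clusters = sorted(sorted(c) for c in select_clusters(root))

queries = []


def ask(members):
    """Simulated annotator: records the query and returns a made-up label."""
    queries.append(members)
    return "concept_%d" % (members[0] // 3)


labels = label_clusters(clusters, ask)
```

In this toy run the nine points are labeled with three annotator queries, one per selected cluster. Real image data would need feature extraction and a far more careful selection rule; the sketch only shows why a single up-front hierarchy avoids re-clustering latency during labeling.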

Related Material

author = {Wigness, Maggie and Draper, Bruce A. and Beveridge, J. Ross},
title = {Efficient Label Collection for Unlabeled Image Datasets},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2015}