Latent Trees for Estimating Intensity of Facial Action Units

Sebastian Kaltwang, Sinisa Todorovic, Maja Pantic; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 296-304


This paper is about estimating intensity levels of Facial Action Units (FAUs) in videos as an important and challenging step toward interpreting facial expressions. To address uncertainty in detections of facial landmark points, used as out input features, we formulate a new generative framework comprised of a graphical model, inference, and algorithms for learning both model parameters and structure. Our model is a latent tree (LT) that represents input features of facial landmark points and FAU intensities as leaf nodes, and encodes their higher-order dependencies with latent nodes at tree levels closer to the root. No other restrictions are placed on the model structure beyond that it is a tree. We specify a new algorithm for efficient learning of model structure that iteratively builds LT by adding either new edge or new hidden node to LT, whichever of these two graph-edit operations gives the highest increase of the joint likelihood. Our structure learning efficiently computes likelihood increase and selects an optimal graph revision without considering all possible structural changes. For FAU intensity estimation, we derive closed-form expressions of posterior marginals of all variables in LT, and specify an efficient inference of in two passes -- bottom-up and top-down. Our evaluation on the benchmark DISFA and ShoulderPain datasets, in subject-independent setting, demonstrate that we outperform the state of the art, even in the presence of significant noise in locations of facial landmark points. We demonstrate our correct learning of model structure by probabilistically sampling facial landmark points, conditioned on a given FAU intensity, and thus generating plausible facial expressions.

Related Material

author = {Kaltwang, Sebastian and Todorovic, Sinisa and Pantic, Maja},
title = {Latent Trees for Estimating Intensity of Facial Action Units},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2015}