His main research thrust is high-level computer vision and its relationship to human language, robotics and data science. He primarily focuses on problems in video understanding such as video segmentation, activity recognition, and video-to-text. From biomedicine to recreational video, imaging data is ubiquitous. Yet, imaging scientists and intelligence analysts are without an adequate language and set of tools to fully tap the information-rich image and video. He works to provide such a language; specifically, he primarily studies the coupled problems of segmentation and recognition from a Bayesian perspective emphasizing the role of statistical models in efficient visual inference. His long-term goal is a comprehensive and robust methodology of automatically mining, quantifying, and generalizing information in large sets of projective and volumetric images and video.