Long-Term Activity Forecasting Using First-Person Vision

Syed Zahir Bokhari, Kris M. Kitani

Computer Vision - ACCV 2016, Part V (2016)

Abstract
Long-term activity forecasting deals with the problem of predicting how an agent will complete a full activity, defined as a continuous trajectory and a discrete sequence of sub-actions. While previous data-driven methods only dealt with forecasting 2D trajectories, we present a method that leverages common-sense prior knowledge and minimal data. In order to forecast the trajectories, we learn a policy function that maps states to the actions the agent should perform next. Through the use of deep reinforcement learning, our method is able to learn a highly non-linear mapping from agent states to actions. We develop the first forecasting framework that uses egocentric video input, which is an optimal vantage point for understanding human activities over large spaces. Given an annotated first-person video sequence for the activity, we construct a 3D point cloud of the environment and activity paths through 3D space. Based on a limited number of examples, we use reinforcement learning to derive a policy for the entire environment, even for areas that have never been visited during the demonstrated examples. We explore the use of deep reinforcement learning to recover a direct mapping from environmental features to the best action. Our approach makes it possible to combine a high-dimensional continuous state (namely the local point cloud density surrounding the agent) with a discrete state portion (the action stage of an activity) into a single state for behavior forecasting. The result is a policy that generalizes very well from only a few activity samples. We validate our approach on our First-Person Office Behavior Dataset and show that our method of encoding more prior knowledge leads to an increase in forecasting accuracy. We also demonstrate that the deep reinforcement learning approach is able to achieve higher forecasting accuracy than the traditional alternatives.
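
As a rough illustration of the combined state described in the abstract, the sketch below concatenates a continuous feature vector (standing in for local point cloud density) with a one-hot encoding of the discrete action stage, then picks the highest-scoring action. It is not the authors' implementation: the action names, the feature sizes, and the linear scorer (used here in place of a deep network) are all assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical discrete action set; the paper's actual actions may differ.
ACTIONS = ["forward", "left", "right", "stay"]


def one_hot(index: int, size: int) -> List[float]:
    """Encode a discrete activity stage as a one-hot vector."""
    if not 0 <= index < size:
        raise ValueError("stage index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_state(density: Sequence[float], stage: int, num_stages: int) -> List[float]:
    """Join continuous density features and the discrete stage into one state."""
    return list(density) + one_hot(stage, num_stages)


def q_values(state: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Score each action with a linear function (a stand-in for a deep network)."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def greedy_action(state: Sequence[float], weights: Sequence[Sequence[float]]) -> str:
    """Policy: map a state to the action with the highest score."""
    scores = q_values(state, weights)
    best = max(range(len(scores)), key=scores.__getitem__)
    return ACTIONS[best]


# Hypothetical weights: one row per action, one column per state dimension.
WEIGHTS = [
    [1.0, 0.0, 0.0, 0.0, 0.0],  # forward
    [0.0, 0.0, 0.0, 1.0, 0.0],  # left (favored when the stage index is 1)
    [0.0, 0.5, 0.0, 0.0, 0.0],  # right
    [0.0, 0.0, 0.0, 0.0, 0.0],  # stay
]

state = build_state(density=[0.2, 0.8], stage=1, num_stages=3)
action = greedy_action(state, WEIGHTS)
```

Because the stage is part of the state, the same density features can yield different actions at different stages of the activity, which is the property the abstract highlights.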
Keywords
Activity Forecasting, First-person Video, Deep Reinforcement Learning, Egocentric Video, High-dimensional Continuous State