Chain-of-Thought Predictive Control
arXiv (2023)
Abstract
We study generalizable policy learning from demonstrations for complex
low-level control (e.g., contact-rich object manipulations). We propose a novel
hierarchical imitation learning method that utilizes sub-optimal demos.
Firstly, we propose an observation space-agnostic approach that efficiently
discovers the multi-step subskill decomposition of the demos in an unsupervised
manner. It groups temporally close and functionally similar actions into
subskill-level demo segments; the observations at the segment boundaries
constitute a chain of planning steps for the task, which we refer to as the
chain-of-thought (CoT). Next, we propose a Transformer-based design that
effectively learns to predict the CoT as the subskill-level guidance. We couple
action and subskill predictions via learnable prompt tokens and a hybrid
masking strategy, which enable dynamically updated guidance at test time and
improve feature representation of the trajectory for generalizable policy
learning. Our method, Chain-of-Thought Predictive Control (CoTPC), consistently
surpasses existing strong baselines on challenging manipulation tasks with
sub-optimal demos.
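
To make the segmentation idea concrete, here is a minimal sketch of grouping temporally adjacent, similar actions into segments and reading off the observations at the segment boundaries. It is an illustration only, not the CoTPC algorithm: the greedy cosine-similarity rule, the threshold value, and the function names segment_actions and chain_of_boundary_observations are all assumptions introduced here.

```python
import numpy as np


def segment_actions(actions, threshold=0.9):
    """Greedily split a trajectory into contiguous segments of similar actions.

    Hypothetical illustration, not the paper's method. A new segment starts
    when the cosine similarity between the current action and the mean action
    of the running segment drops below `threshold` (an assumed value).
    Returns a list of (start, end) index pairs with `end` exclusive.
    """
    actions = np.asarray(actions, dtype=float)
    segments = []
    start = 0
    for t in range(1, len(actions)):
        mean = actions[start:t].mean(axis=0)
        current = actions[t]
        denom = np.linalg.norm(mean) * np.linalg.norm(current)
        # Treat zero-norm vectors as similar so they do not force a split.
        sim = float(mean @ current / denom) if denom > 0 else 1.0
        if sim < threshold:
            segments.append((start, t))
            start = t
    segments.append((start, len(actions)))
    return segments


def chain_of_boundary_observations(observations, segments):
    """Collect the observation at the last step of each segment."""
    return [observations[end - 1] for _, end in segments]


# Toy demo: three phases of constant 2-D actions, observations are step ids.
demo_actions = [[1, 0]] * 3 + [[0, 1]] * 2 + [[-1, 0]] * 4
demo_observations = list(range(len(demo_actions)))
demo_segments = segment_actions(demo_actions)
demo_chain = chain_of_boundary_observations(demo_observations, demo_segments)
```

Because the rule only compares actions, the sketch does not depend on the form of the observations, which loosely mirrors the observation space-agnostic property described in the abstract. A learned predictor would then be trained to output the boundary observations as subskill-level guidance.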