ClothPPO: A Proximal Policy Optimization Enhancing Framework for Robotic Cloth Manipulation with Observation-Aligned Action Spaces
arXiv (2024)
Abstract
Vision-based robotic cloth unfolding has made great progress recently.
However, prior works predominantly rely on value learning and have not fully
explored policy-based techniques. Recently, the success of reinforcement
learning on large language models has shown that policy gradient algorithms
can improve policies over huge action spaces. In this paper, we introduce
ClothPPO, a framework that employs a policy gradient algorithm based on an
actor-critic architecture to enhance a pre-trained model with a huge,
observation-aligned action space of about 10^6 actions in the task of
unfolding clothes. To this end, we redefine the cloth manipulation problem
as a partially observable Markov decision process. A supervised pre-training
stage is employed to train a baseline model of our policy. In the second
stage, Proximal Policy Optimization (PPO) is used to fine-tune the
supervised model within the observation-aligned action space. By optimizing
and updating the policy, our proposed method increases the garment's surface
area in the cloth-unfolding soft-body manipulation task. Experimental
results show that our proposed framework can further improve the unfolding
performance of other state-of-the-art methods.
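The second stage fine-tunes the pre-trained policy with PPO. A minimal sketch of the standard PPO clipped surrogate objective (the generic algorithm, not the paper's implementation; the toy log-probabilities and advantages below are illustrative assumptions):

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Average the PPO clipped surrogate over a batch of actions.

    ratio = pi_new(a|s) / pi_old(a|s); clipping the ratio to
    [1 - eps, 1 + eps] limits how far each update moves the policy.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                       # importance ratio
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clipped ratio
        total += min(ratio * adv, clipped * adv)         # pessimistic bound
    return total / len(advantages)

# Toy batch: two actions with positive and negative advantage estimates.
obj = ppo_clip_objective(
    logp_new=[math.log(0.3), math.log(0.4)],
    logp_old=[math.log(0.2), math.log(0.5)],
    advantages=[1.0, -1.0],
)
# ratios are 1.5 and 0.8; clipping caps the first at 1.2, so obj = 0.2
```

In ClothPPO's setting, each "action" indexes a pick point in the observation-aligned action space, so the same objective is applied over a categorical policy with roughly 10^6 possible actions.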