Efficient Incremental Offline Reinforcement Learning With Sparse Broad Critic Approximation

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS(2024)

引用 0|浏览11
暂无评分
摘要
Offline reinforcement learning (ORL) has been getting increasing attention in robot learning, benefiting from its ability to avoid hazardous exploration and learn policies directly from precollected samples. Approximate policy iteration (API) is one of the most commonly investigated ORL approaches in robotics, due to its linear representation of policies, which makes it fairly transparent in both theoretical and engineering analysis. One open problem of API is how to design efficient and effective basis functions. The broad learning system (BLS) has been extensively studied in supervised and unsupervised learning in various applications. However, few investigations have been conducted on ORL. In this article, a novel incremental ORL approach with sparse broad critic approximation (BORL) is proposed with the advantages of BLS, which approximates the critic function in a linear manner with randomly projected sparse and compact features and dynamically expands its broad structure. The BORL is the first extension of API with BLS in the field of robotics and ORL. The approximation ability and convergence performance of BORL are also analyzed. Comprehensive simulation studies are then conducted on two benchmarks, and the results demonstrate that the proposed BORL can obtain comparable or better performance than conventional API methods without laborious hyperparameter fine-tuning work. To further demonstrate the effectiveness of BORL in practical robotic applications, a variable force tracking problem in robotic ultrasound scanning (RUSS) is investigated, and a learning-based adaptive impedance control (LAIC) algorithm is proposed based on BORL. The experimental results demonstrate the advantages of LAIC compared with conventional force tracking methods.
更多
查看译文
关键词
Broad learning system (BLS),incremental critic design,linear critic approximation (LCA),offline reinforcement learning (ORL),variable force tracking
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要