Primate Motor Cortical Activity Displays Hallmarks of a Temporal Difference Reinforcement Learning Process

2023 11th International IEEE/EMBS Conference on Neural Engineering (NER)

Abstract
Reinforcement learning (RL) models comprehensively describe the neural dynamics of multiple brain regions, at several spatiotemporal scales, during reinforcement-based learning. A key component of RL models, capturing the expected cumulative reward from a given state, is the state-value function (SVF). A non-human primate (NHP) subject (Bonnet macaque) was implanted with a 96-electrode array in the primary motor cortex (M1). The NHP both performed a reward-level-cued reaching task manually and passively observed the same task. Here we show that M1 activity resembles an RL process, encoding a state-value function. The motor cortex responds to reward delivery (the unconditioned stimulus, US) and extends this state-value-related response earlier into the trial, becoming predictive of the expected reward when it is indicated by an explicit cue (the conditioned stimulus, CS). This SVF is observed in tasks performed both manually and passively, that is, without agency. We used the Microstimulus Temporal Difference RL (MSTD) model, previously reported to accurately capture RL-related dopaminergic activity, to parsimoniously account for both the phasic and tonic reward-related neural activity in M1. In the future, we will use this state-value information toward autonomously updating brain-machine interfaces (BMIs) to maximize the total subjective reward expectation of the NHP user.
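The MSTD model referenced above represents the time elapsed since each stimulus (CS or US) with a decaying memory trace read out by a bank of Gaussian basis functions ("microstimuli"), and learns a state-value function over those features with linear TD(λ). Below is a minimal Python sketch of that scheme, following the Ludvig, Sutton, and Kehoe formulation; all hyperparameter values, trial timings, and function names are illustrative assumptions, not values fitted to this paper's NHP data.

```python
import numpy as np

# Minimal sketch of Microstimulus TD (MSTD) learning, in the style of
# Ludvig, Sutton & Kehoe (2008). All constants below are illustrative
# assumptions, not values fitted to the NHP data in this paper.

n_micro = 10                           # microstimuli per stimulus trace
alpha, gamma, lam = 0.05, 0.97, 0.95   # learning rate, discount, eligibility decay
trace_decay = 0.985                    # per-step decay of each stimulus memory trace
sigma = 0.08                           # width of each Gaussian microstimulus
centers = np.linspace(1.0, 0.0, n_micro)  # microstimulus centers along trace height

def microstimuli(trace_height):
    """Gaussian basis over the current height of a decaying memory trace,
    scaled by that height so later microstimuli are weaker and coarser."""
    return trace_height * np.exp(-((trace_height - centers) ** 2) / (2 * sigma**2))

def run_trial(w, cs_time=10, us_time=50, n_steps=80, reward=1.0):
    """One cued trial (CS at cs_time, reward/US at us_time) of linear TD(lambda)."""
    cs_trace, us_trace = 0.0, 0.0
    e = np.zeros_like(w)          # eligibility traces
    x_prev = np.zeros_like(w)     # features of the previous time step
    value = np.zeros(n_steps)     # learned state-value time course (the SVF)
    for t in range(n_steps):
        if t == cs_time:
            cs_trace = 1.0        # cue onset launches a memory trace
        if t == us_time:
            us_trace = 1.0        # reward delivery launches its own trace
        x = np.concatenate([microstimuli(cs_trace), microstimuli(us_trace)])
        r = reward if t == us_time else 0.0
        delta = r + gamma * (w @ x) - (w @ x_prev)   # TD error
        e = gamma * lam * e + x_prev
        w = w + alpha * delta * e
        value[t] = w @ x
        x_prev = x
        cs_trace *= trace_decay
        us_trace *= trace_decay
    return w, value

w = np.zeros(2 * n_micro)
for _ in range(200):              # over trials, value backs up from the US
    w, v = run_trial(w)           # toward the CS, as in TD learning generally
```

With repeated trials, the learned value trace becomes elevated from cue onward and the TD error migrates from reward delivery to the cue, which is the kind of phasic (reward-response) plus tonic (reward-predictive) structure the abstract describes the MSTD model capturing in M1.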
Keywords
reinforcement learning, motor cortex, BMI, reward, mirror neuron