Reinforcement learning with smart contracts on blockchains

Future Generation Computer Systems (2023)

In recent years, Machine Learning and Blockchain technologies have been at the forefront of innovation in both research and applications. Machine Learning is predominantly used to extract knowledge from data, while Blockchain excels at providing a ‘public ledger’ on which data are securely, consistently and irreversibly recorded. Machine Learning may use data stored on Blockchains and seek to exploit distributed computing resources. Conversely, Blockchain may exploit Machine Learning to capitalize on user data and to establish marketplaces for Machine Learning models. In this work we propose combining Machine Learning, in particular Reinforcement Learning (RL) and Imitation Learning (IL), with Blockchain. RL allows a software agent to interact with its environment and learn, via ‘trial and error’ techniques, based exclusively on its own activity, experiences and observations. The agent learns from the rewards and penalties it receives immediately from its environment in response to its interactions. Designing such a reward/penalty mechanism is challenging: designers must ensure that the agent's immediate environment consistently recognizes and rewards desirable agent behaviour, and that the rewarding mechanism cannot be tampered with, corrupted or circumvented. We approach this through a coordinated collaboration of RL and IL. An expert trainer software agent (the Trainer Agent) records its own behaviour in its environment in demonstration files and distributes these files via Blockchain to other, receiving software agents (the Trainee agents), which can then imitate the trainer's good practices and be trained effectively. Trainees are trained using RL techniques (i.e. reward/penalty) in conjunction with IL (based on the demo files). Demo files are ‘stored’ on smart-contract Blockchains, which ultimately reward Trainer Agents pro rata, according to how much the Trainer has contributed to improving the Trainee agents' models. The immutable Blockchain structure and the unmodifiable nature of its smart contracts secure the demo files and lend credibility to all interactions among the stakeholders involved. The developed decentralized application (dApp) fully automates the workflow of trading demonstration files and training the Trainee agents.
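The pro-rata rewarding of Trainer Agents can be illustrated with a minimal off-chain sketch in Python. The abstract does not give the reward formula, so everything here is an assumption made for illustration: the `DemoUsage` record, the `pro_rata_rewards` function, the use of the difference in the trainee's score before and after training as the measure of improvement, and the fixed reward pool are all hypothetical and are not taken from the paper or its smart contracts.

```python
from dataclasses import dataclass


@dataclass
class DemoUsage:
    """One trainee's use of a trainer's demonstration file (hypothetical record)."""
    trainer: str
    score_before: float  # trainee's mean episode reward before using the demo
    score_after: float   # trainee's mean episode reward after RL + IL training


def pro_rata_rewards(usages, reward_pool):
    """Split reward_pool among trainers in proportion to trainee improvement.

    Assumption: improvement is score_after - score_before, floored at zero,
    so a demo file that did not help the trainee earns nothing.
    """
    gains = {}
    for u in usages:
        gain = max(0.0, u.score_after - u.score_before)
        gains[u.trainer] = gains.get(u.trainer, 0.0) + gain

    total = sum(gains.values())
    if total == 0:
        # No measurable improvement anywhere: nobody is paid.
        return {trainer: 0.0 for trainer in gains}
    return {trainer: reward_pool * g / total for trainer, g in gains.items()}


# Illustrative numbers only; they are not results from the paper.
usages = [
    DemoUsage("trainer_a", score_before=10.0, score_after=40.0),
    DemoUsage("trainer_b", score_before=10.0, score_after=20.0),
    DemoUsage("trainer_c", score_before=10.0, score_after=5.0),
]
rewards = pro_rata_rewards(usages, reward_pool=100.0)
print(rewards)
```

In this sketch a trainer whose demo file produced three times the improvement of another receives three times the payout, and a demo file that made the trainee worse earns nothing. An on-chain version would also need a trusted way to report trainee scores, which this sketch does not address.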
Keywords: smart contracts, blockchains, reinforcement learning