# Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning

Hangyu Mao
Jun Luo
Dong Li
Jun Wang

AAAI Conference on Artificial Intelligence, 2020.

Keywords:
Received Signal Strength Indication; multi-agent reinforcement learning; Deep DPG; MARL method; neighborhood cognition consistent
Weibo:
We propose neighborhood cognition consistent deep Q-learning and Actor-Critic methods to facilitate large-scale multi-agent cooperation

Abstract:

Social psychology and real experiences show that cognitive consistency plays an important role in keeping human society in order: if people have a more consistent cognition about their environments, they are more likely to achieve better cooperation. Meanwhile, only cognitive consistency within a neighborhood matters, because humans only in…

Introduction
Highlights
• Cognitive consistency theories show that people usually seek to perceive the environment in a simple and consistent way (Simon, Snow, and Read 2004; Russo et al. 2008; Lakkaraju and Speed 2019)
• Inspired by these observations, we introduce neighborhood cognitive consistency into multi-agent reinforcement learning (MARL) to facilitate agent cooperation
• Compared to recent MARL methods that focus on designing global coordination mechanisms, such as a centralized critic (Lowe et al. 2017; Foerster et al. 2018; Mao et al. 2019), joint value factorization (Sunehag et al. 2018; Rashid et al. 2018; Son et al. 2019), and agent communication (Foerster et al. 2016; Sukhbaatar, Fergus, and others 2016; Peng et al. 2017), neighborhood cognitive consistency takes an alternative but complementary approach: it innovates the value network design from a local perspective
• Independent DQN (IDQN) is always the worst performer because it has no explicit coordination mechanism, which in turn shows the necessity of coordination mechanisms in large-scale settings
• Considering all experiments, the most important lesson learned is that there is usually a close relationship between agent cooperation and agent cognitive consistency: if the agents have formed more similar and consistent cognitions, they are more likely to achieve better cooperation
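The joint value factorization mentioned in the highlights (e.g., a VDN-style additive decomposition) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: random per-agent Q-tables stand in for learned networks, and the point is only that an additive joint value lets each agent act greedily on its own values.

```python
import numpy as np

# Toy per-agent Q-values for 3 agents with 4 actions each.
# In VDN-style factorization, the joint value is the sum of
# per-agent values: Q_tot(s, a_1..a_N) = sum_i Q_i(o_i, a_i).
rng = np.random.default_rng(0)
n_agents, n_actions = 3, 4
per_agent_q = rng.normal(size=(n_agents, n_actions))

# Greedy decentralized action selection: because Q_tot is a
# monotone (additive) combination, each agent can maximize its
# own Q_i independently and still maximize Q_tot.
greedy_actions = per_agent_q.argmax(axis=1)
q_tot = per_agent_q[np.arange(n_agents), greedy_actions].sum()

# The decentralized greedy choice attains the joint maximum.
assert np.isclose(q_tot, per_agent_q.max(axis=1).sum())
```

QMIX generalizes this by replacing the sum with a learned monotonic mixing network, which is one way to read the VDN-vs-QMIX trade-off noted in the results.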
Methods
• The authors represent a multi-agent environment as a graph G, where the nodes of G stand for the agents and a link between two nodes represents a relationship (e.g., an existing communication channel) between the corresponding agents.
• To study neighborhood cognitive consistency, the authors define the cognition of an agent as its understanding of the local environment.
• It includes the observations of all agents in its neighborhood, as well as the high-level knowledge extracted from these observations.
• The authors define neighborhood cognitive consistency as the property that neighboring agents have formed similar cognitions about their neighborhood.
• The resulting new methods are named NCC-Q and NCC-AC, respectively
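The consistency idea above can be sketched as a toy auxiliary loss over the agent graph. This is a hypothetical illustration: the paper encourages consistency through a variational (KL-based) objective over learned cognition variables, while the squared-distance penalty and the `neighborhood_consistency_loss` helper below are simplified stand-ins.

```python
import numpy as np

def neighborhood_consistency_loss(cognitions, adjacency):
    """Toy auxiliary loss for neighborhood cognitive consistency:
    each agent's cognition vector is pulled toward the mean
    cognition of its neighborhood (the agent plus its neighbors).

    cognitions: (n_agents, d) array of per-agent cognition vectors.
    adjacency:  (n_agents, n_agents) 0/1 matrix of the graph G.
    """
    n = cognitions.shape[0]
    neigh = adjacency + np.eye(n)  # include the agent itself
    neigh_mean = neigh @ cognitions / neigh.sum(axis=1, keepdims=True)
    return np.mean(np.sum((cognitions - neigh_mean) ** 2, axis=1))

# Three agents on a line graph: 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# Identical cognitions are perfectly consistent: zero loss.
assert neighborhood_consistency_loss(np.ones((3, 4)), A) == 0.0
```

In training, such a term would be added to the usual TD (or actor-critic) loss so that minimizing it drives neighboring agents toward similar cognitions.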
Results
• All methods achieve good performance, which indicates that the routing environments are suitable for testing RL methods.
• [Figure (a): performance of IDQN, VDN, QMIX, DGN, and NCC-Q on the Wi-Fi configuration tasks with small, middle, and large topologies.]
• The comparison of VDN and QMIX shows that VDN outperforms QMIX in simple scenarios but underperforms it in complex scenarios
• This highlights the relationship among method performance, method complexity, and task complexity.
• IDQN is always the worst performer because it has no explicit coordination mechanism, which in turn shows the necessity of coordination mechanisms in large-scale settings
Conclusion
• Inspired by both social psychology and real experiences, this paper introduces two novel neighborhood cognition consistent reinforcement learning methods, NCC-Q and NCC-AC, to facilitate large-scale agent cooperation.
• All neighboring agents eventually form consistent neighborhood cognitions and achieve good cooperation.
• The authors evaluate the methods on three tasks developed from eight real-world scenarios.
• Extensive results show that they outperform state-of-the-art methods by a clear margin and achieve good scalability in routing tasks.
• Ablation studies and further analyses are provided for a better understanding of the methods
Summary
• ## Introduction:

Cognitive consistency theories show that people usually seek to perceive the environment in a simple and consistent way (Simon, Snow, and Read 2004; Russo et al. 2008; Lakkaraju and Speed 2019).
• If their perceptions are inconsistent, people experience discomfort, and they change their behavior to reduce it by making their cognitions consistent
• This applies to multi-agent systems (OroojlooyJadid and Hajinezhad 2019; Bear, Kagan, and Rand 2017; Corgnet, Espín, and Hernán-González 2015): agents maintaining consistent cognitions about their environments is crucial for achieving effective system-level cooperation.
• Inspired by these observations, the authors introduce neighborhood cognitive consistency into multi-agent reinforcement learning (MARL) to facilitate agent cooperation.
Funding
• This work was supported by the National Natural Science Foundation of China under Grant No. 61872397
Reference
• [Bear, Kagan, and Rand 2017] Bear, A.; Kagan, A.; and Rand, D. G. 2017. Co-evolution of cooperation and cognition: the impact of imperfect deliberation and contextsensitive intuition. Proceedings of the Royal Society B: Biological Sciences 284(1851):20162326.
• [Bernstein et al. 2002] Bernstein, D. S.; Givan, R.; Immerman, N.; and Zilberstein, S. 2002. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research 27(4):819–840.
• [Blei, Kucukelbir, and McAuliffe 2017] Blei, D. M.; Kucukelbir, A.; and McAuliffe, J. D. 2017. Variational inference: A review for statisticians. Journal of the American Statistical Association 112(518):859–877.
• [Corgnet, Espín, and Hernán-González 2015] Corgnet, B.; Espín, A. M.; and Hernán-González, R. 2015. The cognitive basis of social behavior: cognitive reflection overrides antisocial but not always prosocial motives. Frontiers in Behavioral Neuroscience 9:287.
• [Duan et al. 2016] Duan, Y.; Chen, X.; Houthooft, R.; Schulman, J.; and Abbeel, P. 2016. Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning, 1329–1338.
• [Foerster et al. 2016] Foerster, J.; Assael, I. A.; de Freitas, N.; and Whiteson, S. 2016. Learning to communicate with deep multi-agent reinforcement learning. In Advances in Neural Information Processing Systems, 2137–2145.
• [Foerster et al. 2018] Foerster, J. N.; Farquhar, G.; Afouras, T.; Nardelli, N.; and Whiteson, S. 2018. Counterfactual multi-agent policy gradients. In Thirty-Second AAAI Conference on Artificial Intelligence.
• [Jiang, Dun, and Lu 2018] Jiang, J.; Dun, C.; and Lu, Z. 2018. Graph convolutional reinforcement learning for multi-agent cooperation. arXiv preprint arXiv:1810.09202.
• [Kurach et al. 2019] Kurach, K.; Raichuk, A.; Stanczyk, P.; Zajac, M.; Bachem, O.; Espeholt, L.; Riquelme, C.; Vincent, D.; Michalski, M.; Bousquet, O.; et al. 2019. Google research football: A novel reinforcement learning environment. arXiv preprint arXiv:1907.11180.
• [Lakkaraju and Speed 2019] Lakkaraju, K., and Speed, A. 2019. A cognitive-consistency based model of population wide attitude change. In Complex Adaptive Systems. Springer. 17–38.
• [Lowe et al. 2017] Lowe, R.; Wu, Y.; Tamar, A.; Harb, J.; Abbeel, O. P.; and Mordatch, I. 2017. Multi-agent actorcritic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, 6379– 6390.
• [Mao et al. 2019] Mao, H.; Zhang, Z.; Xiao, Z.; and Gong, Z. 2019. Modelling the dynamic joint policy of teammates with attention multi-agent DDPG. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 1108–1116. International Foundation for Autonomous Agents and Multiagent Systems.
• [Mnih et al. 2015] Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; et al. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529–533.
• [OroojlooyJadid and Hajinezhad 2019] OroojlooyJadid, A., and Hajinezhad, D. 2019. A review of cooperative multi-agent deep reinforcement learning. arXiv preprint arXiv:1908.03963.
• [Peng et al. 2017] Peng, P.; Yuan, Q.; Wen, Y.; Yang, Y.; Tang, Z.; Long, H.; and Wang, J. 2017. Multiagent bidirectionally-coordinated nets for learning to play starcraft combat games. arXiv preprint arXiv:1703.10069.
• [Rashid et al. 2018] Rashid, T.; Samvelyan, M.; Witt, C. S.; Farquhar, G.; Foerster, J.; and Whiteson, S. 2018. Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning. In International Conference on Machine Learning, 4292–4301.
• [Russo et al. 2008] Russo, J. E.; Carlson, K. A.; Meloy, M. G.; and Yong, K. 2008. The goal of consistency as a cause of information distortion. Journal of Experimental Psychology: General 137(3):456.
• [Silver et al. 2014] Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; and Riedmiller, M. 2014. Deterministic policy gradient algorithms. In International Conference on Machine Learning, 387–395.
• [Simon, Snow, and Read 2004] Simon, D.; Snow, C. J.; and Read, S. J. 2004. The redux of cognitive consistency theories: evidence judgments by constraint satisfaction. Journal of personality and social psychology 86(6):814.
• [Son et al. 2019] Son, K.; Kim, D.; Kang, W. J.; Hostallero, D. E.; and Yi, Y. 2019. Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In International Conference on Machine Learning, 5887–5896.
• [Sukhbaatar, Fergus, and others 2016] Sukhbaatar, S.; Fergus, R.; et al. 2016. Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems, 2244–2252.
• [Sunehag et al. 2018] Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W. M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J. Z.; Tuyls, K.; et al. 2018. Valuedecomposition networks for cooperative multi-agent learning based on team reward. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2085–2087. International Foundation for Autonomous Agents and Multiagent Systems.
• [Sutton and Barto 1998] Sutton, R. S., and Barto, A. G. 1998. Introduction to reinforcement learning, volume 2. MIT press Cambridge.
• [Tampuu et al. 2017] Tampuu, A.; Matiisen, T.; Kodelja, D.; Kuzovkin, I.; Korjus, K.; Aru, J.; Aru, J.; and Vicente, R. 2017. Multiagent cooperation and competition with deep reinforcement learning. PloS one 12(4):e0172395.
• [Tan 1993] Tan, M. 1993. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the tenth international conference on machine learning, 330–337.