Deep Reinforcement Learning in Parameterized Action Space

Matthew Hausknecht, Peter Stone

ICLR, 2016.

Keywords:
Deep Deterministic Policy Gradients, continuous state space, RoboCup champion, neural network, value function
Weibo:
This paper presents an agent trained exclusively with deep reinforcement learning that learns from scratch how to approach the ball, kick the ball toward the goal, and score.

Abstract:

This paper extends the Deep Deterministic Policy Gradients (DDPG) algorithm into a parameterized action space and applies it to the Half Field Offense (HFO) domain. The resulting agent, trained exclusively with deep reinforcement learning, learns from scratch how to approach the ball, kick the ball toward the goal, and score. The key extension is a novel approach for bounding the action-space gradients suggested by the Critic; this extension is not specific to HFO and will likely prove useful for any continuous, bounded action space.

Introduction
  • This paper extends the Deep Deterministic Policy Gradients (DDPG) algorithm (Lillicrap et al., 2015) into a parameterized action space.
  • Prior extensions of deep Q-networks include decaying traces (Narasimhan et al., 2015), LSTM recurrency (Hausknecht & Stone, 2015), and double Q-learning (van Hasselt et al., 2015).
  • These networks work well in continuous state spaces but do not function in continuous action spaces, because the output nodes of the network, while continuous, are trained to output Q-value estimates rather than continuous actions (see the sketch below).
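To make the last point concrete, the sketch below contrasts a DQN-style value head with a DDPG-style actor whose outputs are the continuous action itself. PyTorch, the layer sizes, and the 58-feature state are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

STATE_DIM = 58      # hypothetical HFO-style feature count, for illustration only
NUM_DISCRETE = 4    # assumed number of discrete actions
ACTION_DIM = 2      # a generic continuous action vector for the DDPG case

# DQN-style network: one output node per *discrete* action, each an estimate of Q(s, a).
# The outputs are continuous numbers, but they are values, not actions, which is why this
# architecture cannot directly emit continuous actions.
q_network = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, NUM_DISCRETE),
)

# DDPG-style actor: the output nodes *are* the continuous action, so the network can be
# trained by following gradients of the critic's Q-value with respect to its outputs.
actor = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, ACTION_DIM), nn.Tanh(),   # tanh keeps actions in [-1, 1]
)

state = torch.randn(1, STATE_DIM)
print(q_network(state))  # Q-value estimates, one per discrete action
print(actor(state))      # a continuous action vector
```

Because the actor's outputs are the action, the actor can be trained by ascending the critic's gradient with respect to those outputs, which is the mechanism DDPG builds on.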
Highlights
  • This paper extends the Deep Deterministic Policy Gradients (DDPG) algorithm (Lillicrap et al., 2015) into a parameterized action space.
  • Having introduced the background of deep reinforcement learning in continuous action spaces, we present the parameterized action space formulation (an illustrative actor for this setting is sketched after this list).
  • This paper has presented an agent trained exclusively with deep reinforcement learning that learns from scratch how to approach the ball, kick the ball toward the goal, and score.
  • More generally, we have demonstrated the capability of deep reinforcement learning in parameterized action space.
  • We extended the Deep Deterministic Policy Gradients algorithm (Lillicrap et al., 2015) by presenting and analyzing a novel approach for bounding the action-space gradients suggested by the Critic.
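In the parameterized setting, a single actor emits both discrete-action activations and the continuous parameters of every action in one forward pass. The sketch below assumes an HFO-like action set (Dash, Turn, Tackle, Kick with six continuous parameters in total) and illustrative layer sizes; the paper's exact layout may differ.

```python
import torch
import torch.nn as nn

STATE_DIM = 58   # assumed feature count, for illustration only

# Assumed HFO-like parameterized action space: each discrete action owns a few
# continuous parameters (names and arities are illustrative).
DISCRETE_ACTIONS = ["Dash", "Turn", "Tackle", "Kick"]
PARAMS_PER_ACTION = {"Dash": 2, "Turn": 1, "Tackle": 1, "Kick": 2}
TOTAL_PARAMS = sum(PARAMS_PER_ACTION.values())  # 6

class ParameterizedActor(nn.Module):
    """Single actor that emits discrete-action activations and all continuous parameters."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU())
        self.discrete_head = nn.Linear(128, len(DISCRETE_ACTIONS))  # one activation per discrete action
        self.param_head = nn.Linear(128, TOTAL_PARAMS)              # parameters for every action

    def forward(self, state):
        h = self.body(state)
        return self.discrete_head(h), self.param_head(h)

actor = ParameterizedActor()
discrete_scores, params = actor(torch.randn(1, STATE_DIM))
chosen = DISCRETE_ACTIONS[discrete_scores.argmax(dim=1).item()]  # act greedily over discrete activations
print(chosen, params)
```

At execution time, the discrete action with the highest activation is selected along with the parameters that belong to it, matching the greedy selection in the sketch.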
Results
  • The authors evaluate the zeroing, squashing, and inverting-gradient approaches (sketched below) in the parameterized HFO domain on the task of approaching the ball and scoring a goal.
  • Only the inverting-gradient approach shows robust learning.
  • Both inverting-gradient agents learned to reliably approach the ball and score goals.
  • None of the other four agents using the squashing or zeroing gradients were able to reliably approach the ball or score.
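For reference, the three bounding strategies compared here can be sketched as follows, acting on the gradient of Q with respect to a parameter p bounded in [p_min, p_max]. Gradient ascent on Q is assumed (a positive gradient means "increase p"), and these definitions follow the descriptions in this summary rather than the paper's exact code.

```python
import numpy as np

def zeroing_gradients(grad, p, p_min, p_max):
    """Zero any gradient that would push an out-of-range parameter further out of range."""
    grad = grad.copy()
    grad[(p >= p_max) & (grad > 0)] = 0.0
    grad[(p <= p_min) & (grad < 0)] = 0.0
    return grad

def squashing_gradients(grad, pre_tanh):
    """Bound parameters with tanh; gradients are scaled by the tanh derivative and vanish near saturation."""
    return grad * (1.0 - np.tanh(pre_tanh) ** 2)

def inverting_gradients(grad, p, p_min, p_max):
    """Scale the gradient down as p approaches the bound it is being pushed toward,
    and invert it once p crosses that bound."""
    rng = p_max - p_min
    return np.where(grad > 0, grad * (p_max - p) / rng, grad * (p - p_min) / rng)

g = np.array([0.5, -0.5]); p = np.array([0.9, -0.95])
print(inverting_gradients(g, p, -1.0, 1.0))  # gradients shrink near the bounds being approached
```

The inverting form shrinks the gradient as p approaches the bound it is being pushed toward and reverses it once p crosses that bound, keeping parameters inside their valid ranges without relying on a saturating nonlinearity.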
Conclusion
  • This paper has presented an agent trained exclusively with deep reinforcement learning that learns from scratch how to approach the ball, kick the ball toward the goal, and score.
  • More generally, the authors have demonstrated the capability of deep reinforcement learning in parameterized action space.
  • To make this possible, the authors extended the DDPG algorithm (Lillicrap et al., 2015) by presenting and analyzing a novel approach for bounding the action-space gradients suggested by the Critic (an end-to-end sketch of one such bounded actor update follows this list).
  • This extension is not specific to the HFO domain and will likely prove useful for any continuous, bounded action space.
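As a rough end-to-end illustration of how the bounded gradients drive the actor, here is a single update step under stated assumptions: PyTorch, toy network sizes, a common bound of [-1, 1] on every action dimension, and the inverting-gradient rule as sketched earlier. This is a sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 58, 6    # illustrative sizes
P_MIN, P_MAX = -1.0, 1.0         # assumed common bound for every action dimension

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))
critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

def invert(grad, p):
    """Inverting-gradient bound: shrink the gradient as p nears the bound it is pushed toward."""
    rng = P_MAX - P_MIN
    return torch.where(grad > 0, grad * (P_MAX - p) / rng, grad * (p - P_MIN) / rng)

state = torch.randn(32, STATE_DIM)   # a dummy batch of states

# 1. Actor proposes actions; 2. critic scores them; 3. dQ/da is bounded and pushed into the actor.
action = actor(state)
a_detached = action.detach().requires_grad_(True)
q = critic(torch.cat([state, a_detached], dim=1)).sum()
dq_da = torch.autograd.grad(q, a_detached)[0]    # gradient of Q w.r.t. the actions
bounded = invert(dq_da, a_detached).detach()     # keep suggested updates inside [P_MIN, P_MAX]

actor_opt.zero_grad()
action.backward(-bounded)   # ascend Q: pass -dQ/da as the output gradient for the actor
actor_opt.step()
```

The design choice worth noting is that the actor only ever sees the critic's action gradients; the critic itself is trained separately from temporal-difference targets, as in DDPG.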
Related work
  • RoboCup 2D soccer has a rich history of learning. In one of the earliest examples, Andre & Teller (1999) used Genetic Programming to evolve policies for RoboCup 2D soccer. Using a sequence of reward functions, they encouraged the players first to approach the ball, then to kick the ball, to score a goal, and finally to win the game. Similarly, our work features players whose policies are entirely trained and have no hand-coded components. Our work differs by using a gradient-based reinforcement learning method rather than evolution.

    Masson & Konidaris (2015) present a parameterized-action MDP formulation and approaches for model-free reinforcement learning in such environments. The core of this approach uses a parameterized policy for choosing which discrete action to select and another policy for selecting continuous parameters for that action. Given a fixed policy for parameter selection, they use Q-learning to optimize the policy for discrete action selection. Next, they fix the policy for discrete action selection and use a policy search method to optimize the parameter selection. Alternating these two learning phases yields convergence to either a local or a global optimum, depending on whether the policy search procedure can guarantee optimality. In contrast, our approach to learning in parameterized action space features a parameterized actor that learns both discrete actions and parameters and a parameterized critic that learns only the action-value function. Instead of relying on an external policy search procedure, we are able to directly query the critic for gradients (isolated in the sketch below). Finally, we parameterize our policies using deep neural networks rather than linear function approximation. Deep networks offer no theoretical convergence guarantees, but have a strong record of empirical success.
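Isolating just the critic-query step discussed here: because the critic is a differentiable network over (state, action), dQ/da is available by ordinary backpropagation rather than by an external policy search. A minimal sketch, with sizes and names assumed for illustration:

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 58, 10   # assumed sizes (e.g. discrete activations plus parameters)

# Critic maps (state, action) to a scalar Q-value.
critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(), nn.Linear(128, 1))

state = torch.randn(1, STATE_DIM)
action = torch.randn(1, ACTION_DIM, requires_grad=True)   # stands in for the actor's output

q = critic(torch.cat([state, action], dim=1)).sum()   # sum() gives backward() an explicit scalar
q.backward()
print(action.grad)   # dQ/da: the update direction the critic suggests for the action
```

During training these gradients are bounded and propagated into the actor, so discrete-action activations and continuous parameters are updated jointly rather than in alternating phases.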
Funding
  • LARG research is supported in part by grants from the National Science Foundation (CNS-1330072, CNS-1305287), ONR (21C184-01), AFRL (FA8750-14-1-0070), AFOSR (FA9550-14-1-0087), and Yujin Robot.
Reference
  • Akiyama, Hidehisa. Agent2d base code, 2010.
  • Andre, David and Teller, Astro. Evolving Team Darwin United. Lecture Notes in Computer Science, 1604:346, 1999. ISSN 0302-9743. URL http://link.springer-ny.com/link/service/series/0558/bibs/1604/16040346.htm;http://link.springer-ny.com/link/service/series/0558/papers/1604/16040346.pdf.
  • Hafner, Roland and Riedmiller, Martin. Reinforcement learning in feedback control. Machine Learning, 84(1-2):137–169, 2011. ISSN 0885-6125. doi: 10.1007/s10994-011-5235-x. URL http://dx.doi.org/10.1007/s10994-011-5235-x.
  • Hausknecht, Matthew J. and Stone, Peter. Deep recurrent q-learning for partially observable mdps. CoRR, abs/1507.06527, 2015. URL http://arxiv.org/abs/1507.06527.
  • Kalyanakrishnan, Shivaram, Liu, Yaxin, and Stone, Peter. Half field offense in RoboCup soccer: A multiagent reinforcement learning case study. In Lakemeyer, Gerhard, Sklar, Elizabeth, Sorenti, Domenico, and Takahashi, Tomoichi (eds.), RoboCup-2006: Robot Soccer World Cup X, volume 4434 of Lecture Notes in Artificial Intelligence, pp. 72–8. Springer Verlag, Berlin, 2007. ISBN 978-3-540-74023-0.
  • Kingma, Diederik P. and Ba, Jimmy. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014. URL http://arxiv.org/abs/1412.6980.
  • Levine, Sergey, Finn, Chelsea, Darrell, Trevor, and Abbeel, Pieter. End-to-end training of deep visuomotor policies. CoRR, abs/1504.00702, 2015. URL http://arxiv.org/abs/1504.00702.
  • Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. Continuous control with deep reinforcement learning. ArXiv e-prints, September 2015.
  • MacAlpine, Patrick, Depinet, Mike, and Stone, Peter. UT Austin Villa 2014: RoboCup 3D simulation league champion via overlapping layered learning. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI), January 2015.
  • Masson, Warwick and Konidaris, George. Reinforcement learning with parameterized actions. CoRR, abs/1509.01644, 2015. URL http://arxiv.org/abs/1509.01644.
  • Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei A., Veness, Joel, Bellemare, Marc G., Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas K., Ostrovski, Georg, Petersen, Stig, Beattie, Charles, Sadik, Amir, Antonoglou, Ioannis, King, Helen, Kumaran, Dharshan, Wierstra, Daan, Legg, Shane, and Hassabis, Demis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, February 2015. ISSN 0028-0836. doi: 10.1038/nature14236. URL http://dx.doi.org/10.1038/nature14236.
  • Narasimhan, Karthik, Kulkarni, Tejas, and Barzilay, Regina. Language understanding for text-based games using deep reinforcement learning. CoRR, abs/1506.08941, 2015. URL http://arxiv.org/abs/1506.08941.
  • Oh, Junhyuk, Guo, Xiaoxiao, Lee, Honglak, Lewis, Richard L., and Singh, Satinder P. Action-conditional video prediction using deep networks in Atari games. CoRR, abs/1507.08750, 2015. URL http://arxiv.org/abs/1507.08750.
  • Riedmiller, Martin, Gabel, Thomas, Hafner, Roland, and Lange, Sascha. Reinforcement learning for robot soccer. Autonomous Robots, 27(1):55–73, 2009. ISSN 0929-5593. doi: 10.1007/s10514-009-9120-4. URL http://dx.doi.org/10.1007/s10514-009-9120-4.
  • Riedmiller, Martin A. and Gabel, Thomas. On experiences in a complex and competitive gaming domain: Reinforcement learning meets robocup. In CIG, pp. 17–23. IEEE, 2007. ISBN 14244-0709-5. URL http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=4219012.
  • Stadie, Bradly C., Levine, Sergey, and Abbeel, Pieter. Incentivizing exploration in reinforcement learning with deep predictive models. CoRR, abs/1507.00814, 2015. URL http://arxiv.org/abs/1507.00814.
  • Sutton, Richard S. and Barto, Andrew G. Reinforcement Learning: An Introduction. MIT Press, 1998. ISBN 0262193981. URL http://www.cs.ualberta.ca/%7Esutton/book/ebook/the-book.html.
  • Tieleman, T. and Hinton, G. Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.
  • van Hasselt, Hado, Guez, Arthur, and Silver, David. Deep reinforcement learning with double q-learning. CoRR, abs/1509.06461, 2015. URL http://arxiv.org/abs/1509.06461.
  • Watkins, Christopher J. C. H. and Dayan, Peter. Q-learning. Machine Learning, 8(3-4):279–292, 1992. doi: 10.1023/A:1022676722315. URL http://jmvidal.cse.sc.edu/library/watkins92a.pdf.
  • Zeiler, Matthew D. ADADELTA: An adaptive learning rate method. CoRR, abs/1212.5701, 2012. URL http://arxiv.org/abs/1212.5701.