Robust Auto-parking: Reinforcement Learning based Real-time Planning Approach with Domain Template

Yuzheng Zhuang
Qiang Gu
Bin Wang
Jun Luo
Hongbo Zhang

NeurIPS Workshop on MLITS, 2018.

Keywords:
automatic parking, Multi-layer Perceptron, parking problem, Deep Q-learning from Demonstration, temporal information

Abstract:

This paper presents an automatic parking system for a passenger vehicle, with highlights on a robust real-time planning approach and on experimental results. We propose a framework that leverages the strength of learning-based approaches for robustness to environment noise and capability of dealing with challenging tasks, and of rule-based approaches for their versatility in handling normal tasks, by integrating simple rules with RL under a multi-stage architecture inspired by the typical auto-parking template universally used in geometric planning approaches.

Introduction
  • Automatic parking is an autonomous car-maneuvering system that moves a vehicle from a traffic lane into a parking spot to perform parallel, perpendicular, or angle parking.
  • With the recent success of deep RL, the authors propose a framework that leverages the strength of learning-based approaches for robustness to environment noise and capability of dealing with challenging tasks, and of rule-based approaches for their versatility in handling normal tasks, by integrating simple rules with RL under a multi-stage architecture inspired by the typical auto-parking template universally used in geometric planning approaches (a minimal sketch of such a dispatcher follows this list).
  • The authors present preliminary benchmarks and show the approach can outperform a geometric planning baseline in robustness to environment noise and in planning efficiency in a high-fidelity simulator.
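  • The following is a minimal sketch of how such a rule/RL dispatcher could be wired together; the stage split, the handoff predicate, and the interfaces are illustrative assumptions, not the paper's implementation:

    # Minimal sketch of a multi-stage rule/RL dispatcher (assumed interfaces).
    class MultiStagePlanner:
        """Runs simple rules for the normal approach stage and hands off
        to a learned RL policy for the harder maneuvering stage."""

        def __init__(self, rule_controller, rl_policy, handoff_test):
            self.rule_controller = rule_controller  # rule-based: approach/alignment
            self.rl_policy = rl_policy              # learned: fine maneuvering
            self.handoff_test = handoff_test        # predicate on the state

        def act(self, state):
            # Use rules while the task is "normal"; switch to the RL policy
            # once the state enters the challenging maneuvering region.
            if self.handoff_test(state):
                return self.rl_policy(state)
            return self.rule_controller(state)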
Highlights
  • To evaluate the performance of our approach, we compare it against two baselines based on geometric planning and Deep Q-learning from Demonstrations (DQfD), and against a Multi-layer Perceptron (MLP) policy learned under our framework (a sketch of DQfD's distinguishing loss follows this list).
  • Robustness to Environment Noise: we explored the robustness of the policy learned with our approach to changes in environment noise, meaning noise in the initial state and in the detection of the target parking spot.
  • We proposed a framework that leverages the strength of learning-based approaches for robustness to environment noise and capability of dealing with challenging tasks, and of rule-based approaches for their versatility in handling normal tasks, by integrating simple rules with RL under a multi-stage architecture inspired by the domain template.
  • Our work demonstrates that RL can solve the parking problem efficiently in a static environment while showing the ability to generalize.
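  • Since DQfD serves as a baseline, the sketch below shows its distinguishing ingredient, the supervised large-margin loss J_E(Q) = max_a [Q(s,a) + l(a_E,a)] − Q(s,a_E) from Hester et al. (2018); the tensor layout and margin value here are assumptions of this sketch, and the full DQfD objective additionally combines 1-step and n-step TD losses with L2 regularization:

    import torch

    def dqfd_margin_loss(q_values, demo_actions, margin=0.8):
        """Large-margin supervised loss on demonstration data: pushes
        Q(s, a_E) above every other action by at least `margin`.
        q_values: (batch, num_actions); demo_actions: (batch,) expert actions."""
        idx = torch.arange(q_values.size(0))
        l = torch.full_like(q_values, margin)  # l(a_E, a) = margin for a != a_E
        l[idx, demo_actions] = 0.0             # ... and 0 at the expert action
        q_demo = q_values[idx, demo_actions]
        return ((q_values + l).max(dim=1).values - q_demo).mean()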
Methods
  • The goal is to find a parking strategy that can accomplish parking from various initial states, given access to only limited experience on each task drawn from D(T).
  • A parking task constrains the state through the ego-vehicle, the parking space, and the distribution of static obstacles; each task is specified by a tuple.
  • Given the state s_t ∈ P_T(s) perceived at time t for task T, the agent predicts a distribution over actions.
  • The agent interacts with the environment and perceives the next state s_{t+1} ∼ P_T and the immediate reward R_t according to the reward function (see the rollout sketch after this list).
  • P_T(s) and P_T define the environment in task T.
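  • A minimal rollout sketch of this interaction loop, assuming a Gym-style environment with the classic 4-tuple step API as a stand-in for the paper's simulator:

    def rollout(env, policy, max_steps=200):
        """Collect one episode: at each step t the agent samples a_t from
        pi(a | s_t), then perceives s_{t+1} ~ P_T and the reward R_t."""
        s_t = env.reset()  # initial state drawn from P_T(s)
        trajectory, episode_return = [], 0.0
        for _ in range(max_steps):
            a_t = policy.sample(s_t)                  # a_t ~ pi(a | s_t)
            s_next, r_t, done, _info = env.step(a_t)  # s_{t+1} ~ P_T, reward R_t
            trajectory.append((s_t, a_t, r_t, s_next))
            episode_return += r_t
            s_t = s_next
            if done:
                break
        return trajectory, episode_return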
Results
  • To evaluate the performance of the approach, the authors compare it against two baselines based on geometric planning and DQfD, and against an MLP policy learned under the proposed framework.
  • All experiments are done in a high-fidelity simulator.
  • For these benchmarks the scenario-related parameters are set as follows: the parking spot measures 6.0 × 2.5 m with an aisle width larger than 6 m, the ego-vehicle measures 4.93 × 1.87 m, and the distance from the ego-vehicle's rear axle center to the tail is 1.04 m (the noise-injection sketch after this list uses these values).
  • The geometric planning baseline covers a smaller vertical distance range, 0.5 m less than the LSTM policy learned with the approach.
  • The ranges of initial states covered by the policies learned with DQfD and by the MLP variant of the approach are quite limited.
  • The LSTM policy learned with the approach is the most robust to noise in the initial state.
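  • As referenced above, a sketch of how the environment noise of the robustness benchmark could be injected, using the σx = α × parking-spot-width relation from Table 2 and the 6.0 × 2.5 m spot from the benchmarks; the Gaussian form and the injection point are assumptions of this sketch:

    import numpy as np

    SPOT_LENGTH, SPOT_WIDTH = 6.0, 2.5  # parking spot size used in the benchmarks (m)

    def perturb_target_spot(spot_xy, alpha, sigma_y, rng=None):
        """Add lateral/vertical noise to the detected target-spot center,
        with the lateral scale tied to spot width: sigma_x = alpha * width."""
        rng = rng or np.random.default_rng()
        sigma_x = alpha * SPOT_WIDTH
        noise = rng.normal(0.0, [sigma_x, sigma_y])  # (lateral, vertical)
        return np.asarray(spot_xy, dtype=float) + noise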
Conclusion
  • The authors proposed a framework that leverages the strength of learning-based approaches for robustness to environment noise and capability of dealing with challenging tasks, and of rule-based approaches for their versatility in handling normal tasks, by integrating simple rules with RL under a multi-stage architecture inspired by the domain template.
  • Such a framework provides a mechanism to incorporate prior knowledge by decomposing a task into a multi-stage problem with shorter-term rewards, which can lead to fast convergence to successful parking policies.
  • The authors foresee combining model-based RL with optimization-based planning and model-free RL to tackle parking in more complex and dynamic scenarios.
Tables
  • Table 1: Benchmark results for robustness to the initial state, comparing our approach against the baselines and against the MLP policy learned under our framework. The geometric meanings of the column names are illustrated in Figure 2(a).
  • Table 2: Benchmark results for robustness to environment noise of our approach, listing the lateral and vertical noise added to the target parking spot as σx and σy, where σx = α × parking spot width.
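  • For illustration, a minimal sketch of the recurrent (LSTM) policy representation compared in Table 1, whose hidden state carries temporal information across steps; the layer sizes and the discrete-action head are assumptions, not the paper's architecture:

    import torch.nn as nn

    class LSTMPolicy(nn.Module):
        """Recurrent policy: the LSTM hidden state summarizes the history
        of observations, unlike a purely feed-forward MLP policy."""

        def __init__(self, obs_dim, act_dim, hidden=128):
            super().__init__()
            self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, act_dim)

        def forward(self, obs_seq, hc=None):
            out, hc = self.lstm(obs_seq, hc)  # out: (batch, time, hidden)
            return self.head(out), hc         # per-step action logits + state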
References
  • R. Alami, R. Chatila & S. Fleury (1998) An Architecture for Autonomy. The International Journal of Robotics Research 17(4):315-337.
  • Hélène Vorobieva, Sébastien Glaser, Nicoleta Minoiu-Enache & Saïd Mammar (2015) Automatic parallel parking in tiny spots: path planning and control. IEEE Transactions on Intelligent Transportation Systems 16(1):396-410.
  • Volodymyr Mnih, Koray Kavukcuoglu, David Silver & Andrei A. Rusu (2015) Human-level control through deep reinforcement learning. Nature 518:529-533.
  • Todd Hester, Matej Vecerik, Olivier Pietquin & Marc Lanctot (2018) Deep Q-learning from demonstrations. AAAI.
  • Daan Wierstra, Alexander Förster, Jan Peters & Jürgen Schmidhuber (2010) Recurrent policy gradients. Logic Journal of the IGPL 18(5):620-634.
  • Ofir Nachum, Mohammad Norouzi, Kelvin Xu & Dale Schuurmans (2017) Bridging the gap between value and policy based reinforcement learning. NIPS.
  • John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford & Oleg Klimov (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
  • John Schulman, Sergey Levine, Philipp Moritz, Michael Jordan & Pieter Abbeel (2015) Trust region policy optimization. CoRR, abs/1502.05477.
  • Yoshua Bengio, Jérôme Louradour, Ronan Collobert & Jason Weston (2009) Curriculum learning. ICML.