FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis

ICML, pp. 8992–9004, 2020

TL;DR: We develop a novel method for self-play based on replica-exchange Markov chain Monte Carlo.

Abstract

Balancing performance and safety is crucial to deploying autonomous vehicles in multi-agent environments. In particular, autonomous racing is a domain that penalizes safe but conservative policies, highlighting the need for robust, adaptive strategies. Current approaches either make simplifying assumptions about other agents or lack rob…
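The TL;DR above names replica-exchange Markov chain Monte Carlo (parallel tempering) as the engine behind offline population synthesis. Below is a minimal, generic parallel-tempering sketch, not the authors' implementation: the quadratic `energy` function, the temperature ladder, and all variable names are illustrative assumptions standing in for whatever objective scores candidate opponent behaviors.

```python
# Minimal parallel-tempering (replica-exchange MCMC) sketch.
# The energy function and temperature ladder are illustrative assumptions.
import math
import random


def energy(x):
    # Placeholder objective standing in for a score over candidate opponents.
    return (x - 3.0) ** 2


def metropolis_step(x, temperature, step_size=0.5):
    """One random-walk Metropolis update targeting exp(-energy(x)/temperature)."""
    proposal = x + random.gauss(0.0, step_size)
    log_accept = (energy(x) - energy(proposal)) / temperature
    if random.random() < math.exp(min(0.0, log_accept)):
        return proposal
    return x


def replica_exchange(n_steps=2000, temperatures=(0.1, 0.5, 1.0, 2.0)):
    """Run one chain per temperature and periodically swap adjacent replicas."""
    states = [random.uniform(-10.0, 10.0) for _ in temperatures]
    for _ in range(n_steps):
        # Local Metropolis moves in every replica.
        states = [metropolis_step(x, t) for x, t in zip(states, temperatures)]
        # Propose swapping one randomly chosen pair of adjacent replicas.
        i = random.randrange(len(temperatures) - 1)
        e_lo, e_hi = energy(states[i]), energy(states[i + 1])
        t_lo, t_hi = temperatures[i], temperatures[i + 1]
        log_swap = (e_lo - e_hi) * (1.0 / t_lo - 1.0 / t_hi)
        if random.random() < math.exp(min(0.0, log_swap)):
            states[i], states[i + 1] = states[i + 1], states[i]
    return states  # states[0] is the coldest (most exploitative) replica


if __name__ == "__main__":
    print(replica_exchange())
```

Replicas at high temperatures explore broadly while the cold replica refines good candidates; the swap step is what lets the cold chain escape local optima.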

Introduction
  • Current autonomous vehicle (AV) technology still struggles in competitive multi-agent scenarios, such as merging onto a highway, where both maximizing performance and maintaining safety are important.
  • During the 2019 Formula One season, the race-winner achieved the fastest lap in only 33% of events [26].
  • The weak correlation between achieving the fastest lap-time and winning suggests that consistent and robust performance is critical to success.
  • The authors investigate this intuition in the setting of autonomous racing (AR).
  • The agent wins if it completes the race faster than its opponents; a crash automatically results in a loss.
Highlights
  • Current autonomous vehicle (AV) technology still struggles in competitive multi-agent scenarios, such as merging onto a highway, where both maximizing performance and maintaining safety are important.
  • The weak correlation between achieving the fastest lap-time and winning suggests that consistent and robust performance is critical to success. We investigate this intuition in the setting of autonomous racing (AR).
  • The central hypothesis of this paper is that distributionally robust evaluation of plans relative to the agent’s belief state about opponents, which is updated as new observations are made, can lead to policies achieving the same performance as non-robust approaches without sacrificing safety (a minimal code sketch of this kind of robust evaluation follows this list).
  • To evaluate this hypothesis we identify a natural division of the underlying problem.
  • We demonstrate the transfer of our methods from simulation to real autonomous racecars.
  • The addition of recursive feasibility arguments for stronger safety guarantees could improve the applicability of these techniques to real-world settings.
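As a concrete illustration of the hypothesis above, the sketch below evaluates one candidate plan against a finite set of sampled opponent behaviors under the worst-case reweighting of the agent's belief within an L1 (total-variation) ball. This is a generic distributionally robust evaluation, not necessarily the paper's exact ambiguity set or solver; the function name, cost values, and radius are hypothetical.

```python
# Hypothetical sketch: worst-case expected cost of a candidate plan over an
# L1 (total-variation) ambiguity ball around the agent's belief weights.
import numpy as np


def robust_expected_cost(costs, belief, radius):
    """max_q q.costs  s.t.  ||q - belief||_1 <= radius, q in the simplex.

    Solved greedily: shift up to radius/2 probability mass from the
    lowest-cost outcomes onto the single highest-cost outcome.
    """
    costs = np.asarray(costs, dtype=float)
    q = np.asarray(belief, dtype=float).copy()
    budget = radius / 2.0

    worst = int(np.argmax(costs))
    add = min(budget, 1.0 - q[worst])        # cannot exceed total mass 1
    q[worst] += add

    # Remove the same amount of mass, starting from the cheapest outcomes.
    remaining = add
    for i in np.argsort(costs):
        if i == worst or remaining <= 0:
            continue
        take = min(q[i], remaining)
        q[i] -= take
        remaining -= take

    return float(q @ costs)


if __name__ == "__main__":
    # Costs of one candidate plan against three sampled opponent behaviors,
    # weighted by the current belief; the radius controls conservativeness.
    print(robust_expected_cost(costs=[1.0, 2.0, 10.0],
                               belief=[0.5, 0.3, 0.2],
                               radius=0.2))
```

Setting the radius to 0 recovers the ordinary (non-robust) expected cost, so the radius directly trades off performance against conservativeness.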
Methods
  • The authors first describe the AR environment used to conduct the experiments.
  • The authors experimentally determine the physical parameters of the agent models for simulation and use SLAM to build the virtual track as a mirror of a real location.
  • Both the hardware specifications and simulator are available to reviewers in anonymized form and will be released to the community.
Conclusion
  • The central hypothesis of this paper is that distributionally robust evaluation of plans relative to the agent’s belief state about opponents, which is updated as new observations are made, can lead to policies achieving the same performance as non-robust approaches without sacrificing safety.
  • To evaluate this hypothesis the authors identify a natural division of the underlying problem.
Tables
  • Table 1: The effect of distributional robustness on aggressiveness
  • Table 2: The effect of adaptivity on win-rate
  • Table 3: The resolution and ranges of the Trajectory Generator Look-up Table
Related work
  • Reinforcement learning (RL) has achieved unprecedented success on classic two-player games [e.g. 73], leading to new approaches in partially-observable games with continuous action spaces [5, 14]. In these works, agents train via self-play using Monte Carlo tree search [17, 80] or population-based methods [40, 41]. The agents optimize expected performance rather than adapt to individual variations in opponent strategy, which can lead to poor performance against particular opponents [9]. In contrast, our method explicitly incorporates adaptivity to opponents.

    Robust approaches to RL and control (like this work) explicitly model uncertainty. In RL, this amounts to planning in a robust MDP [62] or a POMDP [42]. Early results by Bagnell et al. [8] …
References
  • [1] J. Abernethy and A. Rakhlin. Beating the adaptive bandit with high probability. In 2009 Information Theory and Applications Workshop, pages 280–289. IEEE, 2009.
  • [2] N. Agarwal, B. Bullins, E. Hazan, S. M. Kakade, and K. Singh. Online control with adversarial disturbances. arXiv preprint arXiv:1902.08721, 2019.
  • [3] M. Althoff and J. M. Dolan. Online verification of automated road vehicles using reachability analysis. IEEE Transactions on Robotics, 30(4):903–918, 2014.
  • [4] M. Althoff, M. Koschi, and S. Manzinger. CommonRoad: Composable benchmarks for motion planning on roads. In 2017 IEEE Intelligent Vehicles Symposium (IV), pages 719–726. IEEE, 2017.
  • [5] K. Arulkumaran, A. Cully, and J. Togelius. AlphaStar: An evolutionary computation perspective. arXiv preprint arXiv:1902.01724, 2019.
  • [6] K. J. Åström and B. Wittenmark. Adaptive Control. Courier Corporation, 2013.
  • [7] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
  • [8] J. A. Bagnell, A. Y. Ng, and J. G. Schneider. Solving uncertain Markov decision processes.
  • [9] T. Bansal, J. Pachocki, S. Sidor, I. Sutskever, and I. Mordatch. Emergent complexity via multi-agent competition. arXiv preprint arXiv:1710.03748, 2017.
  • [10] C. J. Belisle, H. E. Romeijn, and R. L. Smith. Hit-and-run algorithms for generating multivariate distributions. Mathematics of Operations Research, 18(2):255–266, 1993.
  • [11] A. Bemporad and M. Morari. Robust model predictive control: A survey. In Robustness in Identification and Control, pages 207–226.
  • [12] A. Ben-Tal, D. den Hertog, A. D. Waegenaere, B. Melenberg, and G. Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2):341–357, 2013.
  • [13] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305, 2012.
  • [14] C. Berner, G. Brockman, B. Chan, V. Cheung, P. Dębiak, C. Dennison, D. Farhi, Q. Fischer, S. Hashme, C. Hesse, et al. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680, 2019.
  • [15] D. Bertsimas and M. Sim. The price of robustness. Operations Research, 52(1):35–53, 2004.
  • [16] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. OpenAI Gym, 2016.
  • [17] C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43, 2012.
  • [18] S. Bubeck, N. Cesa-Bianchi, et al. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.
  • [19] V. Černý. Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. Journal of Optimization Theory and Applications, 45(1):41–51, 1985.
  • [20] R. C. Coulter. Implementation of the pure pursuit path tracking algorithm. Technical report, Carnegie Mellon University Robotics Institute, Pittsburgh, PA, 1992.
  • [21] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
  • [22] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu. Regret bounds for robust adaptive control of the linear quadratic regulator. In Advances in Neural Information Processing Systems, pages 4188–4197, 2018.
  • [23] W. Ding and S. Shen. Online vehicle trajectory prediction using policy anticipation network and optimization-based context reasoning. arXiv preprint arXiv:1903.00847, 2019.
  • [24] J. Doyle, K. Glover, P. Khargonekar, and B. Francis. State-space solutions to standard H2 and H∞ control problems. In 1988 American Control Conference, pages 1691–1696. IEEE, 1988.
  • [25] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning (ICML '08), 2008. doi: 10.1145/1390156.1390191. URL http://dx.doi.org/10.1145/1390156.1390191.
  • [26] Fédération Internationale de l'Automobile. Formula One 2019 results. https://www.formula1.com/en/results.html/2019/, 2019.
  • [27] D. Ferguson, T. M. Howard, and M. Likhachev. Motion planning in urban environments. Journal of Field Robotics, 25(11-12):939–960, 2008.
  • [28] E. Galceran, A. G. Cunningham, R. M. Eustice, and E. Olson. Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction. In Robotics: Science and Systems, volume 1, 2015.
  • [29] Y. Gao, A. Gray, H. E. Tseng, and F. Borrelli. A tube-based robust nonlinear predictive control approach to semiautonomous ground vehicles. Vehicle System Dynamics, 52(6):802–823, 2014.
  • [30] E. Gat, R. P. Bonnasso, R. Murphy, et al. On three-layer architectures. Artificial Intelligence and Mobile Robots, 195:210, 1998.
  • [31] C. J. Geyer. Markov chain Monte Carlo maximum likelihood. 1991.
  • [32] I. Gilboa and M. Marinacci. Ambiguity and the Bayesian paradigm. In Readings in Formal Epistemology, pages 385–439.
  • [33] A. Gleave, M. Dennis, N. Kant, C. Wild, S. Levine, and S. Russell. Adversarial policies: Attacking deep reinforcement learning. arXiv preprint arXiv:1905.10615, 2019.
  • [35] W. Hess, D. Kohler, H. Rapp, and D. Andor. Real-time loop closure in 2D lidar SLAM. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 1271–1278. IEEE, 2016.
  • [36] P. Hintjens. ZeroMQ: Messaging for Many Applications. O'Reilly Media, Inc., 2013.
  • [37] T. M. Howard. Adaptive model-predictive motion planning for navigation in complex environments. Carnegie Mellon University, 2009.
  • [38] J. Hu and P. Hu. Annealing adaptive search, cross-entropy, and stochastic approximation in global optimization. Naval Research Logistics (NRL), 58(5):457–477, 2011.
  • [39] L. Ingber. Simulated annealing: Practice versus theory. Mathematical and Computer Modelling, 18(11):29–57, 1993.
  • [40] M. Jaderberg, V. Dalibard, S. Osindero, W. M. Czarnecki, J. Donahue, A. Razavi, O. Vinyals, T. Green, I. Dunning, K. Simonyan, et al. Population based training of neural networks. arXiv preprint arXiv:1711.09846, 2017.
  • [41] M. Jaderberg, W. M. Czarnecki, I. Dunning, L. Marris, G. Lever, A. G. Castaneda, C. Beattie, N. C. Rabinowitz, A. S. Morcos, A. Ruderman, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 364(6443):859–865, 2019.
  • [42] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2):99–134, 1998.
  • [43] A. Kelly and B. Nagy. Reactive nonholonomic trajectory generation via parametric optimal control. The International Journal of Robotics Research, 22(7-8):583–601, 2003.
  • [44] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [45] D. P. Kingma, T. Salimans, R. Jozefowicz, X. Chen, I. Sutskever, and M. Welling. Improved variational inference with inverse autoregressive flow. In Advances in Neural Information Processing Systems, pages 4743–4751, 2016.
  • [46] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
  • [47] M. J. Kochenderfer. Decision Making Under Uncertainty: Theory and Application. MIT Press, 2015.
  • [48] A. Kulesza, B. Taskar, et al. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2-3):123–286, 2012.
  • [49] P. R. Kumar. A survey of some results in stochastic adaptive control. SIAM Journal on Control and Optimization, 23(3):329–380, 1985.
  • [50] A. Liniger and J. Lygeros. A noncooperative game approach to autonomous racing. IEEE Transactions on Control Systems Technology, 2019.
  • [51] L. Lovász. Hit-and-run mixes fast. Mathematical Programming, 86(3):443–461, 1999.
  • [52] L. Lovász and S. Vempala. Hit-and-run is fast and fun. Preprint, Microsoft Research, 2003.
  • [53] L. Lovász and S. Vempala. Hit-and-run from a corner. SIAM Journal on Computing, 35(4):985–1005, 2006.
  • [54] D. Luenberger. Optimization by Vector Space Methods. Wiley, 1969.
  • [55] A. Majumdar and R. Tedrake. Robust online motion planning with regions of finite time invariance. In Algorithmic Foundations of Robotics X, pages 543–558.
  • [56] A. Mandlekar, Y. Zhu, A. Garg, L. Fei-Fei, and S. Savarese. Adversarially robust policy learning: Active construction of physically-plausible perturbations. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3932–3939. IEEE, 2017.
  • [57] E. Marinari and G. Parisi. Simulated tempering: A new Monte Carlo scheme. EPL (Europhysics Letters), 19(6):451, 1992.
  • [58] J. Matyas. Random optimization. Automation and Remote Control, 26(2):246–253, 1965.
  • [59] M. McNaughton. Parallel algorithms for real-time motion planning. 2011.
  • [60] B. Nagy and A. Kelly. Trajectory generation for car-like robots using cubic curvature polynomials. Field and Service Robots, 11, 2001.
  • [61] H. Namkoong and J. C. Duchi. Variance regularization with convex objectives. In Advances in Neural Information Processing Systems 30, 2017.
  • [62] A. Nilim and L. El Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780–798, 2005.
  • [63] J. Norden, M. O'Kelly, and A. Sinha. Efficient black-box assessment of autonomous vehicle safety. arXiv preprint arXiv:1912.03618, 2019.
  • [64] M. O'Kelly, H. Zheng, J. Auckley, A. Jain, K. Luong, and R. Mangharam. TunerCar: A superoptimization toolchain for autonomous racing. Technical Report UPennESE-09-15, University of Pennsylvania, September 2019. https://repository.upenn.edu/mlab_papers/122/.
  • [65] A. van den Oord, Y. Li, I. Babuschkin, K. Simonyan, O. Vinyals, K. Kavukcuoglu, G. van den Driessche, E. Lockhart, L. Cobo, F. Stimberg, et al. Parallel WaveNet: Fast high-fidelity speech synthesis. In International Conference on Machine Learning, pages 3918–3926, 2018.
  • [66] G. Papamakarios, T. Pavlakou, and I. Murray. Masked autoregressive flow for density estimation. In Advances in Neural Information Processing Systems, pages 2338–2347, 2017.
  • [67] L. Pinto, J. Davidson, R. Sukthankar, and A. Gupta. Robust adversarial reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, pages 2817–2826. JMLR.org, 2017.
  • [68] D. J. Rezende and S. Mohamed. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning, pages 1530–1538. JMLR.org, 2015.
  • [69] A. Sadat, M. Ren, A. Pokrovsky, Y.-C. Lin, E. Yumer, and R. Urtasun. Jointly learnable behavior and trajectory planning for self-driving vehicles. arXiv preprint arXiv:1910.04586, 2019.
  • [70] D. Sadigh, S. Sastry, S. A. Seshia, and A. D. Dragan. Planning for autonomous cars that leverage effects on human actions. In Robotics: Science and Systems, volume 2, Ann Arbor, MI, USA, 2016.
  • [71] P. Samson. Concentration of measure inequalities for Markov chains and φ-mixing processes. Annals of Probability, 28(1):416–461, 2000.
  • [72] S. Shalev-Shwartz et al. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2012.
  • [73] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419):1140–1144, 2018.
  • [74] A. Sinha and J. C. Duchi. Learning kernels with random features. In Advances in Neural Information Processing Systems, pages 1298–1306, 2016.
  • [75] A. Sinha, H. Namkoong, and J. Duchi. Certifiable distributional robustness with principled adversarial training. In Proceedings of the Fifth International Conference on Learning Representations, 2017. arXiv:1710.10571 [cs.LG].
  • [76] E. Smirnova, E. Dohmatob, and J. Mary. Distributionally robust reinforcement learning. arXiv preprint arXiv:1902.08708, 2019.
  • [77] R. L. Smith. Efficient Monte Carlo procedures for generating points uniformly distributed over bounded regions. Operations Research, 32(6):1296–1308, 1984.
  • [78] J. M. Snider et al. Automatic steering methods for autonomous automobile path tracking. Robotics Institute, Pittsburgh, PA, Tech. Rep. CMU-RI-TR-09-08, 2009.
  • [79] S. Sontges, M. Koschi, and M. Althoff. Worst-case analysis of the time-to-react using reachable sets. In 2018 IEEE Intelligent Vehicles Symposium (IV), pages 1891–1897. IEEE, 2018.
  • [80] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
  • [81] R. H. Swendsen and J.-S. Wang. Replica Monte Carlo simulation of spin-glasses. Physical Review Letters, 57(21):2607, 1986.
  • [82] A. Tamar, S. Mannor, and H. Xu. Scaling up robust MDPs using function approximation. In International Conference on Machine Learning, pages 181–189, 2014.
  • [83] T. Uchiya, A. Nakamura, and M. Kudo. Algorithms for adversarial bandit problems with multiple plays. In International Conference on Algorithmic Learning Theory, pages 375–389.
  • [84] J. Van Den Berg, P. Abbeel, and K. Goldberg. LQG-MP: Optimized path planning for robots with motion uncertainty and imperfect state information. The International Journal of Robotics Research, 30(7):895–913, 2011.
  • [85] B. Vedder. Vedder electronic speed controller. URL https://vesc-project.com/documentation.
  • [86] G. Vinnicombe. Frequency domain uncertainty and the graph topology. IEEE Transactions on Automatic Control, 38(9):1371–1383, 1993.
  • [87] C. Walsh and S. Karaman. CDDT: Fast approximate 2D ray casting for accelerated localization. arXiv preprint arXiv:1705.01167, 2017. URL http://arxiv.org/abs/1705.01167.
  • [88] Z. Wang, R. Spica, and M. Schwager. Game theoretic motion planning for multi-robot racing. In Distributed Autonomous Robotic Systems, pages 225–238.
  • [89] T. Weissman, E. Ordentlich, G. Seroussi, S. Verdu, and M. J. Weinberger. Inequalities for the L1 deviation of the empirical distribution. Hewlett-Packard Labs, Tech. Rep., 2003.
  • [90] G. Williams, B. Goldfain, P. Drews, J. M. Rehg, and E. A. Theodorou. Autonomous racing with AutoRally vehicles and differential games. arXiv preprint arXiv:1707.04540, 2017.
  • [91] D. P. Zhou and C. J. Tomlin. Budget-constrained multi-armed bandits with multiple plays. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • [92] K. Zhou, J. C. Doyle, and K. Glover. Robust and Optimal Control. Prentice Hall, 1996.

Trajectory cost components
Each sampled trajectory is scored with the following cost terms (a minimal, hypothetical code sketch combining a few of them follows the list):
  • 1. Trajectory length: c_al = s, where 1/s is the arc length of each trajectory. Short and myopic trajectories are penalized.
  • 2. Maximum absolute curvature: c_mc = max_i{|κ_i|}, where κ_i are the curvatures at each point on a trajectory. Large curvatures are penalized to preserve smoothness of trajectories.
  • 3. Mean absolute curvature: c_ac
  • 4. Hysteresis loss: measured between the previously chosen trajectory and each of the sampled trajectories, c_hys = ||θ_prev[n1:n2] − θ[0:n2−n1]||_2^2, where θ_prev is the array of heading angles of each pose on the previously selected trajectory, θ is the array of heading angles of each pose on the trajectory being evaluated, and the ranges [n1, n2] and [0, n2−n1] define contiguous portions of the trajectories that are compared. Trajectories dissimilar to the previously selected trajectory are penalized.
  • 5. Lap progress: measured along the track from the start to the end point of each trajectory in the normal and tangential coordinate system, c_p
  • 6. Maximum acceleration: c_ma
  • 7. Maximum absolute curvature change: measured between adjacent points along each trajectory, c_dk
  • 8. Maximum lateral acceleration: c_la = max_i{|κ_i| v_i^2}, where κ and v are the arrays of curvature and velocity at all points on a trajectory. High maximum lateral accelerations are penalized.
  • 9. Minimum speed: c_ms
  • 10. Minimum range: c_mr = min_i{r_i}, where r is the array of range measurements (distance to static obstacles) generated by the simulator. Smaller minimum range is penalized, and trajectories with minimum ranges below a threshold are given infinite cost and therefore discarded.
  • 11. Cumulative inter-vehicle distance (short horizon): c_dy,short
  • 12. Discounted cumulative inter-vehicle distance (long horizon): c_dy,long
  • 13. Relative progress: measured along the track between the sampled trajectories' endpoints and the opponent's selected trajectory's endpoint, c_dp = (s_opp,end − s_end)_+, where s_opp,end is the position along the track, in tangential coordinates, of the endpoint of the opponent's chosen trajectory. Lagging behind the opponent is penalized.
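Below is a minimal sketch of how a few of the terms above might be combined into a single trajectory score. The weights, array layouts, the choice of terms included, and the sign convention on the minimum-range term are illustrative assumptions rather than the paper's actual cost function.

```python
# Hypothetical sketch: scoring one sampled trajectory with a few of the cost
# terms listed above (arc length, max curvature, max lateral acceleration,
# minimum range). Weights and array layouts are illustrative assumptions.
import numpy as np


def trajectory_cost(kappa, v, ranges, arc_length, weights, min_range_threshold=0.3):
    """Weighted sum of a subset of the cost terms; np.inf discards unsafe plans."""
    c_al = 1.0 / arc_length                     # short, myopic trajectories penalized
    c_mc = float(np.max(np.abs(kappa)))         # maximum absolute curvature
    c_la = float(np.max(np.abs(kappa) * v**2))  # maximum lateral acceleration
    c_mr = float(np.min(ranges))                # minimum clearance to static obstacles

    if c_mr < min_range_threshold:              # infeasible: too close to an obstacle
        return np.inf

    # Negating c_mr so that small clearance raises the cost is an illustrative choice.
    terms = np.array([c_al, c_mc, c_la, -c_mr])
    return float(np.dot(weights, terms))


if __name__ == "__main__":
    kappa = np.array([0.00, 0.05, 0.10, 0.08])   # curvature at sampled points
    v = np.array([4.0, 4.0, 3.5, 3.0])           # speed at sampled points
    ranges = np.array([2.0, 1.5, 1.2, 1.8])      # clearance along the plan
    print(trajectory_cost(kappa, v, ranges, arc_length=5.0,
                          weights=np.array([1.0, 1.0, 0.5, 0.2])))
```

The infinite cost for trajectories that pass too close to obstacles mirrors item 10 in the list above.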