Reinforcement-Learning based Portfolio Management with Augmented Asset Movement Prediction States

AAAI Conference on Artificial Intelligence, 2020.

Keywords:
State Augmented RL, reinforcement learning, long short-term memory, environment uncertainty, Natural Language Processing
Weibo:
We propose State Augmented RL (SARL), a novel and generic state-augmented reinforcement learning framework that can integrate heterogeneous data sources into standard RL training pipelines for learning portfolio management strategies.

Abstract:

Portfolio management (PM) is a fundamental financial planning task that aims to achieve investment goals such as maximal profits or minimal risks. Its decision process involves continuous derivation of valuable information from various data sources and sequential decision optimization, which is a prospective research direction for reinforcement learning (RL). …

Introduction
  • An investment portfolio is a basket of assets that can hold stocks, bonds, cash and more.
  • PM involves sequential decision making: continuously reallocating funds across assets, based on the latest information, to achieve the investment goal.
  • It is natural to leverage reinforcement learning (RL) to model this decision-making process for asset reallocation (Almahdi and Yang 2017; Jiang, Xu, and Liang 2017; Liang et al. 2018).
  • Based on the liquidity hypothesis, the algorithm that observes the environment, makes decisions to interact with the market, and rebalances the portfolio can be defined as an agent, as sketched below.
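As a rough illustration of this agent formulation, the toy sketch below frames PM as a sequential decision loop: the state is a window of recent asset prices, the action is a portfolio weight vector, and the reward is the log return of the portfolio. All names, the fixed equal-weight policy, and the random price data are assumptions made for illustration, not the paper's implementation.

    import numpy as np

    def step(weights, price_relatives):
        """One rebalancing step: the reward is the log return of the chosen portfolio.
        weights: allocation over assets (plus cash), non-negative and summing to 1.
        price_relatives: next-period price / current price per asset (cash = 1.0)."""
        return float(np.log(np.dot(weights, price_relatives)))

    # Toy rollout: an "agent" (here a fixed equal-weight policy) observes the
    # environment, decides a reallocation, and receives the market's response.
    rng = np.random.default_rng(0)
    num_assets, horizon = 4, 10
    cumulative_log_return = 0.0
    for t in range(horizon):
        state = rng.lognormal(0.0, 0.01, size=(30, num_assets))      # past 30 days of prices (toy state)
        action = np.ones(num_assets) / num_assets                    # agent's portfolio weights
        price_relatives = rng.lognormal(0.0, 0.01, size=num_assets)  # market movement (toy)
        cumulative_log_return += step(action, price_relatives)
    print("cumulative log return:", cumulative_log_return)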
Highlights
  • An investment portfolio is a basket of assets that can hold stocks, bonds, cash and more
  • We compare the performance of State Augmented RL (SARL) with other methods on two datasets: Bitcoin and HighTech. We summarize these two datasets, specify their data challenges, define the evaluation metrics, introduce the baseline portfolio management (PM) methods for comparison, and perform extensive experiments and simulations to validate the importance of state augmentation in SARL for PM.
  • The portfolio value (PV) at the final horizon of the testing set, as shown in Figures 3a and 3b, demonstrates the effectiveness of SARL (a toy PV computation is sketched after this list).
  • SARL improves PV by 140.9% and 15.7% over the state-of-the-art reinforcement learning (RL) algorithm for PM (DPM) on Bitcoin and HighTech, respectively.
  • We propose SARL, a novel and generic state-augmented RL framework that can integrate heterogeneous data sources into standard RL training pipelines for learning PM strategies.
  • We conducted comparative experiments and extensive simulations to validate the superior performance of SARL in PM.
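For reference, the portfolio value at the final horizon is the initial value compounded by the per-period portfolio returns. The helper below is a minimal sketch of such a computation; the function name, the constant proportional transaction cost, and the toy inputs are assumptions for illustration, not values taken from the paper.

    import numpy as np

    def final_portfolio_value(weights_seq, price_relatives_seq, p0=1.0, cost=0.0025):
        """Compound an initial value p0 over T rebalancing periods.
        weights_seq:         (T, n) portfolio weights chosen at each period.
        price_relatives_seq: (T, n) end-of-period / start-of-period price ratios (cash = 1.0).
        cost:                assumed proportional transaction cost per unit of turnover."""
        pv, prev_w = p0, weights_seq[0]
        for w, y in zip(weights_seq, price_relatives_seq):
            turnover = np.abs(w - prev_w).sum()              # fraction of the portfolio traded
            pv *= (1.0 - cost * turnover) * float(np.dot(w, y))
            prev_w = w
        return pv

    # e.g. two assets, three periods, equal weights throughout:
    print(final_portfolio_value(np.full((3, 2), 0.5),
                                np.array([[1.01, 0.99], [1.02, 1.00], [0.98, 1.03]])))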
Methods
  • Environment uncertainty: unlike game-playing tasks, which follow fixed rules, PM is deeply influenced by market dynamics.
  • The PM strategy of a standard RL agent trained on past market dynamics may not generalize to the future market if there are substantial changes in market dynamics.
  • The authors use the Sharpe ratio over different testing time periods to illustrate the influence of environment uncertainty; a minimal Sharpe-ratio computation is sketched after this list.
  • Compared methods: CRP, OLMAR, WMAMR, EW, DPM, and SARL.
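The sketch below shows one common way such per-period Sharpe ratios could be computed over testing windows of different lengths; the annualization factor, the zero risk-free rate, and the toy return series are assumptions, not the paper's exact evaluation code.

    import numpy as np

    def sharpe_ratio(returns, periods_per_year=365, risk_free=0.0):
        """Annualized Sharpe ratio of a series of simple per-period returns."""
        excess = np.asarray(returns, dtype=float) - risk_free
        return float(np.sqrt(periods_per_year) * excess.mean() / (excess.std() + 1e-12))

    # Evaluate the same return series over testing windows of different lengths.
    daily_returns = np.random.default_rng(1).normal(0.001, 0.02, size=180)  # toy data
    for label, window in [("1w", 7), ("1m", 30), ("3m", 90)]:
        print(label, round(sharpe_ratio(daily_returns[:window]), 3))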
Results
  • The authors compare the performance of SARL with other methods on two datasets: Bitcoin and HighTech.
  • On the Bitcoin dataset, the authors use the prices of the past 30 days to train a classifier for price up/down prediction.
  • On the HighTech dataset, the authors use financial news related to the stocks for classifier training.
  • In SARL training, the authors use the prices of the past 30 days as the standard state s∗ (see the sketch after this list).
  • Since the cryptocurrency market is more volatile, it is easier for a good agent to gain higher profits.
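The state augmentation described above can be pictured as concatenating the standard price-based state s∗ with the classifier's movement prediction. In the sketch below, a logistic-regression classifier on a 30-day price window stands in for the paper's price- or news-based predictor; that substitution, the single-asset setup, and the toy training labels are all assumptions made for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    WINDOW = 30  # the past 30 days of prices form the standard state s*

    # Placeholder movement predictor: maps a 30-day price window to an up/down label.
    rng = np.random.default_rng(2)
    X_train = rng.lognormal(0.0, 0.01, size=(500, WINDOW))
    y_train = (X_train[:, -1] > X_train[:, 0]).astype(int)       # toy up/down labels
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    def augmented_state(price_window):
        """SARL-style state: concatenate s* (prices) with the predicted movement delta."""
        s_star = np.asarray(price_window, dtype=float)            # standard state s*
        delta = clf.predict_proba(s_star.reshape(1, -1))[0]       # [P(down), P(up)]
        return np.concatenate([s_star, delta])                    # augmented state fed to the RL agent

    print(augmented_state(rng.lognormal(0.0, 0.01, size=WINDOW)).shape)  # (32,) = 30 prices + 2 probabilities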
Conclusion
  • The authors propose SARL, a novel and generic state-augmented RL framework that can integrate heterogeneous data sources into standard RL training pipelines for learning PM strategies.
  • Tested on the Bitcoin and HighTech datasets, SARL achieves significantly better portfolio value and Sharpe ratio.
  • The authors conducted comparative experiments and extensive simulations to validate the superior performance of SARL in PM.
  • The authors believe this work will shed light on more effective and generic RL-based PM algorithms.
Summary
  • Objectives:

    Considering the policy μ_θ parameterized by θ, the goal is to maximize an objective function J(μ_θ), which can be written formally as below.
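The equation itself does not appear in this summary; as a hedged reconstruction (an assumption about notation, not necessarily the paper's exact expression), a standard policy objective of this kind is the expected cumulative discounted reward:

    J(\mu_\theta) = \mathbb{E}\!\left[\, \sum_{t=0}^{T-1} \gamma^{t}\, r\big(s_t, \mu_\theta(s_t)\big) \right],
    \qquad
    \theta^{*} = \arg\max_{\theta} J(\mu_\theta).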
Tables
  • Table1: The training and testing accuracy of the text classifier for different word embedding methods
  • Table2: Sharpe ratio over different time periods on the Bitcoin dataset (w: week, m: month)
  • Table3: Sharpe ratio over different time periods on the HighTech dataset (w: week, m: month)
  • Table4: Sharpe ratio over different time periods on the Bitcoin dataset (w: week, m: month)
  • Table5: Sharpe ratio over different time periods on the HighTech dataset (w: week, m: month)
Related work
  • With the availability of large-scale market data, it is natural to employ deep learning (DL) models that can exploit the underlying regularities of the market for PM. Prior art (Heaton, Polson, and Witte 2017; Schumaker et al. 2012; Nguyen, Shirai, and Velcin 2015) in training neural network (NN) models for market behavior prediction has shown effectiveness in asset price prediction and asset allocation. However, DL models that have no interaction with the market have a natural disadvantage in decision-making problems such as PM. Reinforcement learning algorithms have proven effective in decision-making problems in recent years, and deep reinforcement learning (DRL) (Chen et al. 2019), the integration of DL and RL, is widely used in the financial field. For instance, (Almahdi and Yang 2017) proposed a recurrent reinforcement learning (RRL) method, with a coherent risk-adjusted performance objective function named the Calmar ratio, to obtain both buy and sell signals and asset allocation weights. (Jiang, Xu, and Liang 2017) use the model-free Deep Deterministic Policy Gradient (DDPG) algorithm (Lillicrap et al. 2015) to dynamically optimize cryptocurrency portfolios. Similarly, (Liang et al. 2018) optimize asset portfolios using DDPG as well as Proximal Policy Optimization (PPO) (Schulman et al. 2017). (Buehler et al. 2019) present a DRL framework to hedge a portfolio of derivatives under transaction costs, where the framework does not depend on specific market dynamics. However, these works mainly tackle the PM problem by directly feeding the observed historical prices into RL training, which may largely overlook data noise and overestimate the model's learning capability.
Funding
  • This work was supported by the National Key Research and Development Program of China (2018AAA0101900), the Zhejiang Natural Science Foundation (LR19F020002, LZ17F020001), the National Natural Science Foundation of China (61976185, U19B200023, 61572431), the Fundamental Research Funds for the Central Universities and the Chinese Knowledge Center for Engineering Sciences and Technology, the IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR) – a research collaboration as part of the IBM AI Horizons Network, National Science Foundation award CCF-1910100, and DARPA award ASED00009970.
Reference
  • [Almahdi and Yang 2017] Almahdi, S., and Yang, S. Y. 2017. An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown. Expert Systems with Applications 87:267–279.
  • [Buehler et al. 2019] Buehler, H.; Gonon, L.; Teichmann, J.; and Wood, B. 2019. Deep hedging. Quantitative Finance 1–21.
  • [Chen et al. 2019] Chen, L.; Zhang, H.; Xiao, J.; He, X.; Pu, S.; and Chang, S.-F. 2019. Counterfactual critic multi-agent training for scene graph generation. In ICCV, 4613–4623.
  • [Cover 2011] Cover, T. M. 2011. Universal portfolios. In The Kelly Capital Growth Investment Criterion: Theory and Practice. World Scientific. 181–209.
  • [Ding et al. 2014] Ding, X.; Zhang, Y.; Liu, T.; and Duan, J. 2014. Using structured events to predict stock price movement: An empirical investigation. In EMNLP, 1415–1425.
  • [Gao and Zhang 2013] Gao, L., and Zhang, W. 2013. Weighted moving average passive aggressive algorithm for online portfolio selection. In 2013 5th International Conference on Intelligent Human-Machine Systems and Cybernetics, volume 1, 327–330. IEEE.
  • [Heaton, Polson, and Witte 2017] Heaton, J.; Polson, N.; and Witte, J. H. 2017. Deep learning for finance: deep portfolios. Applied Stochastic Models in Business and Industry 33(1):3–12.
  • [Hochreiter and Schmidhuber 1997] Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.
  • [Jiang, Xu, and Liang 2017] Jiang, Z.; Xu, D.; and Liang, J. 2017. A deep reinforcement learning framework for the financial portfolio management problem. arXiv preprint arXiv:1706.10059.
  • [Joulin et al. 2016] Joulin, A.; Grave, E.; Bojanowski, P.; and Mikolov, T. 2016. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.
  • [Kanwar and others 2019] Kanwar, N., et al. 2019. Deep Reinforcement Learning-based Portfolio Management. Ph.D. Dissertation.
  • [Li and Hoi 2012] Li, B., and Hoi, S. C. 2012. On-line portfolio selection with moving average reversion. arXiv preprint arXiv:1206.4626.
  • [Liang et al. 2018] Liang, Z.; Chen, H.; Zhu, J.; Jiang, K.; and Li, Y. 2018. Adversarial deep reinforcement learning in portfolio management. arXiv preprint arXiv:1808.09940.
  • [Lillicrap et al. 2015] Lillicrap, T. P.; Hunt, J. J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; and Wierstra, D. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
  • [Mikolov et al. 2013] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In NIPS, 3111–3119.
  • [Nguyen, Shirai, and Velcin 2015] Nguyen, T. H.; Shirai, K.; and Velcin, J. 2015. Sentiment analysis on social media for stock movement prediction. Expert Systems with Applications 42(24):9603–9611.
  • [Ormos and Urban 2013] Ormos, M., and Urban, A. 2013. Performance analysis of log-optimal portfolio strategies with transaction costs. Quantitative Finance 13(10):1587–1597.
  • [Papenbrock 2016] Papenbrock, J. 2016. Using AI to establish a reliable and objective way of diversification.
  • [Pennington, Socher, and Manning 2014] Pennington, J.; Socher, R.; and Manning, C. 2014. GloVe: Global vectors for word representation. In EMNLP, 1532–1543.
  • [Schulman et al. 2017] Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; and Klimov, O. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
  • [Schumaker et al. 2012] Schumaker, R. P.; Zhang, Y.; Huang, C.-N.; and Chen, H. 2012. Evaluating sentiment in financial news articles. Decision Support Systems 53(3):458–464.
  • [Shang et al. 2018] Shang, J.; Liu, J.; Jiang, M.; Ren, X.; Voss, C. R.; and Han, J. 2018. Automated phrase mining from massive text corpora. IEEE Transactions on Knowledge and Data Engineering 30(10):1825–1837.
  • [Sharpe 1964] Sharpe, W. F. 1964. Capital asset prices: A theory of market equilibrium under conditions of risk. The Journal of Finance 19(3):425–442.
  • [Silver et al. 2014] Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; and Riedmiller, M. 2014. Deterministic policy gradient algorithms. In ICML.
  • [Sutton et al. 2000] Sutton, R. S.; McAllester, D. A.; Singh, S. P.; and Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In NIPS, 1057–1063.
  • [Tsay 2010] Tsay, R. S. 2010. Analysis of Financial Time Series. John Wiley & Sons.
  • [Yang et al. 2016] Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; and Hovy, E. 2016. Hierarchical attention networks for document classification. In NAACL-HLT, 1480–1489.