ISSN 0253-2778

CN 34-1054/N

Open Access  JUSTC Article  13 December 2022

A new deep reinforcement learning model for dynamic portfolio optimization

Cite this:
https://doi.org/10.52396/JUSTC-2022-0072
More Information
  • Author Bio:

    Weiwei Zhuang received her Ph.D. degree in Probability and Statistics from the University of Science and Technology of China (USTC) in 2006. She is an Associate Professor with the Department of Statistics and Finance, USTC. Her research interests include statistical dependence, stochastic comparisons, semiparametric models, and their applications.

    Guoxin Qiu received his Ph.D. degree in Statistics from the University of Science and Technology of China (USTC) in 2017. He is currently a Professor with the Business School, Anhui Xinhua University. His research interests include information theory, stochastic comparisons, semiparametric models, and their applications.

  • Corresponding author: E-mail: qiugx02@ustc.edu.cn
  • Received Date: 28 April 2022
  • Accepted Date: 22 May 2022
  • Available Online: 13 December 2022
  • There are many challenging problems in dynamic portfolio optimization with deep reinforcement learning, such as the high dimensionality of the environment and action spaces and the extraction of useful information from a high-dimensional state space and from noisy financial time-series data. To address these problems, we propose a new model structure that combines the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) method with multi-head attention reinforcement learning. The new model integrates data-processing methods, a deep learning model, and a reinforcement learning model to improve the perception and decision-making abilities of investors. Empirical analysis shows that the proposed model structure offers advantages for dynamic portfolio optimization. Moreover, during the experimental comparison we find another robust investment strategy, in which each stock in the portfolio is allocated the same capital and the structure is applied to each stock separately.
    Graphical abstract: the overall framework of our research.
    • The CEEMDAN-Multi-Att-RL structure advocated in this paper improves trading performance on financial time-series data compared with deep reinforcement learning applied to the raw, unprocessed series (a minimal sketch of the decomposition step follows these highlights).
    • This paper explores two investment strategies for dynamic portfolio optimization with deep reinforcement learning; investors can choose between them according to their own risk preferences (an illustrative sketch of the second, equal-capital strategy is given below).
    • An appropriate deep learning network and reward settings are configured for the given stock market environment, which enhances the learning effectiveness of deep reinforcement learning (see the attention-based Q-network sketch below).
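    A minimal sketch of the decomposition step, assuming the third-party PyEMD package (installed as EMD-signal); the paper does not name its CEEMDAN implementation, and the toy price series, the number of trials, and the choice of noise IMFs below are illustrative assumptions only.

        # Decompose a noisy price series into intrinsic mode functions (IMFs) with
        # CEEMDAN, treat the highest-frequency IMFs as noise, and rebuild a smoother
        # series for the agent's state. Assumes the PyEMD package (pip install EMD-signal).
        import numpy as np
        from PyEMD import CEEMDAN

        rng = np.random.default_rng(0)
        t = np.linspace(0.0, 1.0, 500)
        price = 10 + 0.5 * np.sin(6 * np.pi * t) + 0.05 * rng.standard_normal(500)  # toy minute-level prices

        ceemdan = CEEMDAN(trials=50)     # number of noise realizations (assumed hyperparameter)
        imfs = ceemdan.ceemdan(price)    # rows are IMFs, ordered from highest to lowest frequency

        n_noise = 2                      # treat the first IMFs as noise (illustrative choice)
        denoised = price - imfs[:n_noise].sum(axis=0)
        print(imfs.shape, denoised.shape)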
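    The multi-head attention part of the structure can be pictured as a small Q-network that attends over a window of decomposed features before scoring actions. The sketch below uses PyTorch; the layer sizes, the single attention block, and the three-action space (buy / hold / sell) are assumptions, not the authors' exact architecture.

        import torch
        import torch.nn as nn

        class AttentionQNetwork(nn.Module):
            """Encode a window of IMF features with multi-head self-attention, then output Q-values."""
            def __init__(self, n_features: int, d_model: int = 64, n_heads: int = 4, n_actions: int = 3):
                super().__init__()
                self.embed = nn.Linear(n_features, d_model)
                self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
                self.norm = nn.LayerNorm(d_model)
                self.head = nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(), nn.Linear(64, n_actions))

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                # x: (batch, window_length, n_features), e.g. stacked IMF components per time step
                h = self.embed(x)
                a, _ = self.attn(h, h, h)     # multi-head self-attention over the time window
                h = self.norm(h + a)          # residual connection + layer normalization
                return self.head(h[:, -1])    # Q-values from the last time step's representation

        q_net = AttentionQNetwork(n_features=8)
        q_values = q_net(torch.randn(32, 30, 8))  # 32 windows of 30 steps with 8 features each
        print(q_values.shape)                     # torch.Size([32, 3])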
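    The second investment strategy reported as robust splits the capital equally across the stocks and runs the structure on each stock separately. A minimal sketch of how the per-stock accounts compound into one portfolio curve is given below; the per-stock return series are placeholders that would, in practice, come from the trained agents.

        import numpy as np

        def equal_capital_portfolio(per_stock_returns: np.ndarray, capital: float = 1.0) -> np.ndarray:
            """per_stock_returns: (n_stocks, n_periods) per-period simple returns from each agent."""
            n_stocks = per_stock_returns.shape[0]
            accounts = np.full(n_stocks, capital / n_stocks)   # equal initial allocation per stock
            curve = []
            for r in per_stock_returns.T:                      # iterate over trading periods
                accounts = accounts * (1.0 + r)                # each account compounds independently
                curve.append(accounts.sum())
            return np.array(curve)                             # portfolio compound-interest curve

        toy = np.random.default_rng(1).normal(0.0005, 0.01, size=(5, 250))  # 5 stocks, 250 periods (toy data)
        print(equal_capital_portfolio(toy)[-1])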


Catalog

    Figure  1.  Model structure.

    Figure  2.  Multi-head attention.

    Figure  3.  The structure diagram of the Q-network.

    Figure  4.  Principles of the two strategies.

    Figure  5.  Training set and test set.

    Figure  6.  The IMF components of stock 600009: the left column is obtained with EMD and the right column with CEEMDAN.

    Figure  7.  Stock minute-level data processed by the CEEMDAN and EMD methods.

    Figure  8.  Analysis of trading points.

    Figure  9.  Compound interest curve in 2018. The left subgraph represents the result of the first investment strategy, and the right subgraph represents the result of another investment strategy.

    Figure  10.  Compound interest curve in 2019. The left subgraph represents the result of the first investment strategy, and the right subgraph represents the result of another investment strategy.

    Figure  11.  Compound interest curve in 2020. The left subgraph represents the result of the first investment strategy, and the right subgraph represents the result of another investment strategy.


    Article Metrics

    Article views (1130)  PDF downloads (993)