ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Mathematics 22 November 2022

Variable selection in high-dimensional extremile regression via the quasi elastic net

Cite this: https://doi.org/10.52396/JUSTC-2022-0099
  • Author Bio:

Yimin Xiong is currently a master’s student under the supervision of Professor Weiping Zhang at the University of Science and Technology of China. His research focuses on variable selection.

    Weiping Zhang received his Ph.D. degree from the University of Science and Technology of China (USTC), where he is currently a professor. His research interests mainly focus on longitudinal data analysis and Bayesian analysis.

  • Corresponding author: E-mail: zwp@ustc.edu.cn
  • Received Date: 07 July 2022
  • Accepted Date: 08 October 2022
  • Available Online: 22 November 2022
  • Extremile regression, proposed in recent years, retains the advantage of quantile regression that the full information in the sample data can be revealed by varying the level τ, and it also has advantages over both quantile regression and expectile regression owing to its explicit expression and its conservativeness in estimation. Here, we propose a linear extremile regression model and introduce a variable selection method using a penalty called the quasi elastic net (QEN) to handle high-dimensional problems. Moreover, we propose an EM algorithm and establish the corresponding theoretical properties under mild conditions. In numerical studies, we compare the QEN penalty with the $L_{0}$, $L_{1}$, $L_{2}$ and elastic net penalties, and the results show that the proposed method is effective and offers certain advantages in analysis.
    Relationship between the MSE of estimators in QEN penalized extremile regression and sample size n with τ = 0.5 (left), and TP and FP in different penalized extremile regressions with high-dimensional and grouped data (right).
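    The “explicit expression” referred to above comes from the definition of extremiles in Daouia, Gijbels, and Stupfler (2019): the τth extremile can be written as $\xi_\tau = \int_0^1 F^{-1}(t)\,\mathrm{d}K_\tau(t)$ for an explicit distortion function $K_\tau$, so its sample version is simply a weighted average of the order statistics. Below is a minimal Python sketch of that sample version; the function names are ours, and it illustrates only the unconditional extremile, not the penalized regression estimator proposed in this paper.

```python
import numpy as np

def K_tau(t, tau):
    """Distortion function of Daouia, Gijbels and Stupfler (2019):
    K_tau(t) = 1 - (1 - t)**s(tau) for tau <= 1/2 and t**r(tau) for
    tau >= 1/2, with r(tau) = log(1/2)/log(tau) and
    s(tau) = log(1/2)/log(1 - tau)."""
    if tau <= 0.5:
        s = np.log(0.5) / np.log(1.0 - tau)
        return 1.0 - (1.0 - t) ** s
    r = np.log(0.5) / np.log(tau)
    return t ** r

def sample_extremile(y, tau):
    """Explicit sample extremile: a weighted average of the order
    statistics with weights K_tau(i/n) - K_tau((i-1)/n), which sum to 1."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    weights = np.diff(K_tau(np.arange(n + 1) / n, tau))
    return float(weights @ y)

# At tau = 0.5 the distortion is the identity, so the extremile
# coincides with the sample mean.
rng = np.random.default_rng(0)
y = rng.standard_normal(1000)
print(sample_extremile(y, 0.5), y.mean())
```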
    • We propose a quasi elastic net penalized linear extremile regression to handle high-dimensional data; it yields sparse solutions and remains suitable for strongly collinear settings.
    • We adopt an EM algorithm to solve the approximate $L_{0}$ problem efficiently, and build on it to solve the quasi elastic net penalized optimization problem (see the sketch following this list).
    • We demonstrate through numerical studies that the proposed quasi elastic net penalized linear extremile regression model is effective.
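    As a concrete illustration of the second highlight, the sketch below implements an EM-type iteration under an assumption that is ours and not taken from the paper: that the QEN objective couples a smooth approximation of the $L_{0}$ penalty, $\sum_j \beta_j^2/(\beta_j^2+\varepsilon)$, with a ridge term, so that each step reduces to a closed-form weighted ridge update in the spirit of Liu and Li (2016). The paper’s actual updates, weights, and tuning may differ; `qen_em` and its parameters are hypothetical, and the optional `weights` argument merely marks where τ-dependent extremile weights would enter (at τ = 0.5 they are constant).

```python
import numpy as np

def qen_em(X, y, lam0, lam2, weights=None, eps=1e-4, n_iter=200, tol=1e-8):
    """Hypothetical EM-style solver for a QEN-type objective:

        sum_i w_i (y_i - x_i' beta)^2
            + lam0 * sum_j beta_j^2 / (beta_j^2 + eps)   # approximate L0
            + lam2 * ||beta||_2^2                        # ridge part

    Each iteration freezes the L0-approximation weights at the current
    beta and solves the resulting penalized weighted least squares
    problem in closed form.
    """
    n, p = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    XtWX = (X * w[:, None]).T @ X
    XtWy = (X * w[:, None]).T @ y
    beta = np.linalg.solve(XtWX + lam2 * np.eye(p), XtWy)  # ridge start
    for _ in range(n_iter):
        d = lam0 / (beta ** 2 + eps) + lam2   # coordinate-wise penalties
        beta_new = np.linalg.solve(XtWX + np.diag(d), XtWy)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    beta[np.abs(beta) < np.sqrt(eps)] = 0.0   # zero out negligible entries
    return beta

# Toy check: sparse ground truth with two strongly collinear predictors.
rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(n)
beta_true = np.zeros(p)
beta_true[:3] = [2.0, 2.0, -1.5]
y = X @ beta_true + 0.5 * rng.standard_normal(n)
print(np.nonzero(qen_em(X, y, lam0=2.0, lam2=0.5))[0])
```

    The reweighting step is what makes an EM/MM view natural: with the current β fixed, the approximate $L_{0}$ term is majorized by a quadratic, so minimizing the surrogate is an ordinary ridge problem, and the ridge part keeps the system well conditioned under strong collinearity.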

  • [1]
    Koenker R, Bassett G. Regression quantiles. Econometrica, 1978, 46: 33–50. doi: 10.2307/1913643
    [2]
    Newey W K, Powell J L. Asymmetric least squares estimation and testing. Econometrica, 1987, 55: 819–847. doi: 10.2307/1911031
    [3]
    Daouia A, Gijbels I, Stupfler G. Extremiles: A new perspective on asymmetric least squares. Journal of the American Statistical Association, 2019, 114 (527): 1366–1381. doi: 10.1080/01621459.2018.1498348
    [4]
    Allen D M. The relationship between variable selection and data agumentation and a method for prediction. Technometrics, 1974, 16 (1): 125–127. doi: 10.1080/00401706.1974.10489157
    [5]
    Mallows C L. Some comments on C p. Technometrics, 2000, 42 (1): 87–94. doi: 10.1080/00401706.1973.10489103
    [6]
    Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 1974, 19 (6): 716–723. doi: 10.1109/TAC.1974.1100705
    [7]
    Schwarz G. Estimating the dimension of a model. The Annals of Statistics, 1978, 6 (2): 461–464. doi: 10.1214/aos/1176344136
    [8]
    Geisser S, Eddy W F. A predictive approach to model selection. Journal of the American Statistical Association, 1979, 74 (365): 153–160. doi: 10.1080/01621459.1979.10481632
    [9]
    Devroye L, Wagner T. Distribution-free performance bounds for potential function rules. IEEE Transactions on Information Theory, 1979, 25 (5): 601–604. doi: 10.1109/TIT.1979.1056087
    [10]
    Dietterich T G. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 1998, 10 (7): 1895–1923. doi: 10.1162/089976698300017197
    [11]
    Candes E, Tao T. The Dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 2007, 35 (6): 2313–2351. doi: 10.1214/009053606000001523
    [12]
    Dicker L, Lin X. Parallelism, uniqueness, and large-sample asymptotics for the Dantzig selector. Canadian Journal of Statistics, 2013, 41 (1): 23–35. doi: 10.1002/cjs.11151
    [13]
    James G M, Radchenko P, Lv J. DASSO: connections between the Dantzig selector and lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2009, 71 (1): 127–142. doi: 10.1111/j.1467-9868.2008.00668.x
    [14]
    Antoniadis A, Fryzlewicz P, Letué F. The Dantzig selector in Cox’s proportional hazards model. Scandinavian Journal of Statistics, 2010, 37 (4): 531–552. doi: 10.1111/j.1467-9469.2009.00685.x
    [15]
    Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2008, 70 (5): 849–911. doi: 10.1111/j.1467-9868.2008.00674.x
    [16]
    Fan J, Feng Y, Song R. Nonparametric independence screening in sparse ultra-high-dimensional additive models. Journal of the American Statistical Association, 2011, 106 (494): 544–557. doi: 10.1198/jasa.2011.tm09779
    [17]
    Liu Z, Lin S, Tan M. Sparse support vector machines with L p penalty for biomarker identification. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2008, 7 (1): 100–107. doi: 10.1109/TCBB.2008.17
    [18]
    Mazumder R, Friedman J H, Hastie T. SparseNet: Coordinate descent with nonconvex penalties. Journal of the American Statistical Association, 2011, 106 (495): 1125–1138. doi: 10.1198/jasa.2011.tm09738
    [19]
    Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 1996, 58 (1): 267–288. doi: 10.1111/j.2517-6161.1996.tb02080.x
    [20]
    Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 2001, 96 (456): 1348–1360. doi: 10.1198/016214501753382273
    [21]
    Zhang C H. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 2010, 38 (2): 894–942. doi: 10.1214/09-AOS729
    [22]
    Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2005, 67 (2): 301–320. doi: 10.1111/j.1467-9868.2005.00503.x
    [23]
    Zou H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 2006, 101 (476): 1418–1429. doi: 10.1198/016214506000000735
    [24]
    Liu Z, Li G. Efficient regularized regression with penalty for variable selection and network construction. Computational and Mathematical Methods in Medicine, 2016, 2016: 3456153. doi: 10.1155/2016/3456153
    [25]
    Tihonov A N. Solution of incorrectly formulated problems and the regularization method. Soviet Math., 1963, 4: 1035–1038.
    [26]
    Wang J, Xue L, Zhu L, et al. Estimation for a partial-linear single-index model. The Annals of Statistics, 2010, 38 (1): 246–274. doi: 10.1214/09-AOS712
    [27]
    West M, Blanchette C, Dressman H, et al. Predicting the clinical status of human breast cancer by using gene expression profiles. Proceedings of the National Academy of Sciences, 2001, 98 (20): 11462–11467. doi: 10.1073/pnas.201162998
    [28]
    Hastie T, Tibshirani R, Eisen M B, et al. ‘Gene shaving’ as a method for identifying distinct sets of genes with similar expression patterns. Genome Biology, 2000, 1: research0003.1. doi: 10.1186/gb-2000-1-2-research0003
    [29]
    Hastie T, Tibshirani R, Botstein D, et al. Supervised harvesting of expression trees. Genome Biology, 2001, 2: research0003.1. doi: 10.1186/gb-2001-2-1-research0003
    [30]
    Segal M S, Dahlquist K D, Conklin B R. Regression approaches for microarray data analysis. Journal of Computational Biology, 2003, 10 (6): 961–980. doi: 10.1089/106652703322756177
    [31]
    Redmond M, Baveja A. A data-driven software tool for enabling cooperative information sharing among police departments. European Journal of Operational Research, 2002, 141 (3): 660–678. doi: 10.1016/S0377-2217(01)00264-8
  • 加载中

