ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Mathematics 22 November 2022

Variable selection in high-dimensional extremile regression via the quasi elastic net

Cite this:
https://doi.org/10.52396/JUSTC-2022-0099
  • Author Bio:

    Yimin Xiong is currently a master’s student under the supervision of Professor Weiping Zhang at the University of Science and Technology of China (USTC). His research focuses on variable selection.

    Weiping Zhang received his Ph.D. degree from the University of Science and Technology of China (USTC), where he is currently a Professor. His research interests mainly focus on longitudinal data analysis and Bayesian analysis.

  • Corresponding author: E-mail: zwp@ustc.edu.cn
  • Received Date: 07 July 2022
  • Accepted Date: 08 October 2022
  • Available Online: 22 November 2022
  • Extremile regression, proposed in recent years, retains the advantage of quantile regression that the information in the sample data can be fully described by setting different levels. It also has advantages over quantile regression and expectile regression, owing to its explicit expression and its conservativeness in estimation. Here, we propose a linear extremile regression model and introduce a variable selection method that uses a penalty called the quasi elastic net (QEN) to handle high-dimensional problems. Moreover, we propose an EM algorithm and establish the corresponding theoretical properties under some mild conditions. In numerical studies, we compare the QEN penalty with the $L_{0}$, $L_{1}$, $L_{2}$ and elastic net penalties, and the results show that the proposed method is effective and has certain advantages over these alternatives.
    Relationship between the MSE of the estimators in QEN penalized extremile regression and the sample size n with τ = 0.5 (left), and TP and FP in different penalized extremile regressions with high-dimensional and grouped data (right).
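To make the "explicit expression" of extremiles concrete, the following minimal sketch computes a sample extremile as a weighted average of order statistics. It assumes the definition of Daouia, Gijbels and Stupfler (2019), in which the weights are increments of $K_\tau(t) = 1-(1-t)^{s(\tau)}$ for $\tau \le 1/2$ and $K_\tau(t) = t^{r(\tau)}$ for $\tau \ge 1/2$, with $r(\tau) = s(1-\tau) = \log(1/2)/\log\tau$. The function names are illustrative and are not taken from the paper, and the linear regression model proposed here is not reproduced.

```python
import math


def extremile_weights(n, tau):
    """Weights K_tau(i/n) - K_tau((i-1)/n) for i = 1..n (assumed definition)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")

    def K(t):
        if tau <= 0.5:
            # s(tau) = log(1/2) / log(1 - tau)
            s = math.log(0.5) / math.log(1.0 - tau)
            return 1.0 - (1.0 - t) ** s
        # r(tau) = log(1/2) / log(tau)
        r = math.log(0.5) / math.log(tau)
        return t ** r

    return [K(i / n) - K((i - 1) / n) for i in range(1, n + 1)]


def sample_extremile(y, tau):
    """Weighted average of the sorted sample using the weights above."""
    ys = sorted(y)
    w = extremile_weights(len(ys), tau)
    return sum(wi * yi for wi, yi in zip(w, ys))
```

At τ = 0.5 the weights are uniform and the sample extremile reduces to the sample mean; larger values of τ shift weight toward the upper order statistics, and smaller values toward the lower ones.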
    • We propose a quasi elastic net penalized linear extremile regression for high-dimensional data, which yields a sparse solution and remains suitable when the predictors are strongly collinear.
    • We adopt an EM algorithm to solve the $L_{0}$ approximation problem efficiently, and then use it to solve the quasi elastic net penalized optimization problem.
    • We demonstrate through numerical studies that the proposed quasi elastic net penalized linear extremile regression model is effective.
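The idea of approximating an $L_{0}$ penalty through repeated ridge-type updates can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: it uses an ordinary least squares loss rather than the extremile loss, it omits the remaining terms of the QEN penalty, and the update $\beta \leftarrow (D X^\top X + \lambda I)^{-1} D X^\top y$ with $D = \mathrm{diag}(\eta_j^2)$, where $\eta$ is the previous iterate, is a generic iteratively reweighted ridge scheme. All function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def l0_reweighted_ridge(X, y, lam, n_iter=200, tol=1e-10):
    """Iteratively reweighted ridge as a surrogate for an L0 penalty.

    Assumption: squared-error loss; this is not the paper's extremile loss.
    """
    n, p = len(X), len(X[0])
    # Gram matrix X^T X and cross-product X^T y, computed once.
    G = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         for a in range(p)]
    c = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = [1.0] * p  # eta = 1 makes the first update a plain ridge fit
    for _ in range(n_iter):
        d = [v * v for v in beta]  # D = diag(eta_j^2)
        A = [[d[a] * G[a][b] + (lam if a == b else 0.0) for b in range(p)]
             for a in range(p)]
        rhs = [d[a] * c[a] for a in range(p)]
        new = solve(A, rhs)
        converged = max(abs(u - v) for u, v in zip(new, beta)) < tol
        beta = new
        if converged:
            break
    return beta
```

With an identity design the coordinates decouple, which makes the behaviour easy to inspect: for `y = [3.0, 0.1, -2.0]` and `lam = 0.25`, the small middle coefficient is driven to zero while the two larger ones are retained with some shrinkage. Coefficients are shrunk toward zero smoothly, so in practice a small threshold is needed to read off the selected variables.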
