Sparse linear discriminant analysis via <i>ℓ</i><sub>0</sub> constraint

Qi Yin; Lei Shu

doi:10.52396/JUSTC-2022-0045

PDF( 807 KB)

Open Access JUSTC Mathematics

Sparse linear discriminant analysis via ℓ₀ constraint

Qi Yin,
Lei Shu^,

Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China

Cite this:

https://doi.org/10.52396/JUSTC-2022-0045

More Information

Author Bio:
Qi Yin is currently a graduate student at the University of Science and Technology of China. His research interests focus on variable selection

Lei Shu is currently a Ph.D. student at the University of Science and Technology of China. His research interests focus on high-dimensional statistical inference, including variable selection, change point detection, and factor analysis
Corresponding author: E-mail: sl2018@mail.ustc.edu.cn
Received Date: 05 March 2022
Accepted Date: 18 April 2022

Abstract Full text PDF

Abstract

Abstract

We consider the problem of interpretable classification in a high-dimensional setting, where the number of features is extremely large and the number of observations is limited. This setting has been extensively studied in the chemometric literature and has recently become pervasive in the biological and medical literature. Linear discriminant analysis (LDA) is a canonical approach for solving this problem. However, in the case of high dimensions, LDA is unsuitable for two reasons. First, the standard estimate of the within-class covariance matrix is singular; therefore, the usual discriminant rule cannot be applied. Second, when $p$ is large, it is difficult to interpret the classification rules obtained from LDA because $p$ features are involved. In this setting, motivated by the success of the primal-dual active set algorithm for best subset selection, we propose a method for sparse linear discriminant analysis via $\ell_0$ constraint, which imposes a sparsity criterion when performing linear discriminant analysis, allowing classification and feature selection to be performed simultaneously. Numerical results on synthetic and real data suggest that our method obtains competitive results compared with existing alternative methods.

Graphical abstract

An algorithm for adding the exchange step to improve naive sparse linear discriminant analysis.

Abstract

We consider the problem of interpretable classification in a high-dimensional setting, where the number of features is extremely large and the number of observations is limited. This setting has been extensively studied in the chemometric literature and has recently become pervasive in the biological and medical literature. Linear discriminant analysis (LDA) is a canonical approach for solving this problem. However, in the case of high dimensions, LDA is unsuitable for two reasons. First, the standard estimate of the within-class covariance matrix is singular; therefore, the usual discriminant rule cannot be applied. Second, when $p$ is large, it is difficult to interpret the classification rules obtained from LDA because $p$ features are involved. In this setting, motivated by the success of the primal-dual active set algorithm for best subset selection, we propose a method for sparse linear discriminant analysis via $\ell_0$ constraint, which imposes a sparsity criterion when performing linear discriminant analysis, allowing classification and feature selection to be performed simultaneously. Numerical results on synthetic and real data suggest that our method obtains competitive results compared with existing alternative methods.

Public Summary

This paper extends LDA to a high-dimensional setting by adding an ℓ₀ penalty to produce a discriminant vector involving only a subset of features.
We propose an SLDA-BASS algorithm to directly solve the estimation of sparse LDA with ℓ₀ constraint, avoiding the unnecessary information loss caused by relaxing the constraint in the conventional algorithm.
Compared to other estimation methods, the SLDA-BASS algorithm is derived from a natural criterion and is superior in terms of sparsity recovery as well as computational efficiency.

FullText(HTML)

References(15)

References

[1]	Hastie T, Tibshirani R, Friedman J H, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Berlin: Springer, 2009.
[2]	Shao J, Wang Y, Deng X, et al. Sparse linear discriminant analysis by thresholding for high dimensional data. The Annals of Statistics, 2011, 39 (2): 1241–1265. doi: 10.1214/10-AOS870
[3]	Bickel P, Levina E. Some theory of Fisher’s linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli, 2004, 10 (6): 989–1010. doi: 10.3150/bj/1106314847
[4]	Dudoit S, Fridlyand J, Speed T P. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 2002, 97 (457): 77–87. doi: 10.1198/016214502753479248
[5]	Friedman J H. Regularized discriminant analysis. Journal of the American Statistical Association, 1989, 84 (405): 165–175. doi: 10.1080/01621459.1989.10478752
[6]	Xu P, Brock G N, Parrish R S. Modified linear discriminant analysis approaches for classification of high-dimensional microarray data. Computational Statistics & Data Analysis, 2009, 53 (5): 1674–1687. doi: 10.1016/j.csda.2008.02.005
[7]	Tibshirani R, Hastie T, Narasimhan B, et al. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America, 2002, 99 (10): 6567–6572. doi: 10.1073/pnas.082099299
[8]	Guo Y, Hastie T, Tibshirani R. Regularized linear discriminant analysis and its application in microarrays. Biostatistics, 2007, 8 (1): 86–100. doi: 10.1093/biostatistics/kxj035
[9]	Witten D M, Tibshirani R. Penalized classification using Fisher’s linear discriminant. Journal of the Royal Statistical Society:Series B (Statistical Methodology), 2011, 73 (5): 753–772. doi: 10.1111/j.1467-9868.2011.00783.x
[10]	Zhu J, Wen C, Zhu J, et al. A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences of the United States of America, 2020, 117 (52): 33117–33123. doi: 10.1073/pnas.2014241117
[11]	Krzanowski W, Jonathan P, McCarthy W, et al. Discriminant analysis with singular covariance matrices: Methods and applications to spectroscopic data. Journal of the Royal Statistical Society:Series C (Applied Statistics), 1995, 44 (1): 101–115. doi: 10.2307/2986198
[12]	Tebbens J D, Schlesinger P. Improving implementation of linear discriminant analysis for the high dimension/small sample size problem. Computational Statistics & Data Analysis, 2007, 52 (1): 423–437. doi: 10.1016/j.csda.2007.02.001
[13]	Ramaswamy S, Tamayo P, Rifkin R, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences of the United States of America, 2001, 98 (26): 15149–15154. doi: 10.1073/pnas.211566398
[14]	Nakayama R, Nemoto T, Takahashi H, et al. Gene expression analysis of soft tissue sarcomas: Characterization and reclassification of malignant fibrous histiocytoma. Modern Pathology, 2007, 20 (7): 749–759. doi: 10.1038/modpathol.3800794
[15]	Sun L, Hui A M, Su Q, et al. Neuronal and glioma-derived stem cell factor induces angiogenesis within the brain. Cancer Cell, 2006, 9 (4): 287–300. doi: 10.1016/j.ccr.2006.03.003

Supplements(0)

Track Citations

Proportional views

Proportional views

Get Citation

PDF

XML

[1]	Hastie T, Tibshirani R, Friedman J H, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Berlin: Springer, 2009.
[2]	Shao J, Wang Y, Deng X, et al. Sparse linear discriminant analysis by thresholding for high dimensional data. The Annals of Statistics, 2011, 39 (2): 1241–1265. doi: 10.1214/10-AOS870
[3]	Bickel P, Levina E. Some theory of Fisher’s linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli, 2004, 10 (6): 989–1010. doi: 10.3150/bj/1106314847
[4]	Dudoit S, Fridlyand J, Speed T P. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 2002, 97 (457): 77–87. doi: 10.1198/016214502753479248
[5]	Friedman J H. Regularized discriminant analysis. Journal of the American Statistical Association, 1989, 84 (405): 165–175. doi: 10.1080/01621459.1989.10478752
[6]	Xu P, Brock G N, Parrish R S. Modified linear discriminant analysis approaches for classification of high-dimensional microarray data. Computational Statistics & Data Analysis, 2009, 53 (5): 1674–1687. doi: 10.1016/j.csda.2008.02.005
[7]	Tibshirani R, Hastie T, Narasimhan B, et al. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America, 2002, 99 (10): 6567–6572. doi: 10.1073/pnas.082099299
[8]	Guo Y, Hastie T, Tibshirani R. Regularized linear discriminant analysis and its application in microarrays. Biostatistics, 2007, 8 (1): 86–100. doi: 10.1093/biostatistics/kxj035
[9]	Witten D M, Tibshirani R. Penalized classification using Fisher’s linear discriminant. Journal of the Royal Statistical Society:Series B (Statistical Methodology), 2011, 73 (5): 753–772. doi: 10.1111/j.1467-9868.2011.00783.x
[10]	Zhu J, Wen C, Zhu J, et al. A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences of the United States of America, 2020, 117 (52): 33117–33123. doi: 10.1073/pnas.2014241117
[11]	Krzanowski W, Jonathan P, McCarthy W, et al. Discriminant analysis with singular covariance matrices: Methods and applications to spectroscopic data. Journal of the Royal Statistical Society:Series C (Applied Statistics), 1995, 44 (1): 101–115. doi: 10.2307/2986198
[12]	Tebbens J D, Schlesinger P. Improving implementation of linear discriminant analysis for the high dimension/small sample size problem. Computational Statistics & Data Analysis, 2007, 52 (1): 423–437. doi: 10.1016/j.csda.2007.02.001
[13]	Ramaswamy S, Tamayo P, Rifkin R, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences of the United States of America, 2001, 98 (26): 15149–15154. doi: 10.1073/pnas.211566398
[14]	Nakayama R, Nemoto T, Takahashi H, et al. Gene expression analysis of soft tissue sarcomas: Characterization and reclassification of malignant fibrous histiocytoma. Modern Pathology, 2007, 20 (7): 749–759. doi: 10.1038/modpathol.3800794
[15]	Sun L, Hui A M, Su Q, et al. Neuronal and glioma-derived stem cell factor induces angiogenesis within the brain. Cancer Cell, 2006, 9 (4): 287–300. doi: 10.1016/j.ccr.2006.03.003

TrendMD

Volume 52 Issue 8 page: 4

Cover

Keywords

Article Metrics

Article views (733) PDF downloads(1984)

Sparse linear discriminant analysis via ℓ₀ constraint

Abstract

Graphical abstract

Abstract

Public Summary

References

Proportional views

Catalog

Recommended articles

TrendMD

Article Metrics

Proportional views

Authors

Browse

Contact Us

About

Sparse linear discriminant analysis via ℓ0 constraint

Share

Tools

Abstract

Graphical abstract

Abstract

Public Summary

References

Proportional views

Catalog

Recommended articles

TrendMD

Article Metrics

Proportional views

Authors

Browse

Contact Us

About

Export File

Citation

Format

Content

Sparse linear discriminant analysis via ℓ₀ constraint