We consider the problem of interpretable classification in a high-dimensional setting, where the number of features is extremely large and the number of observations is limited. This setting has been extensively studied in the chemometric literature and has recently become pervasive in the biological and medical literature. Linear discriminant analysis (LDA) is a canonical approach for solving this problem. However, in the case of high dimensions, LDA is unsuitable for two reasons. First, the standard estimate of the within-class covariance matrix is singular; therefore, the usual discriminant rule cannot be applied. Second, when
p is large, it is difficult to interpret the classification rules obtained from LDA because
p features are involved. In this setting, motivated by the success of the primal-dual active set algorithm for best subset selection, we propose a method for sparse linear discriminant analysis via
\ell_0 constraint, which imposes a sparsity criterion when performing linear discriminant analysis, allowing classification and feature selection to be performed simultaneously. Numerical results on synthetic and real data suggest that our method obtains competitive results compared with existing alternative methods.