
Sparse linear discriminant analysis via $\ell_0$ constraint

  • Abstract: We consider the problem of interpretable classification in a high-dimensional setting, where the number of features p is extremely large and the number of observations is limited. This setting has been extensively studied in the chemometric literature and has become pervasive in biology, medicine, engineering, and the social sciences. Linear discriminant analysis (LDA) is a canonical approach to this problem. In high dimensions, however, LDA is unsuitable for two reasons. First, the standard estimate of the within-class covariance matrix is singular, so the usual discriminant rule cannot be applied. Second, when p is large, the classification rule obtained from LDA is difficult to interpret because it involves all p features. Motivated by the success of the primal-dual active set algorithm for best subset selection, we propose a method for sparse linear discriminant analysis via an $\ell_0$ constraint, which imposes a sparsity criterion when performing LDA so that classification and feature selection are carried out simultaneously. Numerical results on synthetic and real data suggest that our method is competitive with existing alternatives.
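The singularity claim in the abstract follows from a rank count, sketched here under stated assumptions. The symbols n (number of observations), K (number of classes), C_k (index set of class k), and \hat{\mu}_k (sample mean of class k) are introduced for illustration only; the abstract itself names only p.

```latex
\hat{\Sigma}_w
  = \frac{1}{n-K}\sum_{k=1}^{K}\sum_{i\in C_k}
    (x_i-\hat{\mu}_k)(x_i-\hat{\mu}_k)^{\top},
\qquad
\operatorname{rank}(\hat{\Sigma}_w) \le \min(p,\; n-K).
% Within each class the centered vectors x_i - \hat{\mu}_k sum to zero,
% so class k contributes rank at most |C_k| - 1, giving n - K in total.
% Hence \hat{\Sigma}_w is a p x p matrix of rank below p, and therefore
% singular, whenever p > n - K.

\|\beta\|_0 = \#\{\, j : \beta_j \neq 0 \,\} \le s
% An \ell_0 constraint on a discriminant vector \beta limits the rule
% to at most s features; s is a hypothetical sparsity level.
```

How the proposed method combines this cardinality constraint with the discriminant criterion is not stated in the abstract, so the second line should be read only as the generic definition of an $\ell_0$ constraint.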

     
