Research on Boosting theory and its applications

Zhang Wensheng (张文生); Yu Tingzhao (于廷照)

doi:10.3969/j.issn.0253-2778.2016.03.007

Abstract: Boosting is one of the most popular ensemble learning algorithms and is widely used across machine learning and pattern recognition. Theoretical research on Boosting follows two main perspectives: learnability theory and statistics. Boosting was first proposed within the theory of weak learnability, which shows how a group of weak learners can be boosted into a strong learner: provided each weak learner is slightly more accurate than random guessing, their combination can reach any desired accuracy on the training set after a finite number of iterations. From the statistical point of view, Boosting is an additive model, and the equivalence of the two views has been proved. This paper first reviews the weak learnability theory of Boosting from the PAC perspective and presents the problems and challenges Boosting faces, including its effectiveness on high-dimensional data and the Margin theory. It then describes the branches of Boosting theory and reviews in detail the most popular classical Boosting algorithms, together with new applications within the Boosting framework. Finally, future research trends for Boosting are discussed.
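The weak-to-strong claim can be made concrete with a small sketch. The code below is an illustration only, not the paper's own material: it assumes AdaBoost with one-dimensional decision stumps as a representative Boosting algorithm (the abstract does not name a specific one), and the toy data `XS`, `YS` and all function names are hypothetical.

```python
import math

# Hypothetical toy data: ten 1-D points with labels in {+1, -1}.
XS = list(range(10))
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(x, threshold, polarity):
    """Weak learner: `polarity` left of the threshold, `-polarity` otherwise."""
    return polarity if x < threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    candidates = [x - 0.5 for x in xs] + [max(xs) + 0.5]
    for threshold in candidates:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Fit an additive model: a list of (alpha, threshold, polarity)."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified points, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted sum of weak learner votes."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


def training_error(ensemble, xs, ys):
    """Fraction of training points the ensemble misclassifies."""
    return sum(predict(ensemble, x) != y for x, y in zip(xs, ys)) / len(xs)


for rounds in (1, 2, 3):
    print(rounds, training_error(adaboost(XS, YS, rounds), XS, YS))
```

On this toy set no single stump classifies every point correctly, yet the weighted vote of three stumps does, which is the behaviour the weak learnability result describes. The returned ensemble is a weighted sum of weak learners, which is also how the additive-model view reads the same procedure.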
