ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC

A new random projection-based ensemble classifier for high-dimensional data

Cite this:
https://doi.org/10.3969/j.issn.0253-2778.2019.12.004
  • Received Date: 14 April 2019
  • Revision Received Date: 23 May 2019
  • Publish Date: 31 December 2019
  • A decision tree ensemble method based on random projection, the projection forest (PJForest), was proposed to solve the classification problem for high-dimensional data. The method used the decision tree as the base classifier and reduced the dimensionality of the data through a series of random projections. A series of decision trees was then built on the dimensionally reduced data and combined into an ensemble classifier through ensemble learning. An appropriate random projection reduces the dimensionality of the data while preserving the information contained in its geometric structure. Moreover, perturbing the raw data through random projection enriches the diversity of the decision trees. With proper ensemble learning, PJForest can effectively overcome the influence of noise and improve its generalization ability. The limiting property of the PJForest generalization error was proved, and the convergence rate of the generalization error under certain conditions was obtained. Simulation studies and empirical analyses of real-life data were conducted. The results showed that PJForest can effectively classify high-dimensional data containing a large amount of noise and has better properties than current classification methods such as random forest and XGBoost.
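The pipeline described in the abstract (random projection, one decision tree per projection, then an ensemble vote) can be illustrated with a minimal sketch. The abstract does not specify the projection distribution, the tree-growing rule, or the aggregation rule, so everything below is an assumption for illustration: Gaussian projection matrices scaled by 1/sqrt(k), small CART-style trees with Gini splits, and an unweighted majority vote. All function names (`random_projection`, `build_tree`, `fit_ensemble`, `predict`) and parameter values are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

def random_projection(d, k, rng):
    """Return a d x k Gaussian matrix scaled by 1/sqrt(k) (an assumed choice)."""
    scale = k ** -0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]

def project(X, R):
    """Project each row of X (length d) onto the k columns of R."""
    k = len(R[0])
    return [[sum(xi * row[j] for xi, row in zip(x, R)) for j in range(k)] for x in X]

def gini(labels):
    """Gini impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=4):
    """Grow a small CART-style tree with Gini splits; leaves hold the majority label."""
    leaf = {"label": Counter(y).most_common(1)[0][0]}
    if depth == max_depth or len(set(y)) == 1:
        return leaf
    best = None
    for j in range(len(X[0])):
        # Skip the largest value so both sides of the split are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # all rows identical, so no split is possible
        return leaf
    _, j, t = best
    lo = [(row, lab) for row, lab in zip(X, y) if row[j] <= t]
    hi = [(row, lab) for row, lab in zip(X, y) if row[j] > t]
    return {"feat": j, "thr": t,
            "left": build_tree([r for r, _ in lo], [l for _, l in lo], depth + 1, max_depth),
            "right": build_tree([r for r, _ in hi], [l for _, l in hi], depth + 1, max_depth)}

def predict_tree(node, x):
    """Follow the splits from the root until a leaf is reached."""
    while "label" not in node:
        node = node["left"] if x[node["feat"]] <= node["thr"] else node["right"]
    return node["label"]

def fit_ensemble(X, y, n_trees=9, k=4, seed=0):
    """Train one tree per independent random projection of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_trees):
        R = random_projection(len(X[0]), k, rng)
        models.append((R, build_tree(project(X, R), y)))
    return models

def predict(models, x):
    """Project x with each tree's own matrix and take an unweighted majority vote."""
    votes = Counter(predict_tree(tree, project([x], R)[0]) for R, tree in models)
    return votes.most_common(1)[0][0]

# Toy data (synthetic, for illustration only): 20-dimensional points,
# class 0 centred at -1 and class 1 at +1 in every coordinate.
data_rng = random.Random(1)
y = [i % 2 for i in range(60)]
X = [[(2 * lab - 1) + data_rng.gauss(0.0, 0.3) for _ in range(20)] for lab in y]
models = fit_ensemble(X, y)
train_acc = sum(predict(models, x) == lab for x, lab in zip(X, y)) / len(y)
```

Each tree sees a different 4-dimensional view of the 20-dimensional data, which is the source of diversity the abstract refers to; each projection matrix is stored with its tree so that new points are projected the same way at prediction time.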
