ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Original Paper

Research on outlier detection algorithm of XmR control chart

Cite this:
https://doi.org/10.3969/j.issn.0253-2778.2020.08.010
  • Received Date: 29 April 2020
  • Accepted Date: 06 August 2020
  • Rev Recd Date: 06 August 2020
  • Publish Date: 31 August 2020
  • A novel outlier detection algorithm based on the XmR control chart is proposed to address the complicated and time-consuming computation of isolation forest anomaly detection. For a sample attribute, the mean of the individual values, the moving ranges, and the average moving range are calculated. From these, the control limits and centerlines of the X chart and the mR chart are drawn, and the individual attribute values of the samples are plotted on the charts. The sample numbers of the points that exceed the limits in the X chart are recorded, and 1 is added to the sample number of each point that exceeds the limit in the mR chart. The union of the two sets of sample numbers is taken as the anomalies and deleted from the data, and the deleted values are then replaced using CART. Random forest and support vector machine algorithms are used for experimental validation. The results show that this method is faster and more precise than the isolation forest method, which provides a new research idea for outlier detection.
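The detection step described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `xmr_outliers` is hypothetical, the constants 2.66 and 3.267 are the standard XmR chart factors (the abstract does not state which limits the paper uses), and mR points are assumed to be numbered from 1, so that adding 1 maps each moving range to the later sample of its pair. The CART replacement step is not shown.

```python
from statistics import fmean


def xmr_outliers(values):
    """Return the sorted 1-based sample numbers flagged by the X or mR chart."""
    x = [float(v) for v in values]
    if len(x) < 2:
        raise ValueError("need at least two samples to form a moving range")

    # Moving range j (1-based) is |x[j+1] - x[j]|.
    mr = [abs(b - a) for a, b in zip(x, x[1:])]

    x_bar = fmean(x)    # centerline of the X chart
    mr_bar = fmean(mr)  # centerline of the mR chart

    # Assumed standard XmR factors: 2.66 = 3 / d2 and 3.267 = D4 for n = 2.
    x_ucl = x_bar + 2.66 * mr_bar
    x_lcl = x_bar - 2.66 * mr_bar
    mr_ucl = 3.267 * mr_bar  # the mR chart has no lower limit above zero

    # Sample numbers of points outside the X chart limits.
    x_hits = {i + 1 for i, v in enumerate(x) if v > x_ucl or v < x_lcl}
    # mR point number is i + 1; adding 1 gives the sample number i + 2.
    mr_hits = {i + 2 for i, r in enumerate(mr) if r > mr_ucl}

    # The union of both sets is treated as the anomalies.
    return sorted(x_hits | mr_hits)


data = [10, 11, 10, 12, 11, 10, 30, 11, 10, 12]
print(xmr_outliers(data))
```

In this toy series the spike at sample 7 is flagged by both charts. Sample 8 is flagged as well, because the drop back from 30 to 11 also exceeds the mR limit and the add-1 rule attributes that jump to the later sample.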
