ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Research Articles: Mathematics

Inference of online updating approach to nonparametric smoothing of big data

Cite this:
https://doi.org/10.52396/JUST-2021-0078
  • Received Date: 17 March 2021
  • Revision Received Date: 20 May 2021
  • Publish Date: 31 May 2021
  • The online updating method (ONLINE) is an efficient approach to big data analysis. We prove the asymptotic properties of the ONLINE models for kernel density estimation and kernel regression and conduct statistical inference for them. We propose several algorithms for bandwidth selection in kernel density estimation and in kernel regression, respectively. We verify the asymptotic normality of the ONLINE density model by simulation and apply the ONLINE linear kernel regression to Volatility Index (VIX) prediction. The empirical results show that, on continuously arriving option data streams, the ONLINE linear kernel regression model achieves predictive performance comparable to that of the classical local linear regression model at significantly lower computational complexity.
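    To illustrate the idea of online updating for kernel density estimation, here is a minimal sketch based on the classical recursive kernel density estimator, in which each new observation updates a running average and is then discarded. The abstract does not give the paper's exact ONLINE estimator or its bandwidth selection algorithms, so the Gaussian kernel, the fixed evaluation grid, the bandwidth rule h_n = c * n^(-1/5), and the class name `OnlineKDE` are all assumptions made for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


class OnlineKDE:
    """Hypothetical recursive kernel density estimator on a fixed grid.

    After n observations the stored value at grid point g equals
        (1/n) * sum_i K((g - X_i) / h_i) / h_i,
    where h_i is the bandwidth in force when X_i arrived. Only the grid
    values and the counter n are kept, never the raw observations.
    """

    def __init__(self, grid, c=1.0):
        self.grid = list(grid)
        self.c = c                      # bandwidth constant (assumed, not from the paper)
        self.n = 0                      # number of observations seen so far
        self.density = [0.0] * len(self.grid)

    def bandwidth(self, n):
        # Assumed rule of thumb: h_n = c * n^(-1/5).
        return self.c * n ** (-0.2)

    def update(self, x):
        """Fold one new observation into the running estimate."""
        self.n += 1
        n = self.n
        h = self.bandwidth(n)
        for j, g in enumerate(self.grid):
            contribution = gaussian_kernel((g - x) / h) / h
            # Running-mean update: f_n = f_{n-1} + (contribution - f_{n-1}) / n
            self.density[j] += (contribution - self.density[j]) / n
```

    Each update costs time proportional to the grid size and uses memory independent of the number of observations, which is the kind of saving over a batch estimator that the abstract refers to. A batch estimator with a single common bandwidth would instead have to revisit all stored data whenever the bandwidth changes.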

