ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Mathematics 15 January 2024

Gaussian graphical model estimation with measurement error

Cite this:
https://doi.org/10.52396/JUSTC-2022-0108
  • Author Bio:

    Xianglu Wang is currently a master's student at the School of Management, University of Science and Technology of China (USTC). He received his B.S. degree from USTC in 2019. His research mainly focuses on high-dimensional variable selection and inference.

  • Corresponding author: E-mail: wz124517@mail.ustc.edu.cn
  • Received Date: 28 July 2022
  • Accepted Date: 23 October 2022
  • Available Online: 15 January 2024
  • It is well known that regression methods designed for clean data can produce erroneous results when applied directly to corrupted data. Despite recent methodological and algorithmic advances in Gaussian graphical model estimation, it remains unclear how to achieve efficient and scalable estimation when the covariates are contaminated. We develop a new methodology, convex conditioned innovative scalable efficient estimation (COCOISEE), for Gaussian graphical models under both additive and multiplicative measurement errors. It combines the strengths of innovative scalable efficient estimation for Gaussian graphical models with nearest positive semidefinite matrix projection, and thus enjoys stepwise convexity and scalability. We provide comprehensive theoretical guarantees and demonstrate the effectiveness of the proposed methodology through numerical studies.
    The process of recovering the precision matrix in the presence of additive and multiplicative measurement errors.
    • We propose a new methodology, COCOISEE, that achieves scalable and interpretable estimation for Gaussian graphical models under both additive and multiplicative measurement errors.
    • The method is stepwise convex, computationally stable, efficient and scalable.
    • Both theoretical and simulation results verify the feasibility of our method.
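
    To make the positive semidefinite projection step concrete, the following is a minimal sketch and not the authors' implementation. It assumes mean-zero data, a known error covariance, and a Frobenius-norm projection by eigenvalue clipping, which may differ from the norm used in the paper. The function names (additive_surrogate, multiplicative_surrogate, project_psd) and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np

def additive_surrogate(Z, Sigma_A):
    """Covariance surrogate for Z = X + A, assuming cov(A) = Sigma_A is known."""
    n = Z.shape[0]
    return Z.T @ Z / n - Sigma_A

def multiplicative_surrogate(Z, M2):
    """Covariance surrogate for Z = X * M (elementwise), assuming M2 = E[m m^T] is known."""
    n = Z.shape[0]
    return (Z.T @ Z / n) / M2

def project_psd(S, eps=0.0):
    """Frobenius-norm projection onto the PSD cone: clip eigenvalues below eps."""
    S = (S + S.T) / 2                      # enforce symmetry before decomposing
    w, V = np.linalg.eigh(S)
    return (V * np.maximum(w, eps)) @ V.T  # V diag(max(w, eps)) V^T

# Simulated example (assumption): n < p, so the corrected surrogate is indefinite.
rng = np.random.default_rng(0)
n, p, tau = 20, 50, 0.5
X = rng.standard_normal((n, p))            # clean covariates (unobserved in practice)
Z = X + tau * rng.standard_normal((n, p))  # observed covariates with additive error

S_hat = additive_surrogate(Z, tau**2 * np.eye(p))
S_psd = project_psd(S_hat)

min_eig_before = np.linalg.eigvalsh(S_hat).min()  # negative: S_hat is not PSD
min_eig_after = np.linalg.eigvalsh(S_psd).min()   # nonnegative up to rounding
```

    Subtracting the error covariance removes the bias of the naive sample covariance, but it can leave the surrogate indefinite when the dimension exceeds the sample size, which would make a downstream penalized regression nonconvex. Replacing the surrogate with a nearby positive semidefinite matrix restores convexity at each step.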

