ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Original Paper

The research of speaker diarization based on BIC and G_PLDA

Cite this:
https://doi.org/10.3969/j.issn.0253-2778.2015.04.005
  • Received Date: 04 November 2014
  • Accepted Date: 13 December 2014
  • Rev Recd Date: 13 December 2014
  • Publish Date: 30 April 2015
  • Traditional speaker diarization (SD), which uses the Bayesian information criterion (BIC) as the similarity metric, performs well on short dialogues, but as the dialogue length increases, the single-Gaussian model underlying BIC becomes insufficient to describe the distributions of different speakers. Moreover, it is difficult to set the threshold that separates same-speaker from different-speaker clusters when using hierarchical agglomerative clustering (HAC). To solve this problem, a method fusing BIC and G_PLDA is proposed, which makes full use of the reliability of BIC in short-term clustering and the strong discriminating power of G_PLDA on long utterances. A set of experiments based on NIST 08 Summed shows that the fusion method reduces the diarization error rate (DER) from 2.34% for the BIC baseline system to 1.54%, a relative improvement of 34.2%.
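    To make the single-Gaussian BIC similarity mentioned in the abstract concrete, the following is a minimal sketch of the standard ΔBIC test between two feature segments, in the form commonly attributed to Chen and Gopalakrishnan. It is not the paper's implementation: the function name `delta_bic`, the `penalty_weight` parameter, and the synthetic 4-dimensional features are assumptions made for illustration only.

    ```python
    import numpy as np


    def delta_bic(x, y, penalty_weight=1.0):
        """Delta-BIC between two segments (rows = frames, columns = features).

        Each segment, and their union, is modeled by one full-covariance
        Gaussian. A positive value favors two separate models (different
        speakers); a negative value favors merging (same speaker).
        """
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        z = np.vstack([x, y])
        n_x, n_y, n_z = len(x), len(y), len(z)
        dim = z.shape[1]

        def logdet_cov(frames):
            # Maximum-likelihood covariance (bias=True divides by N).
            cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
            _, logdet = np.linalg.slogdet(cov)
            return logdet

        # Log-likelihood gain from modeling the two segments separately.
        gain = 0.5 * (n_z * logdet_cov(z)
                      - n_x * logdet_cov(x)
                      - n_y * logdet_cov(y))
        # Penalty for the extra Gaussian: a mean plus a symmetric covariance.
        n_params = dim + 0.5 * dim * (dim + 1)
        penalty = 0.5 * penalty_weight * n_params * np.log(n_z)
        return gain - penalty


    # Illustrative use on synthetic data (assumed, not from the paper).
    rng = np.random.default_rng(0)
    same_a = rng.normal(size=(500, 4))
    same_b = rng.normal(size=(500, 4))
    other = rng.normal(loc=3.0, size=(500, 4))

    score_same = delta_bic(same_a, same_b)   # segments from one distribution
    score_diff = delta_bic(same_a, other)    # segments with shifted means
    print(score_same < 0, score_diff > 0)
    ```

    In a bottom-up clustering loop, the pair of clusters with the lowest ΔBIC would typically be merged first, and merging stops once no pair scores below the chosen threshold. The reported 34.2% figure follows from (2.34 - 1.54) / 2.34 ≈ 0.342.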
