The research of speaker diarization based on BIC and G_PLDA

LI Rui; ZHUO Zhu; LI Hui

doi:10.3969/j.issn.0253-2778.2015.04.005


Open Access JUSTC Original Paper


Department of Electronic Science and Technology, University of Science and Technology of China, Hefei 230027, China


https://doi.org/10.3969/j.issn.0253-2778.2015.04.005

Received Date: 04 November 2014
Accepted Date: 13 December 2014
Rev Recd Date: 13 December 2014
Publish Date: 30 April 2015


Abstract


Traditional speaker diarization (SD) systems that use the Bayesian information criterion (BIC) as the similarity metric perform well on short dialogues, but as the dialogue grows longer, the single Gaussian model underlying BIC becomes insufficient to describe the distribution of each speaker's data. Moreover, when hierarchical agglomerative clustering (HAC) is used, it is difficult to set the threshold that separates same-speaker pairs from different-speaker pairs. To address these problems, a method fusing BIC with G_PLDA is proposed, which exploits the reliability of BIC for clustering short segments and the strong discriminative power of G_PLDA on long utterances. Experiments on NIST 08 Summed show that the fusion method reduces the diarization error rate (DER) from 2.34% for the BIC baseline system to 1.54%, a relative improvement of 34.2%.
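To make the single-Gaussian BIC metric and the reported relative improvement concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly used ΔBIC form with one full-covariance Gaussian per segment; the function names `delta_bic` and `relative_reduction` and the penalty weight `lam` are illustrative assumptions, and the G_PLDA scoring and fusion steps are not shown because the abstract does not specify them.

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """Delta-BIC between two segments of feature frames.

    x, y: arrays of shape (frames, dims). Each segment, and their union,
    is modeled by one full-covariance Gaussian (an assumption of this sketch).
    A positive value favors two separate models (different speakers);
    a negative value favors merging (same speaker).
    """
    z = np.vstack([x, y])
    n1, n2, n = len(x), len(y), len(z)
    d = z.shape[1]

    def logdet_cov(a):
        # Maximum-likelihood covariance estimate (bias=True divides by N).
        cov = np.atleast_2d(np.cov(a, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Model-complexity penalty: d mean parameters plus d(d+1)/2 covariance
    # parameters, weighted by lam (hypothetical tuning parameter).
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    gain = 0.5 * (n * logdet_cov(z) - n1 * logdet_cov(x) - n2 * logdet_cov(y))
    return gain - penalty


def relative_reduction(baseline, new):
    """Relative error reduction, e.g. for DER values given in percent."""
    return (baseline - new) / baseline


# Synthetic data stands in for real acoustic features.
rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(500, 4))
same_b = rng.normal(0.0, 1.0, size=(500, 4))
other = rng.normal(5.0, 1.0, size=(500, 4))

score_same = delta_bic(same_a, same_b)   # expected to be negative
score_diff = delta_bic(same_a, other)    # expected to be positive
der_gain = relative_reduction(2.34, 1.54)
```

With the DER figures from the abstract, `relative_reduction(2.34, 1.54)` evaluates to about 0.342, which matches the stated 34.2%. In an HAC loop, a score such as `delta_bic` would be computed for each pair of clusters and the pair with the lowest score merged until no score falls below the chosen threshold.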




Volume 45 Issue 4 page: 286-293
