GTM-UVigo System for Albayzin 2016 Speaker Diarisation Evaluation

Title: GTM-UVigo System for Albayzin 2016 Speaker Diarisation Evaluation
Publication Type: Conference Proceedings
Year of Publication: 2016
Authors: López Otero, P.; Docío Fernández, L.; García-Mateo, C.
Conference Name: Iberspeech 2016
Abstract: This paper describes the system developed by the GTM-UVigo team for the closed-set condition of the Albayzin 2016 Speaker Diarisation evaluation. First, voice activity detection is performed using log-mel-filterbank features for audio representation and a deep neural network based classifier. The speech segments are subsequently segmented using an approach based on the Bayesian information criterion strategy for speaker segmentation. Since the voice activity detection stage occasionally labels music as speech, music segments are discarded using a logistic regression based classifier that relies on the i-vector paradigm for audio representation. The speaker clustering stage follows an online strategy where speech segments are also represented using i-vectors but, in this case, probabilistic linear discriminant analysis is applied, since a dramatic improvement of the clustering results is achieved using this technique.
Project: Multimedia and Multilingual Human-Centered Content Discovery
Citation Key: 605
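
The abstract mentions a segmentation step based on the Bayesian information criterion (BIC). As a rough illustration only, the following sketch computes the textbook ΔBIC score for one hypothesised speaker change point, modelling each side of the split with a single full-covariance Gaussian. It is not the authors' implementation: the function name `delta_bic`, the `penalty_weight` parameter, the covariance regularisation constant, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def delta_bic(window, split, penalty_weight=1.0):
    """Textbook delta-BIC for a hypothesised change at frame index `split`.

    `window` has shape (n_frames, n_dims). A positive value favours the
    hypothesis that the two sides come from different Gaussians.
    Hypothetical helper for illustration, not taken from the paper.
    """
    n, d = window.shape
    left, right = window[:split], window[split:]

    def logdet_cov(frames):
        # Small diagonal term (assumed value) keeps the covariance invertible.
        cov = np.cov(frames, rowvar=False) + 1e-6 * np.eye(d)
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Complexity penalty: extra parameters of the two-Gaussian model
    # (one mean vector plus one symmetric covariance matrix).
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n)

    return (0.5 * n * logdet_cov(window)
            - 0.5 * len(left) * logdet_cov(left)
            - 0.5 * len(right) * logdet_cov(right)
            - penalty)


# Synthetic stand-ins for acoustic feature frames (not real speech data).
rng = np.random.default_rng(0)
speaker_a = rng.normal(0.0, 1.0, size=(200, 4))
speaker_b = rng.normal(3.0, 1.0, size=(200, 4))
two_speakers = np.vstack([speaker_a, speaker_b])
one_speaker = rng.normal(0.0, 1.0, size=(400, 4))

print("change detected (two sources):", delta_bic(two_speakers, 200) > 0)
print("change detected (one source):", delta_bic(one_speaker, 200) > 0)
```

In a full segmenter, a score like this would typically be evaluated at many candidate split points inside a sliding window, with a change declared at the maximum when it exceeds zero. The window sizes, features, and penalty setting used in the actual system are not given in this record.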