Multimedia and Multilingual Human-Centered Content Discovery

In TraceThem, we will carry out research on algorithmic techniques for multimedia and multilingual search, working in real environments where current techniques fall short in performance, generalisation and scalability. Beyond traditional audiovisual content (TV shows, news, movies, series...), new scenarios and content types have emerged in recent years (MOOCs, video blogs, tutorials...) in which automating the search process for accessing content is a key aspect. Processing these multimedia documents involves the added difficulty that content often appears in different languages, which raises the technological challenge: tools adapted to each language are needed, and this is not always possible due to the lack of resources or tools that would enable completely language-independent content indexing.

The information we intend to extract always lies within a communicative context ("from" someone and "for" someone), so the characterisation of the people involved in this context will play a central role. We will focus on finding information about people and the way they interact ("who they are", "what they say", "how they communicate", "how they are doing"), with special interest in discovering people and content. Information related to people will be extracted through audio processing, video processing, and combined audio-video processing. To this end, we will pursue technologies and new solutions for multimedia content analysis, voice and face biometrics, audio segmentation and speaker diarization, detection of emotional state, and detection of people interacting. Content extraction will be performed primarily by processing audio, using both language-dependent and language-independent search on speech.
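As an illustration of what language-independent search on speech can look like, the sketch below implements query-by-example matching with subsequence dynamic time warping (DTW), a standard technique in this area: a short spoken query, represented as a sequence of feature frames, is slid over an utterance to find its best-matching region. This is a minimal, self-contained example, not code from the project; the function names and the plain Euclidean frame distance are our own choices for the sketch.

```python
import math

def frame_dist(a, b):
    """Euclidean distance between two feature frames (e.g. posteriorgram vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def qbe_dtw(query, utterance):
    """Subsequence-DTW score for query-by-example search on speech.

    `query` and `utterance` are lists of feature frames. Returns the
    length-normalised cost of the best alignment of the whole query
    against any contiguous region of the utterance; lower means a
    better match.
    """
    n, m = len(query), len(utterance)
    INF = float("inf")
    # cost[i][j]: best cost of aligning the first i query frames
    # with a subsequence of the utterance ending at frame j
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        cost[0][j] = 0.0  # the query may start at any utterance frame
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = frame_dist(query[i - 1], utterance[j - 1]) + min(
                cost[i - 1][j],       # utterance frame matched to several query frames
                cost[i][j - 1],       # query frame stretched over several utterance frames
                cost[i - 1][j - 1])   # both sequences advance
    return min(cost[n][1:]) / n  # best ending point, normalised by query length

# A query embedded exactly in the utterance scores 0.0
print(qbe_dtw([[0.0], [1.0], [2.0]],
              [[5.0], [0.0], [1.0], [2.0], [5.0]]))
```

In a real system the frames would be phoneme posteriorgrams or bottleneck features rather than raw values, which is precisely where language-independent representations matter: the matching procedure itself needs no knowledge of the language being spoken.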
The scientific-technical impact and dissemination of the project results will be fostered by participation in international competitive evaluations related to the topics described. These evaluations are important for the project's activity because they provide data sets for the tasks that constitute the current technological challenges. Moreover, these competitions establish common experimental frameworks that enhance collaboration with other research groups and allow comparison of different algorithms, helping to uncover the strengths and weaknesses of the algorithms and systems developed.


These are two video demos of our research on "Semantic Indexing and Searching in Multimedia Contents" that are related to the TraceThem activities:


Le N, Bredin H, Sargent G, India M, López Otero P, Barras C, et al. Towards large scale multimedia indexing: A case study on person discovery in broadcast news. In International Workshop on Content-Based Multimedia Indexing. 2017.
López Otero P, Docío Fernández L, García Mateo C. Compensating Gender Variability in Query-by-Example Search on Speech Using Voice Conversion. Interspeech. 2017. pp. 2909-2913.
Magariños C, López Otero P, Docío Fernández L, Rodríguez Banga E, Erro D, García Mateo C. Reversible speaker de-identification using pre-trained transformation functions. Computer Speech & Language. 2017;46:36-52.
López Otero P, Magariños C, Docío Fernández L, Rodríguez Banga E, Erro D, García Mateo C. Influence of speaker de-identification in depression detection. IET Signal Processing. 2017.
López Otero P, Docío Fernández L, García Mateo C. Better Phoneme Recognisers Lead to Better Phoneme Posteriorgrams for Search on Speech? An Experimental Analysis. In Lecture Notes in Artificial Intelligence. Springer; 2016. pp. 128-137.
Magariños C, Erro D, Rodríguez Banga E. Language-independent acoustic cloning of HTS voices: a preliminary study. ICASSP. Shanghai, China; 2016. pp. 5615-5619.
López Otero P, Docío Fernández L, García Mateo C. GTM-UVigo System for Albayzin 2016 Speaker Diarisation Evaluation. Iberspeech 2016. pp. 1-8.
López Otero P, Docío Fernández L, García Mateo C. GTM-UVigo System for Multimodal Person Discovery in Broadcast TV Task at MediaEval 2016. MediaEval 2016.
Funded by: 
Ministerio de Economía y Competitividad
University of Vigo