Programme
The workshop will be a one-day, online event held on 29th June. Timings and session details are provided below. All times are UK local time (i.e., UTC+1).
| Time | Session |
| --- | --- |
| 9:00 | Welcome [slides] |
| 9:10 | The Clarity Prediction Overview / Results [slides] |
| 9:40 | Challenge Papers: Session I |
| 10:40 | Break |
| 10:50 | Challenge Papers: Session II |
| 11:50 | Break |
| 12:00 | Challenge Papers: Session III |
| 13:00 | Break |
| 13:10 | Invited Talk - Theo Goverts, Vrije Universiteit Amsterdam |
| 14:00 | Break |
| 14:10 | Prizes and conclusions [slides] |
| 14:20 | CPC discussion + Future Directions |
| 15:00 | Close |
Invited Talk
Theo Goverts, audiologist (MPE), Amsterdam UMC
Speech recognition in realistic scenarios: insights from binaural recordings in natural acoustic environments.
Synopsis
In a study together with Steve Colburn (Boston University), we were interested in the acoustic characterization of realistic scenarios for speech recognition (Goverts & Colburn, 2020). The essential acoustic information in such scenarios is a bilateral vibration pattern stimulating the eardrums, or the microphone membranes of hearing devices, at both sides. This bilateral vibration pattern is the input to an interplay of bottom-up and top-down processing, leading to the actual perception of speech. We therefore made bilateral recordings in a variety of environments that were considered relevant by experts and by listeners with impaired hearing, e.g. at home, on a city walk, and on public transport. Recordings were made using simple in-the-concha microphones and a data recorder. We first looked at speech-likeness in the recordings using a non-intrusive, modulation-spectrum-based measure, analysing absolute values, interaural differences, and temporal dynamics in eight environments. We then looked at the binaural parameters interaural level difference, interaural time difference, and interaural coherence, again analysing absolute values and temporal dynamics in the same eight environments. Results show large variance in speech-likeness both within and between environments, and some environments show large interaural differences in speech-likeness. They also show that useful acoustic information is relatively sparse in realistic environments, putting more strain on processing effort, especially for listeners with impaired hearing. Recently we replicated some of the analyses using the ARTE recordings (Weisser & Buchholz, 2019), yielding comparable results. The implications of these studies for clinical audiology will be discussed, with a focus on insights relevant to the aim of improving hearing-aid processing to optimise the intelligibility of speech in noise for hearing-impaired listeners, as in the Clarity project.

- Goverts, S. T., & Colburn, H. S. (2020). Binaural recordings in natural acoustic environments: Estimates of speech-likeness and interaural parameters. Trends in Hearing, 24. doi.org/10.1177/2331216520972858
- Weisser, A., & Buchholz, J. M. (2019). Conversational speech levels and signal-to-noise ratios in realistic acoustic conditions. Journal of the Acoustical Society of America, 145(1), 349-360. doi.org/10.1121/1.5087567
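For context on the interaural parameters mentioned in the abstract, the sketch below shows one simple way to estimate frame-wise ILD, ITD and interaural coherence from a stereo recording. It is a minimal illustration under assumed settings (20 ms frames, a ±1 ms ITD search window, a hypothetical file name) and is not the analysis pipeline of Goverts & Colburn (2020), which works from calibrated in-the-concha recordings and additionally computes a modulation-spectrum-based speech-likeness measure.

```python
# Illustrative sketch only: frame-wise estimates of interaural level difference
# (ILD), interaural time difference (ITD) and interaural coherence (IC) from a
# binaural (stereo) recording. Frame length, lag window and file name are
# assumptions made for this example, not the parameters used in the study.

import numpy as np
import soundfile as sf


def interaural_parameters(path, frame_ms=20.0, max_lag_ms=1.0):
    x, fs = sf.read(path)                     # expects a two-channel file
    assert x.ndim == 2 and x.shape[1] == 2, "binaural (stereo) recording required"
    left, right = x[:, 0], x[:, 1]

    n = int(fs * frame_ms / 1000)             # samples per analysis frame
    max_lag = int(fs * max_lag_ms / 1000)     # ITD search range in samples
    lags = np.arange(-(n - 1), n)             # lags of the full cross-correlation
    eps = 1e-12

    ild, itd, ic = [], [], []
    for i in range(len(left) // n):
        l = left[i * n:(i + 1) * n]
        r = right[i * n:(i + 1) * n]

        # ILD: ratio of short-term powers, in dB
        ild.append(10 * np.log10((np.sum(l ** 2) + eps) / (np.sum(r ** 2) + eps)))

        # Normalised cross-correlation; IC is its peak within +/- max_lag_ms,
        # and the ITD estimate is the lag of that peak
        xcorr = np.correlate(l, r, mode="full")
        ncc = np.abs(xcorr) / (np.sqrt(np.sum(l ** 2) * np.sum(r ** 2)) + eps)
        k = np.argmax(ncc * (np.abs(lags) <= max_lag))
        ic.append(ncc[k])
        itd.append(1000.0 * lags[k] / fs)

    return np.array(ild), np.array(itd), np.array(ic)


ild, itd, ic = interaural_parameters("binaural_recording.wav")
print(f"median ILD {np.median(ild):.1f} dB, "
      f"median |ITD| {np.median(np.abs(itd)):.2f} ms, median IC {np.median(ic):.2f}")
```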
Bio
Theo Goverts is an audiologist (medical physics expert), researcher, and residency director at Amsterdam University Medical Center. His research focuses on speech recognition in realistic scenarios, and on child language and hearing.
Challenge Papers: Session I
- MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids [Report] [slides] (1National Taiwan University; 2Academia Sinica; 3Southern University of Science and Technology, China)
- Conformer-based Fusion of Text, Audio, and Listener Characteristics for Predicting Speech Intelligibility of Hearing Aid Users [Report] (1NTT Corporation, Japan; 2Wakayama University, Japan)
- OBISHI: Objective Binaural Intelligibility Score for the Hearing Impaired [Report] [slides] (Japan Advanced Institute of Science and Technology)
Challenge Papers: Session II
- Speech Intelligibility Prediction for Hearing-Impaired Listeners with the bBSIM-STI Model [Report] (1Carl von Ossietzky University, Oldenburg, Germany; 2Fraunhofer IDMT, Oldenburg, Germany)
- Non-intrusive Speech Intelligibility Prediction from Binaural Signals Processed for Hearing Aid Users [Report] (1Durham University, UK; 2OFFIS, Oldenburg, Germany)
- Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners [Report] [slides] (University of Sheffield)
- Unsupervised Uncertainty Measures of Automatic Speech Recognition for Non-intrusive Speech Intelligibility Prediction [Report] [slides] (University of Sheffield)
Challenge Papers: Session III
- Speech Intelligibility Prediction for Hearing-Impaired Listeners with Phoneme Classifiers based on Deep Learning [Report] [slides] (1Carl von Ossietzky University, Oldenburg, Germany; 2Fraunhofer IDMT, Oldenburg, Germany)
- Predicting Speech Intelligibility using SAMII: Spike Activity Mutual Information Index [Report] [slides] (Medizinische Hochschule Hannover, Germany)
- ELO-SPHERES Intelligibility Prediction Model for the Clarity Prediction Challenge 2022 [Report] (1University College London, UK; 2Imperial College London, UK)