Baseline speech intelligibility model in round one

Some comments on signal alignment and level-insensitivity

Our baseline binaural speech intelligibility measure in round one is the Modified Binaural Short-Time Objective Intelligibility measure, or MBSTOI. This short post explains why you need to correct for any delays that your hearing aid processing algorithm introduces into the audio signals so that MBSTOI can estimate speech intelligibility accurately. It also discusses why the audibility of signals should be considered before evaluation with MBSTOI.


In stage one, entries will be ranked according to the average MBSTOI score across all samples in the evaluation test set. In the second stage, entries will be evaluated by the listening panel. There will be prizes for both stages. See this post for more information.

Signal alignment in time and frequency

If your hearing aid processing introduces a significant delay into the signal, you should correct for this delay before submitting your entry. This is necessary because MBSTOI requires the clean speech “reference” to be aligned with the processed signal in time and frequency. The correction needs to be applied to both ear signals.

MBSTOI downsamples signals to 10 kHz, uses a Discrete Fourier Transform to decompose the signal into one-third octave bands, and performs envelope extraction and short-time segmentation into 384 ms regions. Each region consists of 30 frames. These approaches are motivated by what is known about which frequencies and modulation frequencies are most important for intelligibility. For each frequency band and frame (over the region of which it is the last frame), an intermediate correlation coefficient is calculated between the clean reference and processed power envelopes for each ear. These coefficients are averaged to obtain the MBSTOI index. This index is usually between 0 and 1 and rises monotonically with measured intelligibility scores, such that higher values indicate greater speech intelligibility. Alignment is therefore required at the level of the one-third octave bands and short-time regions.
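
To make the role of alignment concrete, here is a minimal sketch of the "correlate over short-time regions, then average" step only. It assumes the band envelopes have already been extracted, and it leaves out the equalisation-cancellation and better-ear stages and other details of the real measure. The function names and the random toy envelopes are illustrative assumptions, not part of the baseline code.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def short_time_index(clean_env, proc_env, region_len=30):
    """Average the correlation over every band and every region of `region_len` frames.

    clean_env and proc_env are lists of bands; each band is a list of
    per-frame envelope values. Each region ends at frame `last - 1`.
    """
    scores = []
    for clean_band, proc_band in zip(clean_env, proc_env):
        for last in range(region_len, len(clean_band) + 1):
            scores.append(pearson(clean_band[last - region_len:last],
                                  proc_band[last - region_len:last]))
    return sum(scores) / len(scores)

# Toy envelopes: 3 bands, 60 frames each (random values, purely illustrative).
rng = random.Random(0)
clean = [[rng.random() for _ in range(60)] for _ in range(3)]

# Perfectly aligned "processed" envelopes versus ones shifted by 10 frames.
aligned_score = short_time_index(clean, clean)
shifted_score = short_time_index(clean, [band[10:] + band[:10] for band in clean])
```

With these toy envelopes the aligned score is 1, while the shifted score is much lower even though the "processed" content is identical apart from the delay. This is the effect that uncorrected hearing aid delay would have on the index.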

Our baseline corrects for the broadband delay per ear due to the hearing loss model. (The delay is measured by running a Kronecker delta function through the model for each ear.) However, the baseline software will not correct for delays created by your hearing aid processing.

Consequently, when submitting your hearing aid output signals, you are responsible for correcting for any delays introduced by your hearing aid. Note that this must be done blindly; the clean reference signals will not be supplied for the test/evaluation set.
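
One possible way to do this without the clean reference is to measure the latency of your own processing with a test impulse, in the same spirit as the Kronecker delta measurement described above, and then advance the output by that amount. The sketch below assumes the processing behaves approximately like a fixed broadband delay; `toy_hearing_aid` is a hypothetical placeholder for your algorithm, and all function names are illustrative.

```python
def unit_impulse(length):
    """Kronecker delta: 1.0 at sample 0, zeros elsewhere."""
    return [1.0] + [0.0] * (length - 1)

def estimate_delay(process, length=256):
    """Estimate broadband delay as the index of the largest-magnitude
    sample in the response of `process` to a unit impulse."""
    response = process(unit_impulse(length))
    return max(range(len(response)), key=lambda i: abs(response[i]))

def advance(signal, delay):
    """Remove `delay` samples from the start and zero-pad the end,
    so the output keeps the original length."""
    return signal[delay:] + [0.0] * delay

def toy_hearing_aid(signal, delay=12, gain=2.0):
    """Hypothetical stand-in for a hearing aid: a pure delay plus gain."""
    return [0.0] * delay + [gain * s for s in signal[:len(signal) - delay]]

# Measure the latency once, then compensate each ear's output.
latency = estimate_delay(toy_hearing_aid)
left_in = [float(n) for n in range(100)]
right_in = [float(-n) for n in range(100)]
left_out = advance(toy_hearing_aid(left_in), latency)
right_out = advance(toy_hearing_aid(right_in), latency)
```

If the two ears are processed differently, the delay would need to be estimated and compensated separately for each ear. For strongly nonlinear or signal-dependent processing, a single impulse measurement may not be representative, so treat this only as a starting point.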

Level insensitivity

MBSTOI is broadly insensitive to the level of the processed signal because it is calculated using a cross-correlation method. This could be a problem because sounds that are below the auditory thresholds of the hearing-impaired listener may appear to MBSTOI to be highly intelligible.
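
As a brief illustration, assume a plain Pearson correlation between a reference envelope x and a processed envelope y (a simplification of what MBSTOI actually computes). Scaling y by any gain a > 0 leaves the coefficient unchanged:

```latex
\rho(x, a y)
  = \frac{\sum_n (x_n - \bar{x})(a y_n - a \bar{y})}
         {\sqrt{\sum_n (x_n - \bar{x})^2}\,\sqrt{\sum_n (a y_n - a \bar{y})^2}}
  = \frac{a}{|a|}\,\rho(x, y)
  = \rho(x, y) \quad \text{for } a > 0 .
```

The gain cancels between numerator and denominator, so a signal attenuated far below threshold correlates with the reference exactly as well as an audible one.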

To overcome this, the baseline experimental code mbstoi_beta, in conjunction with the baseline hearing loss model, can be used to approximate hearing-impaired auditory thresholds. Specifically, mbstoi_beta adds internal noise that can be used to approximate normal hearing auditory thresholds. This noise, in combination with the attenuation of signals by the hearing loss model to simulate raised auditory thresholds, makes MBSTOI level-sensitive.

The noise is created by filtering white noise using pure tone threshold filter coefficients with one-third octave weighting, approximating the shape of a typical auditory filter (from Moore 2012, based on Patterson’s method, 1976). This noise is added to the processed signal. Note that standard MBSTOI also adds internal noise to parameters in its equalisation-cancellation stage, but that is an independent process.
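
The effect of a fixed internal noise floor can be sketched as follows. This is a waveform-level toy, not the mbstoi_beta implementation: the filter coefficients, noise level and test tone are all hypothetical stand-ins, and the real measure correlates band envelopes rather than raw samples.

```python
import math
import random

# Hypothetical FIR coefficients standing in for the threshold-shaped filter;
# the real mbstoi_beta coefficients are not reproduced here.
HYPOTHETICAL_COEFFS = [0.5, 0.3, 0.2]

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def shaped_noise(length, coeffs, level=0.1, seed=0):
    """White Gaussian noise passed through a causal FIR filter and scaled by `level`."""
    rng = random.Random(seed)
    white = [rng.gauss(0.0, 1.0) for _ in range(length)]
    return [
        level * sum(c * white[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(length)
    ]

# Toy "clean reference": a sinusoid (illustrative only).
reference = [math.sin(2 * math.pi * n / 50) for n in range(2000)]
noise = shaped_noise(len(reference), HYPOTHETICAL_COEFFS)

def score(gain, add_noise):
    """Correlate the reference with a scaled copy, optionally with internal noise added."""
    processed = [gain * r for r in reference]
    if add_noise:
        processed = [p + v for p, v in zip(processed, noise)]
    return pearson(reference, processed)

loud_no_noise = score(1.0, add_noise=False)
quiet_no_noise = score(0.01, add_noise=False)
loud_with_noise = score(1.0, add_noise=True)
quiet_with_noise = score(0.01, add_noise=True)
```

Without the noise, the loud and quiet versions score the same. With the noise added, the heavily attenuated version is masked by the noise floor and its score drops, which is how a fixed internal noise can make a correlation-based measure sensitive to level.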


The method was developed by Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan and Jesper Jensen (Andersen et al., 2018). It builds on the Short-Time Objective Intelligibility (STOI) metric created by Cees H. Taal, Richard C. Hendriks, Richard Heusdens, and Jesper Jensen (Taal et al., 2011). MBSTOI includes a better ear stage and an equalisation-cancellation stage. For simplicity, the latter stage is not discussed here; see Andersen et al. (2018) for details.


Andersen, A. H., de Haan, J. M., Tan, Z. H., & Jensen, J. (2018). Refinement and validation of the binaural short time objective intelligibility measure for spatially diverse conditions. Speech Communication, 102, 1-13.

Moore, B. C. (2012). An introduction to the psychology of hearing. Brill.

Patterson, R. D. (1976). Auditory filter shapes derived with noise stimuli. The Journal of the Acoustical Society of America, 59(3), 640-654.

Taal, C. H., Hendriks, R. C., Heusdens, R., & Jensen, J. (2011). An algorithm for intelligibility prediction of time–frequency weighted noisy speech. IEEE Transactions on Audio, Speech, and Language Processing, 19(7), 2125-2136.

Rhoddy Viveros Muñoz Q&A

‘This job is an opportunity to learn more about the needs of people with a hearing impairment, which I hope will be my area of research in the future.’

Rhoddy Viveros Muñoz

What do you do on the project?

My work has focused on room acoustic simulation, especially on the generation of different virtual room acoustic environments. Using the software RAVEN [1], thousands of different rooms with different characteristics could be simulated. This allowed us to generate binaural room impulse responses (BRIRs) for a target and interferers under different room acoustic conditions.

How did you end up working in this area?

My background is as an electrical engineer, but my search for more meaningful work to help people led me to pursue a PhD in Medical Acoustics. My doctoral thesis was on speech-in-noise perception in virtual acoustic environments with moving sound sources. So, this job is an opportunity to learn more about the needs of people with a hearing impairment, which I hope will be my area of research in the future.

What is your role on the Clarity project?

In my role, I provide support in creating the sound stimuli. The generation of BRIRs for the target and the interferers is fundamental to the creation of all of our speech and noise stimuli.

What is exciting about the Clarity project?

I strongly believe that the final aim of the challenge is to encourage participants to improve the quality of life of hearing-impaired people. What could be more exciting?

What would success look like for the project?

The success of the project would be to find novel processes and models that really improve the quality of life of people with hearing problems. Hearing impairment is known to lead people to isolate themselves from friends and family, and can even cause deep depression. Therefore, whatever improvement in their hearing the challenge may bring, in the end, it can only do good.

What hurdles are you going to face getting the project to be a success?

In addition to the technical challenges related to generating thousands of different room simulations, a prolonged lockdown due to the coronavirus has been a big hurdle for all of us.

If you could go back (or forward) in time and discover new science or invent anything, what would it be?

It would definitely be the cure for cancer. Because thousands of people die every year from cancer, I think that would be the greatest achievement of all.


[1] Schröder, D. and Vorländer, M., 2011, January. RAVEN: A real-time framework for the auralization of interactive virtual environments. In Proceedings of Forum Acusticum 2011 (pp. 1541-1546). Denmark: Aalborg.

Clarity Challenge Pre-announcement

Although age-related hearing loss affects 40% of 55- to 74-year-olds, the majority of adults who would benefit from hearing aids don’t use them. A key reason is simply that hearing aids don’t provide enough benefit.

Picking out speech from background noise is a critical problem even for the most sophisticated devices. The purpose of the Clarity Challenges is to catalyse new work to radically improve the speech intelligibility provided by hearing aids.

The series of challenges will consider increasingly complex listening scenarios. The first round, launching in January 2021, will focus on speech in indoor environments in the presence of a single interferer. It will begin with a challenge involving improving hearing aid processing. Future challenges on how to model speech-in-noise perception will be launched at a later date.

The Task

You will be provided with simulated scenes, each including a target speaker and interfering noise. For each scene, there will be signals that simulate those captured by a behind-the-ear hearing aid with three channels at each ear and those captured at the eardrum without a hearing aid present. The target speech will be a short sentence and the interfering noise will be either speech or domestic appliance noise.

The task will be to deliver a hearing aid signal processing algorithm that can improve the intelligibility of the target speaker for a specified hearing-impaired listener. Initially, entries will be evaluated using an objective speech intelligibility measure we will provide. Subsequently, up to twenty of the most promising systems will be evaluated by a panel of listeners.

We will provide a baseline system so that teams can choose to focus on individual components or to develop their own complete pipelines.

What will be provided

  • Evaluation of the best entries by a panel of hearing-impaired listeners.
  • Speech + interferer scenes for training and evaluation.
  • An entirely new database of 10,000 spoken sentences.
  • Listener characterisations including audiograms and speech-in-noise testing.
  • Software including tools for generating training data, a baseline hearing aid algorithm, a baseline model of hearing impairment, and a binaural objective intelligibility measure.

Important Dates

  • January 2021 – Challenge launch and release of software and data
  • April 2021 – Evaluation data released
  • May 2021 – Submission deadline
  • June-August 2021 – Listening test evaluation period
  • September 2021 – Results announced at a Clarity Challenge Workshop in conjunction with Interspeech 2021

Challenge and workshop participants will be invited to contribute to a journal Special Issue on the topic of Machine Learning for Hearing Aid Processing that will be announced next year.

Further information

If you are interested in participating and wish to receive further information, please sign up.

If you have questions, contact us directly at


Prof. Jon P. Barker, Department of Computer Science, University of Sheffield
Prof. Michael A. Akeroyd, Hearing Sciences, School of Medicine, University of Nottingham
Prof. Trevor J. Cox, Acoustics Research Centre, University of Salford
Prof. John F. Culling, School of Psychology, Cardiff University
Prof. Graham Naylor, Hearing Sciences, School of Medicine, University of Nottingham
Dr Simone Graetzer, Acoustics Research Centre, University of Salford
Dr Rhoddy Viveros Muñoz, School of Psychology, Cardiff University
Eszter Porter, Hearing Sciences, School of Medicine, University of Nottingham

Funded by the Engineering and Physical Sciences Research Council (EPSRC), UK.

Supported by RNID (formerly Action on Hearing Loss), Hearing Industry Research Consortium, Amazon TTS Research, Honda Research Institute Europe.


The image copyright is owned by the University of Nottingham.