## CPC1 results and prizes

The 1st Clarity Prediction Challenge is now complete. Thank you to all who took part!

The full results can be found on the Clarity-2022 workshop website, where you will also find links to system papers and the overview presentation.

Many of the systems have led to successful Interspeech 2022 papers and will be contributing to the Interspeech 2022 special session on Speech Intelligibility Prediction for Hearing-Impaired Listeners. We hope to see many of you in Korea!

In the meantime, please be sure to check out the ongoing 2nd Clarity Enhancement Challenge. The deadline for submitting enhanced signals is 1st September 2022, so there is still time to participate. To register a team, please use the form here.

## CEC2 registration open

We are pleased to announce that registration for the 2nd Clarity Enhancement Challenge (CEC2) is now open.

To register please complete the simple Google form found on the registration page.

The remaining important dates for the challenge are as follows:

• 25th July 2022: Evaluation data released.
• 1st Sept 2022: 1st round submission deadline for evaluation by objective measure.
• 15th Sept 2022: 2nd round submission deadline for listening tests.
• Sept-Nov 2022: Listening test evaluation period.
• 2nd Dec 2022: Results announced at a Clarity Challenge Workshop; prizes awarded.

The challenge training data, development (dev) data and initial tools are now available from the GitHub repository.

## Release of CEC2 baseline

We are pleased to announce the release of the 2nd Clarity Enhancement Challenge (CEC2) baseline system code.

The baseline code has been released in the latest commit to the Clarity GitHub repository.

The baseline system performs NAL-R amplification according to the audiogram of the target listener, followed by a simple gain control, and outputs the signals in 16-bit stereo WAV format. The system has been kept deliberately simple, with no microphone array processing and no attempt at noise cancellation.
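For orientation, here is a rough sketch of that signal path in Python. The gain rule is the textbook NAL-R prescription (Byrne and Dillon, 1986); the FIR filter design, the example audiogram and the peak-normalisation step are our illustrative choices, not the official baseline, which lives in the Clarity GitHub repository.

```python
# A minimal sketch of NAL-R style amplification; NOT the repository
# implementation - see the Clarity GitHub repo for the official baseline.
import numpy as np
from scipy.signal import firwin2
from scipy.io import wavfile

AUDIOGRAM_FREQS = np.array([250, 500, 1000, 2000, 3000, 4000, 6000])
NALR_K = np.array([-17.0, -8.0, 1.0, -1.0, -2.0, -2.0, -2.0])  # dB constants

def nalr_gains_db(hl_db):
    """Textbook NAL-R insertion gains (dB) for hearing levels at AUDIOGRAM_FREQS."""
    x = 0.05 * (hl_db[1] + hl_db[2] + hl_db[3])  # 0.05 * (HL500 + HL1k + HL2k)
    return np.maximum(x + 0.31 * hl_db + NALR_K, 0.0)

def nalr_fir(hl_db, fs, ntaps=255):
    """Linear-phase FIR whose magnitude response approximates the NAL-R gains."""
    gains = 10 ** (nalr_gains_db(hl_db) / 20.0)
    freqs = np.concatenate(([0.0], AUDIOGRAM_FREQS, [fs / 2]))
    amps = np.concatenate(([gains[0]], gains, [gains[-1]]))
    return firwin2(ntaps, freqs, amps, fs=fs)

# Example: an illustrative sloping loss, applied to a stereo stand-in signal.
fs = 44100
hl = np.array([20.0, 25.0, 35.0, 45.0, 50.0, 55.0, 60.0])  # dB HL (made up)
signal = np.random.randn(fs, 2) * 0.01                     # stand-in scene audio
taps = nalr_fir(hl, fs)
amplified = np.stack(
    [np.convolve(signal[:, ch], taps, mode="same") for ch in range(2)], axis=1)

# Simple gain control (peak normalisation here; the real baseline's scheme
# may differ), then write 16-bit stereo WAV.
amplified *= 0.95 / max(np.abs(amplified).max(), 1e-9)
wavfile.write("enhanced.wav", fs, (amplified * 32767).astype(np.int16))
```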

HASPI scores for the dev set have been measured. The scores are as follows.

| System | HASPI |
| --- | --- |
| Unprocessed | 0.1615 |
| NAL-R baseline | 0.2493 |

See here for further details.

If you have any problems using the baseline code please do not hesitate to contact us at claritychallengecontact@gmail.com, or post questions on the Google group.

## Launch of CEC2

We are pleased to announce the launch of the 2nd Clarity Enhancement Challenge (CEC2).

The website has been fully updated to provide you with all the information you will need to participate in the challenge.

The schedule for the challenge is as follows:

• 13th April 2022: Release of training and development data; initial tools.
• 30th April 2022: Release of full toolset and baseline system.
• 1st May 2022: Registration for challenge entrants opens.
• 25th July 2022: Evaluation data released.
• 1st Sept 2022: 1st round submission deadline for evaluation by objective measure.
• 15th Sept 2022: 2nd round submission deadline for listening tests.
• Sept-Nov 2022: Listening test evaluation period.
• 2nd Dec 2022: Results announced at a Clarity Challenge Workshop; prizes awarded.

The challenge training, dev data and initial tools will be available from 13th April. In the meantime, please visit the CEC2 Intro page to learn more about the task.


## Live events in January

The Clarity team are hosting two live sessions this month related to the Prediction Challenge. Everyone is welcome to attend, whether you have already registered to participate in the challenge or are still considering signing up.

The presentations will be very similar to the webinar in November. These events are intended as a chance for people in different time zones to attend live and ask the team questions.

Hosting is via Microsoft Teams. You can join from your browser without needing to install Teams, but if you join from a mobile device you may need to install the Teams app.

## Webinar - Challenge Overview

### Friday 14th January

9:00 GMT | 17:00 CST (GMT+8)

An introduction to the aims of the challenge and some background to the problem of speech intelligibility prediction for hearing aids:

• Welcome, introduction to Clarity.
• Speech intelligibility models: overview and why they are needed.
• Hearing impairment and speech intelligibility prediction.
• The prediction challenge - details and how you can sign up to participate.
• Audience questions / discussion.

The presentations will be recorded and made available online shortly after the event. The Q&A discussion will not be recorded.

You are welcome to join slightly later if you are only interested in joining for the Q&A section (presentations should finish around 9:40 GMT).

## Live Q&A session

### Monday 17th January

17:00 GMT | 12:00 EST (GMT-5) | 9:00 PST (GMT-8)

A chance to ask the team questions about the Clarity Prediction Challenge - for anyone who could not attend the webinar on Friday 14th due to time zone differences.

Please note there will be no presentations in this session. The talks from Friday’s webinar will be uploaded to the Clarity project YouTube channel later in the day, so you are invited to watch those before joining this live Q&A.


## Introduction Webinar - Recording Available

The Clarity team recently hosted a webinar to introduce the Prediction Challenge. The recording is now available to view online:

### Slides

1. Welcome and Overview
2. Speech Intelligibility Models
3. Hearing Impairment and SI Prediction
4. Clarity Prediction Challenge Details

Note that we did not record the Q&A session at the end, but if you have questions about taking part in the challenge, you can contact us at claritychallengecontact@gmail.com.


## Welcome to CPC1

Welcome to the new Clarity CPC1 site for the first prediction challenge launching in autumn 2021. Feel free to look around. At the moment we're still doing listening tests and preparing the data, so the download links don't work. If anything is unclear or you've got questions, please contact us through the Google group.


## CEC1 submission deadline passed

The CEC1 submission deadline has now passed. Thank you to all the teams who sent us signals.

Please remember to submit your finalised system descriptions by June 22nd to the Clarity workshop following the instructions provided on the workshop website.

We are currently busy evaluating the submissions using the MBSTOI metric. We will be contacting teams on the 22nd with details of how to prepare signals for the listening panel evaluation.

If you have been working on the challenge but missed the submission deadline then please do get in contact. We will still be happy to receive your signals and system descriptions. Although late entries will not be eligible for the official challenge ranking, we will be happy to compute the eval set MBSTOI score for you and may even be able to arrange listening test evaluation through our panel.

## CEC1 eval data released

The evaluation dataset is now available to download from the myairbridge download site. The evaluation data filename is clarity_CEC1_data.scenes_eval.v1_1.tgz.

Full details of how to prepare your submission are now available on this site. Please read them carefully.

Registration: Teams must register via the Google form on the How To Submit page of this site. (Please complete this even if you have already completed a pre-registration form). Only one person from each team should register. Only those who have registered will be eligible to proceed to the evaluation. Once you have registered you will receive a confirmation email, a team ID and a link to a Google Drive to which you can upload your signals.

The submission consists of two components:

i) a technical document of up to 2 pages describing the system/model and any external data and pre-existing tools, software and models used. This should be prepared as a Clarity-2021 workshop abstract and submitted to the workshop.

ii) the set of processed signals that we will evaluate using the MBSTOI metric. Details of how to name and package your signals for upload can be found on the How To Submit page.

Listening Tests: Teams that do well in the MBSTOI evaluation will be notified on 22nd June and invited to submit further signals for the second stage Listening Test evaluation.

## Baseline speech intelligibility model in round one

### Some comments on signal alignment and level-insensitivity

Our baseline binaural speech intelligibility measure in round one is the Modified Binaural Short-Time Objective Intelligibility measure, or MBSTOI. This short post outlines the importance of correcting for delays that your hearing aid processing algorithm introduces into the audio signals to allow MBSTOI to estimate the speech intelligibility accurately. It also discusses the importance of considering the audibility of signals before evaluation with MBSTOI.
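If your system introduces a roughly constant processing delay, a simple cross-correlation alignment against the reference before scoring can recover it. The sketch below is a minimal illustration (the function name, the 48 kHz rate and the 10 ms example are ours, not from the challenge tools); note it does nothing about level, so you must still ensure the signals are audible, since MBSTOI is level-insensitive.

```python
# A minimal sketch of aligning processed output to a reference before
# scoring; applies per ear for binaural signals. Illustrative only.
import numpy as np
from scipy.signal import correlate

def align_to_reference(reference, processed):
    """Trim a constant delay from `processed`, located via the peak of the
    full cross-correlation, then crop both signals to a common length."""
    xcorr = correlate(processed, reference, mode="full")
    delay = int(np.argmax(xcorr)) - (len(reference) - 1)  # samples of latency
    if delay > 0:
        processed = processed[delay:]      # drop the leading latency
    elif delay < 0:
        reference = reference[-delay:]     # processed runs ahead (unusual)
    n = min(len(reference), len(processed))
    return reference[:n], processed[:n]

# Illustrative check: a 480-sample (10 ms at 48 kHz) delay is recovered.
fs = 48000
ref = np.random.randn(fs)
proc = np.concatenate([np.zeros(480), ref])[:len(ref)]
ref_a, proc_a = align_to_reference(ref, proc)
print(np.corrcoef(ref_a, proc_a)[0, 1])   # ~1.0 once aligned
```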

## Evaluation

In stage one, entries will be ranked according to the average MBSTOI score across all samples in the evaluation test set. In the second stage, entries will be evaluated by the listening panel. There will be prizes for both stages. See this page for more information.

## Latency, computation time and real-time operation

An explanation of the time and computational limits for the first round of the enhancement challenge.

## The 1st Clarity Enhancement Challenge

For a hearing aid to work well for users, the processing needs to be quick. The output of the hearing aid should be produced with a delay of less than about 10 ms. Many audio processing techniques are non-causal, i.e., the output of the system depends on samples from the future. Such processing is useless for hearing aids and therefore our rules include a restriction on the use of future samples.

The rules state the following (see the sketch after the list for what the 5 ms budget means in samples):

• Systems must be causal; the output at time t must not use any information from input samples more than 5 ms into the future (i.e., no information from input samples >t+5ms).
• There is no limit on computational cost.
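As a quick sanity check, the sketch below converts the 5 ms budget into samples and tests a hypothetical STFT window against it; the 44.1 kHz rate and the window length are assumptions for illustration, not challenge specifications.

```python
# A quick sanity check of the 5 ms budget, assuming a 44.1 kHz sample rate
# (check the challenge data for the actual rate; the window length is made up).
fs = 44100
budget_samples = int(0.005 * fs)   # 5 ms -> 220 samples at 44.1 kHz

# An STFT system that synthesises the output at time t from windows centred
# on t looks roughly half a window into the future, so we need:
window_len = 256
lookahead = window_len // 2        # future samples used per output sample
assert lookahead <= budget_samples, "analysis window peeks too far ahead"
print(budget_samples, lookahead)
```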

## Clarity Challenge pre-announcement

Although age-related hearing loss affects 40% of 55 to 74 year-olds, the majority of adults who would benefit from hearing aids don’t use them. A key reason is simply that hearing aids don’t provide enough benefit.

Picking out speech from background noise is a critical problem even for the most sophisticated devices. The purpose of the Clarity Challenges is to catalyse new work to radically improve the speech intelligibility provided by hearing aids.

The series of challenges will consider increasingly complex listening scenarios. The first round, launching in January 2021, will focus on speech in indoor environments in the presence of a single interferer. It will begin with a challenge involving improving hearing aid processing. Future challenges on how to model speech-in-noise perception will be launched at a later date.

## One approach to our enhancement challenge

A blog post about improving hearing aid processing using DNNs, suggesting one approach to overcoming the non-differentiable loss function.

The aim of our Enhancement Challenge is to get people producing new algorithms for processing speech signals through hearing aids. We expect most entries to replace the classic hearing aid processing of Dynamic Range Compressors (DRCs) with deep neural networks (DNN) (although all approaches are welcome!). The first round of the challenge is going to be all about improving speech intelligibility.

Setting up a DNN structure and training regime for the task is not as straightforward as it might first appear. Figure 1 shows an example of a naive training regime. An audio example of Speech in Noise (SPIN) is randomly created (audio sample generation, bottom left), and a listener is randomly selected with particular hearing loss characteristics (random artificial listener generation, top left). The DNN Enhancement model (represented by the bright yellow box) then produces improved speech in noise. (Audio signals in pink are two-channel, left and right because this is for binaural hearing aids.)

Figure 1

Next, the improved speech in noise is passed to the Prediction Model in the lime green box, which gives an estimate of the Speech Intelligibility (SI). Our baseline system will include algorithms for this. We’ve already blogged about the Hearing Loss Simulation. Our current thinking is that the intelligibility model will use a binaural form of the Short-Time Objective Intelligibility Index (STOI) [1]. The dashed line going back to the enhancement model shows that the DNN will be updated based on the reciprocal of the Speech Intelligibility (SI) score: by minimising 1/SI, the enhancement model maximises intelligibility.
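To make the training loop concrete, here is a toy PyTorch sketch of the regime in Figure 1. The enhancer, the intelligibility proxy and all shapes are our stand-ins: the real prediction model combines a hearing loss simulation with a binaural STOI variant, which is not differentiable out of the box (exactly the obstacle this post's suggested approach addresses), so the sketch swaps in a crude differentiable correlation-based proxy.

```python
# A toy version of the Figure 1 loop, assuming PyTorch. TinyEnhancer and
# si_proxy are illustrative stand-ins, not the Clarity baseline models.
import torch

class TinyEnhancer(torch.nn.Module):
    """Stand-in DNN enhancer mapping 2-channel (L/R) noisy audio to 2 channels."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv1d(2, 2, kernel_size=129, padding=64)

    def forward(self, spin):
        return self.net(spin)

def si_proxy(enhanced, clean, eps=1e-8):
    """Crude differentiable intelligibility proxy: mean correlation between
    short-time frames of the enhanced and clean signals (STOI-flavoured toy)."""
    e = enhanced.unfold(2, 256, 128)  # (batch, channel, frame, 256)
    c = clean.unfold(2, 256, 128)
    e = e - e.mean(dim=-1, keepdim=True)
    c = c - c.mean(dim=-1, keepdim=True)
    corr = (e * c).sum(dim=-1) / (e.norm(dim=-1) * c.norm(dim=-1) + eps)
    return corr.mean()  # roughly in [-1, 1]

enhancer = TinyEnhancer()
opt = torch.optim.Adam(enhancer.parameters(), lr=1e-4)

for step in range(100):
    clean = torch.randn(8, 2, 16384)               # stand-in target speech
    spin = clean + 0.5 * torch.randn_like(clean)   # speech in noise (SPIN)
    si = si_proxy(enhancer(spin), clean)
    loss = 1.0 / (1.0 + si)  # a 1/SI-style loss; minimising it maximises SI
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The shifted denominator simply keeps the loss finite when the proxy score is near zero; minimising it still pushes the SI score up, in the spirit of the 1/SI idea above.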

## The speech-in-noise problem part two

How hearing aids address the problem of speech-in-noise in noisy and quieter places. We’ll also discuss which machine learning techniques are often used for noise reduction, and some promising strategies for hearing aids.

In a previous blog, we set out the problem of using hearing aids to pick out speech in noisy places. When the signal-to-noise ratio (SNR) is low, hearing aids can only do so much to improve the intelligibility of the speech.

A solitary hearing aid has various ways of addressing everyday constant noises such as cars, vacuum cleaners and fans. The aids work best when the noise is not too intrusive and the SNR is relatively high. Problems arise when the noise is high (low SNRs), because then the hearing aid processing can distort the sound too much. While hearing aids might have limited success in improving intelligibility in such cases, they can still make the noise less annoying (e.g., Brons et al., 2014).

Using multiple microphones on each hearing aid can help in noisy conditions. The sound from the microphones is combined in a way that boosts the speech relative to the noise. This technology can be put into larger hearing aids, where there is enough spacing between the front and rear microphones.

One of the reasons why our brains are really good at picking out speech from the hubbub of a restaurant is that they compare and contrast the sounds from both ears. Our hearing is binaural. Similarly, if you have hearing aids in both ears, they work better if they collaborate on reducing the noise.

Crucial to how our brains locate sound and pick out speech in noise are timing and level cues that come from comparing the sound at both ears. When sound comes from the side:

• interaural time differences occur because the sound arrives at one ear earlier than the other.
• interaural level differences occur because the sound has to bend around the head to reach the furthest ear.

Binaural hearing aids communicate wirelessly and use noise reduction strategies that preserve these interaural time and level difference cues (e.g., Van den Bogaert et al., 2009). This allows the listener’s brain to better locate the speech and boost this compared to the noise.
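To get a feel for the size of the timing cue, here is a back-of-envelope calculation using Woodworth's classic spherical-head approximation; the head radius and formula are textbook values, not something from the Clarity tools.

```python
# Back-of-envelope ITD, assuming Woodworth's spherical-head model:
# ITD = (r / c) * (theta + sin(theta)) for a source at azimuth theta.
import math

r = 0.0875   # typical head radius, metres
c = 343.0    # speed of sound, m/s
for az_deg in (0, 30, 60, 90):
    theta = math.radians(az_deg)
    itd_us = (r / c) * (theta + math.sin(theta)) * 1e6
    print(f"{az_deg:3d} deg azimuth -> ITD ~ {itd_us:4.0f} us")
# A source at 90 degrees gives roughly 650-700 us; these microsecond-scale
# cues are what binaural noise reduction must preserve.
```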

## Hearing loss simulation

What our hearing loss algorithms simulate, with audio examples to illustrate hearing loss.

Our challenge entrants are going to use machine learning to develop better processing of speech in noise (SPIN) for hearing aids. For a machine learning algorithm to learn new ways of processing audio for the hearing impaired, it needs to estimate how the sound will be degraded by any hearing loss. Hence, we need an algorithm to simulate hearing loss for each of our listeners. The diagram below shows our draft baseline system that was detailed in a previous blog. The hearing loss simulation is part of the prediction model. The Enhancement Model to the left is effectively the hearing aid, and the Prediction Model to the right is estimating how someone will perceive the intelligibility of the speech in noise.

The draft baseline system (where SPIN is speech in noise, DRC is Dynamic Range Compression, HL is Hearing Loss, SI is Speech Intelligibility and L & R are Left and Right).

There are different causes of hearing loss, but we’re concentrating on the most common type that happens when you age (presbycusis). RNID (formerly Action on Hearing Loss) estimate that more than 40% of people over the age of 50 have a hearing loss, and this rises to 70% of people who are older than 70.

The aspects of hearing loss we’ve decided to simulate are listed below (a minimal code sketch of the first aspect follows the list):

1. The loss of the ability to sense the quietest sounds (an increase in absolute threshold).
2. The way that, as an audible sound increases in level, the perceived increase in loudness is greater than normal (loudness recruitment) (Moore et al., 1996).
3. The ear’s poorer ability to discriminate the frequency of sounds (impaired frequency selectivity).
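As a toy illustration of the first aspect only, the sketch below attenuates octave bands according to a made-up audiogram. Recruitment and impaired frequency selectivity need more machinery (envelope expansion, broadened auditory filters), which the real simulator provides and this sketch does not.

```python
# A toy of aspect 1 only (raised absolute thresholds), assuming per-octave-band
# attenuation by the listener's hearing level; the audiogram values are made up.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def simulate_threshold_loss(signal, fs, centre_freqs, hl_db):
    """Split the signal into octave bands and attenuate each band by the
    corresponding hearing level in dB, then recombine (a crude filterbank)."""
    out = np.zeros_like(signal)
    for fc, hl in zip(centre_freqs, hl_db):
        lo, hi = fc / np.sqrt(2), min(fc * np.sqrt(2), 0.49 * fs)
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        out += sosfiltfilt(sos, signal) * 10 ** (-hl / 20.0)
    return out

fs = 44100
centre_freqs = [250, 500, 1000, 2000, 4000, 8000]
hl_db = [15, 20, 30, 45, 60, 70]    # illustrative presbycusis-like audiogram
speech = np.random.randn(fs)        # stand-in for a speech signal
degraded = simulate_threshold_loss(speech, fs, centre_freqs, hl_db)
```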

## Sounds for round one

We’ll be challenging our contestants to find innovative ways of making speech more audible for hearing impaired listeners when there is noise getting in the way. But what noises should we consider? To help us choose sounds and situations that are relevant to people with hearing aids, we held a focus group to discuss two topics:

• Everyday background noises that make having a conversation difficult.
• The characteristics that hearing aid listeners would value in speech after it has been processed by a hearing aid.

A total of eight patients (four males, four females) attended the meeting, six of whom were recruited from the Nottingham Biomedical Research Centre’s patient and public involvement contact list. Two attendees were recruited from a local lip reading class organised by the Nottinghamshire Deaf Society. The hearing losses within the group ranged from mild to severe, and all attendees regularly use bilateral hearing aids.

Our focus was on the living room because that is the scenario for round one of the challenges.


## The speech-in-noise problem

People often have problems understanding speech in noise, and this is one of the main deficits of hearing aids that our machine learning challenges will address.

It’s common for us to hear sounds coming simultaneously from different sources. Our brains then need to separate out what we want to hear (the target speaker) from the other sounds. This is especially difficult when the competing sounds are speech. This has the quaint name, the Cocktail Party Problem (Cherry, 1953). We don’t go to many cocktail parties, but we encounter many situations where the Cocktail Party Problem is important: hearing a conversation in a busy restaurant, trying to understand a loved one while the television is on, or hearing the radio in the kitchen when the kettle is boiling, to name just a few.

Difficulty in picking out speech in noise is really common if you have a hearing loss. Indeed, it’s often when people have problems doing this that they realise they have a hearing loss.

“Hearing aids don’t work when there is a lot of background noise. This is when you need them to work.” -- Statement from a hearing aid wearer (Kochkin, 2000)

Hearing aids are the most common form of treatment for hearing loss. However, surveys indicate that at least 40% of hearing aids are never or rarely used (Knudsen et al., 2010). A major reason for this is dissatisfaction with performance. Even the best hearing aids perform poorly for speech in noise. This is particularly the case when there are many people talking at the same time, and when the amount of noise is relatively high (i.e., the signal-to-noise ratio (SNR) is low). As hearing ability worsens with age, the ability to understand speech in background noise also reduces (e.g., Akeroyd, 2008).

## Why use machine learning challenges for hearing aids?

The Clarity Project is based around the idea that machine learning challenges could improve hearing aid signal processing. After all, this has happened in other areas, such as automatic speech recognition (ASR) in the presence of noise. The improvements in ASR have happened because of:

• Machine learning (ML) at scale – big data and raw GPU power.
• Benchmarking – research has developed around community-organised evaluations or challenges.
• Collaboration has been enabled by these challenges, allowing work across communities such as signal processing, acoustic modelling, language modelling and machine learning.

We’re hoping that these three mechanisms can drive improvements in hearing aids.

## Components of a challenge

There needs to be a common task based on a target application scenario to allow communities to gain from benchmarking and collaboration. The Clarity project’s first enhancement challenge will be about hearing speech from a single talker in a typical living room, where there is one source of noise and a little reverberation.

We’re currently working on developing simulation tools to allow us to generate our living room data. The room acoustics will be simulated using RAVEN, and the Hearing Device Head-Related Transfer Functions will come from Denk’s work. We’re working on getting better, more ecologically valid speech than is often used in speech intelligibility work.

Entrants are then given training data and development (dev) test data along with a baseline system that represents the current state-of-the-art. You can find a post and video on the current thinking on the baseline here. We’re still working on the rules stipulating what is and is not allowed (for example, whether entrants will be allowed to use data from outside the challenge).

Clarity’s first enhancement challenge is focussed on maximising the speech intelligibility (SI) score. We will evaluate this first through a prediction model that is based on a hearing loss simulation and an objective metric for speech intelligibility. Simulation has been hugely important for generating training data in the CHiME challenges, and so we intend to use that approach in Clarity. But results from simulated test sets cannot be trusted, and hence a second evaluation will come through perceptual tests on hearing impaired subjects. However, one of our current problems is that we can’t bring listeners into our labs because of COVID-19.

We’ll actually be running two challenges in roughly parallel, because we’re also going to task the community to improve our prediction model for speech intelligibility.

We’re running a series of challenges over five years. What other scenarios should we consider? What speech? What noise? What environment? Please comment below.

## Acknowledgements

Much of this text is based on Jon Barker’s 2020 SPIN keynote.

## The baseline

An overview of the current state of the baseline we’re developing for the machine learning challenges.

We’re currently developing the baseline processing that challenge entrants will need. This takes a random listener and a random audio sample of speech in noise (SPIN) and passes that through a simulated hearing aid (the Enhancement Model). This improves the speech in noise. We then have an algorithm (the Prediction Model) to estimate the Speech Intelligibility that the listener would perceive (SI score). This score can then be used to drive machine learning to improve the hearing aid.

A talk through the baseline model we’re developing.

The first machine learning challenge is to improve the enhancement model, in other words, to produce a better processing algorithm for the hearing aid. The second challenge is to improve the prediction model using perceptual data we’ll provide.