The speech-in-noise problem part two

How hearing aids address the problem of speech-in-noise in noisy and quieter places. We’ll also discuss the machine learning techniques that are often used for noise reduction, and some promising strategies for hearing aids.

In a previous blog, we set out the problem of using hearing aids to pick out speech in noisy places. When the signal-to-noise ratio (SNR) is low, hearing aids can only do so much to improve the intelligibility of the speech.

A solitary hearing aid has various ways of addressing everyday constant noises such as cars, vacuum cleaners and fans. The aids work best when the noise is not too intrusive and the SNR is relatively high. Problems arise when the noise level is high (i.e., the SNR is low), because then the hearing aid processing can distort the sound too much. While hearing aids might have limited success in improving intelligibility in such cases, they can still make the noise less annoying (e.g., Brons et al., 2014).

Using multiple microphones on each hearing aid can help in noisy conditions. The sound from the microphones is combined in a way that boosts the speech relative to the noise. This technology can be built into larger hearing aids, provided there is enough spacing between the front and rear microphones.
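To illustrate the principle, here is a minimal sketch of the simplest such combination, a delay-and-sum beamformer for a forward-facing two-microphone array. The microphone spacing, sample rate and function name are illustrative assumptions; real hearing aids typically use adaptive beamformers, but the idea of time-aligning the channels and summing them is the same.

```python
import numpy as np

def delay_and_sum(front, rear, mic_spacing_m=0.01, fs=16000, c=343.0):
    """Steer a two-microphone end-fire array towards sounds from the front.

    Sound from straight ahead reaches the front microphone slightly before
    the rear one. Delaying the front signal by that travel time aligns the
    speech in the two channels so it adds constructively, while noise from
    other directions adds incoherently and is relatively attenuated.
    """
    tau = mic_spacing_m / c                        # inter-microphone travel time (s)
    freqs = np.fft.rfftfreq(len(front), d=1.0 / fs)
    # Apply the (fractional-sample) delay to the front channel in the
    # frequency domain, then average the two aligned channels.
    front_delayed = np.fft.irfft(
        np.fft.rfft(front) * np.exp(-2j * np.pi * freqs * tau), n=len(front)
    )
    return 0.5 * (front_delayed + rear)
```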

One of the reasons why our brains are so good at picking out speech from the hubbub of a restaurant is that they compare and contrast the sounds arriving at the two ears. Our hearing is binaural. Similarly, if you have hearing aids in both ears, they work better if they collaborate on reducing the noise.

Crucial to how our brains locate sound and pick out speech in noise are timing and level cues that come from comparing the sound at both ears. When sound comes from the side:

  • interaural time differences occur because the sound arrives at the nearer ear slightly earlier than at the further ear (the sketch below gives a feel for the size of this cue).
  • interaural level differences occur because the head casts an acoustic shadow, so the sound reaching the further ear is attenuated.
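To put a number on the timing cue, here is a small sketch using Woodworth's classic spherical-head approximation; the head radius of 8.75 cm is a textbook assumption rather than a measured value.

```python
import numpy as np

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Approximate interaural time difference for a spherical head
    (Woodworth's formula: ITD = a/c * (theta + sin(theta)))."""
    theta = np.radians(azimuth_deg)
    return head_radius_m / c * (theta + np.sin(theta))

# A talker directly to one side (90 degrees azimuth) arrives at the far ear
# roughly two-thirds of a millisecond after the near ear:
print(f"{itd_woodworth(90) * 1000:.2f} ms")   # ~0.66 ms
```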

Binaural hearing aids communicate wirelessly and use noise reduction strategies that preserve these interaural time and level difference cues (e.g., Van den Bogaert et al., 2009). This allows the listener’s brain to better locate the speech and separate it from the noise.

Machine learning

In recent years, there has been increasing interest in what machine learning methods can do for hearing aids. Machine learning is a branch of artificial intelligence where computers learn directly from example data. One machine learning method is the neural network. This is an algorithm formed from layers of simple computational units connected to each other in a way that is inspired by connections between neurons in the brain. Deep (3+ layer) neural networks are able to learn complex, non-linear mapping functions, which makes them ideal candidates for noise reduction tasks.
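As a rough sketch of what such a network can look like (written here in PyTorch; the layer sizes, input features and output targets are illustrative assumptions, not any particular hearing aid algorithm):

```python
import torch.nn as nn

# A small deep (3-layer) network: each layer is a linear transform followed by
# a non-linearity, and stacking layers lets the network learn a complex,
# non-linear mapping -- here from a frame of noisy spectral features to an
# estimate of the corresponding clean-speech features.
class DenoisingNet(nn.Module):
    def __init__(self, n_freq_bins=257, hidden=256):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_freq_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq_bins),
        )

    def forward(self, noisy_frame):
        return self.layers(noisy_frame)
```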

We anticipate that machine learning can help tackle the challenge of speech in noise for hearing aids, providing a tailored solution for each individual and listening situation. For example, machine learning could sense the acoustic environment the listener is in and choose the most suitable processing settings.


In recent years, one machine learning approach to noise reduction has become popular: neural networks are used to estimate time-frequency masks, i.e., a set of gains, one per time-frequency unit, that, when multiplied with the noisy signal, produce less noisy speech (see, e.g., Zhao et al., 2018).
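A minimal sketch of how such a mask is applied, assuming a trained estimator is already available (`mask_model` below is a stand-in for whatever model is used, and the STFT settings are illustrative):

```python
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, mask_model, fs=16000, nperseg=512):
    """Apply time-frequency gains predicted by a trained mask estimator.

    The noisy waveform is transformed to the time-frequency domain, the model
    predicts a gain in [0, 1] for every time-frequency unit, the gains are
    multiplied with the noisy spectrogram, and the result is transformed back
    to a waveform.
    """
    _, _, spec = stft(noisy, fs=fs, nperseg=nperseg)
    mask = mask_model(np.abs(spec))     # same shape as spec, values in [0, 1]
    _, enhanced = istft(spec * mask, fs=fs, nperseg=nperseg)
    return enhanced
```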

Machine learning systems for noise reduction are trained on artificially mixed speech and noise. Some operate on a single channel, exploiting only spectral cues, while others work with multiple channels and can also exploit spatial cues. We expect that future hearing aids built on machine learning will perform best if they combine the left and right microphones to work binaurally.

Most of these noise reduction systems have been designed and evaluated in an off-line mode where they process pre-recorded signals. This isn’t much use for hearing aids that need to work in real-time with low latency (i.e., short delays). One challenge for hearing aids is to redesign off-line approaches to work quickly enough without too much loss of performance.
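A back-of-the-envelope illustration of the problem (the frame lengths and delay budget are assumptions for the sake of the example): frame-based processing cannot output anything until a whole analysis frame has been buffered, so the frame length alone sets a floor on the delay, and hearing aid wearers are generally thought to tolerate only delays of the order of 10 ms or less.

```python
fs = 16000                         # sample rate (Hz)

offline_frame = 512                # a frame length typical of off-line systems
print(1000 * offline_frame / fs)   # 32.0 ms buffered before any output is possible

realtime_frame = 64                # a frame length closer to a hearing aid budget
print(1000 * realtime_frame / fs)  # 4.0 ms
```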

The potential for machine learning to produce better approaches to hearing aid processing is what motivated the Clarity Project. If you’re interested in hearing more as the challenges develop, please sign up.

References

Brons, I., Houben, R., and Dreschler, W. A. (2014). Effects of noise reduction on speech intelligibility, perceived listening effort, and personal preference in hearing-impaired listeners. Trends in hearing, 18, 1-10.

Van den Bogaert, T., Doclo, S., Wouters, J., and Moonen, M. (2009). Speech enhancement with multichannel Wiener filter techniques in multimicrophone binaural hearing aids. The Journal of the Acoustical Society of America, 125(1), 360-371.

Zhao, Y., Wang, D., Johnson, E. M., and Healy, E. W. (2018). A deep learning based segregation algorithm to increase speech intelligibility for hearing-impaired listeners in reverberant-noisy conditions. The Journal of the Acoustical Society of America, 144(3), 1627-1637.

Credits

Photograph of hearing aid wearer, copyright University of Nottingham.

Image of brain with overlaid circuitry made available by www.vpnsrus.com.

Why use machine learning challenges for hearing aids?

An overview of why machine learning challenges have the potential to improve hearing aid signal processing.

The Clarity Project is based around the idea that machine learning challenges could improve hearing aid signal processing. After all, this has happened in other areas, such as automatic speech recognition (ASR) in the presence of noise. The improvements in ASR have happened because of:

  • Machine learning (ML) at scale – big data and raw GPU power.
  • Benchmarking – research has developed around community-organised evaluations or challenges.
  • Collaboration – these challenges have enabled researchers to work across communities such as signal processing, acoustic modelling, language modelling and machine learning.

We’re hoping that these three mechanisms can drive improvements in hearing aids.

Components of a challenge

There needs to be a common task based on a target application scenario to allow communities to gain from benchmarking and collaboration. The Clarity Project’s first enhancement challenge will be about hearing speech from a single talker in a typical living room, where there is one source of noise and a little reverberation.

We’re currently developing simulation tools to allow us to generate our living room data. The room acoustics will be simulated using RAVEN, and the hearing-device head-related transfer functions (HRTFs) will come from Denk’s work. We’re also working on getting better, more ecologically valid speech than is often used in speech intelligibility work.
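To make the data-generation idea concrete, here is a hedged sketch of the kind of convolve-and-mix step involved. The function name, the single-channel simplification and the SNR handling are our own illustrative assumptions, not the Clarity tools themselves; in the real pipeline the impulse responses would come from the RAVEN room simulation combined with the hearing-device HRTFs.

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_scene(speech, noise, speech_ir, noise_ir, snr_db=0.0):
    """Mix a target talker and one noise source as heard at a hearing aid mic.

    speech_ir and noise_ir are impulse responses from each source position to
    the microphone (room acoustics plus hearing-device HRTF). The noise signal
    is assumed to be at least as long as the speech.
    """
    s = fftconvolve(speech, speech_ir)[: len(speech)]
    n = fftconvolve(noise, noise_ir)[: len(speech)]
    # Scale the noise so the mixture has the requested signal-to-noise ratio.
    gain = np.sqrt(np.mean(s ** 2) / (np.mean(n ** 2) * 10 ** (snr_db / 10)))
    return s + gain * n
```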

Entrants are then given training data and development (dev) test data, along with a baseline system that represents the current state-of-the-art. You can find a post and video on the current thinking on the baseline here. We’re still working on the rules stipulating what is and what is not allowed (for example, whether entrants will be allowed to use data from outside the challenge).

Clarity’s first enhancement challenge is focussed on maximising the speech intelligibility (SI) score. We will evaluate this first through a prediction model that is based on a hearing loss simulation and an objective metric for speech intelligibility. Simulation has been hugely important for generating training data in the CHiME challenges, and so we intend to use that approach in Clarity. But results from simulated test sets cannot be fully trusted, and hence a second evaluation will come through perceptual tests on hearing-impaired subjects. However, one of our current problems is that we can’t bring listeners into our labs because of COVID-19.
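The Clarity prediction model itself is still under development; purely as an illustration of what an objective intelligibility metric looks like in practice, here is how an off-the-shelf metric (STOI, via the third-party pystoi package) would score an enhanced signal against its clean reference. The signals below are random placeholders standing in for real recordings.

```python
import numpy as np
from pystoi import stoi   # pip install pystoi

fs = 16000
clean = np.random.randn(3 * fs)                    # placeholder for a clean speech recording
enhanced = clean + 0.1 * np.random.randn(3 * fs)   # placeholder for a system's output

# STOI returns a score roughly between 0 and 1; a higher score predicts
# higher intelligibility of `enhanced` relative to the `clean` reference.
print(f"STOI: {stoi(clean, enhanced, fs, extended=False):.3f}")
```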

We’ll actually be running two challenges in roughly parallel, because we’re also going to task the community to improve our prediction model for speech intelligibility.

We’re running a series of challenges over five years. What other scenarios should we consider? What speech? What noise? What environment? Please comment below.

Acknowledgements

Much of this text is based on Jon Barker’s 2020 SPIN keynote.