What our baseline hearing aid simulates, with examples.
Our challenge entrants are going to use machine learning to develop better hearing aid processing for listening to speech in noise (SPIN). We’ll provide a baseline hearing aid model for entrants to improve on. The figure below shows our baseline system, where the yellow box to the left is where the simulated hearing aid sits (labelled “Enhancement model”).
We decided to base our simulated hearing aid on the open Master Hearing Aid (openMHA), which is an open-source software platform for real-time audio signal processing. This was developed by the University of Oldenburg, HörTech gGmbH, Oldenburg, and the BatAndCat Corporation, USA. The original version was developed as one of the outcomes of the Cluster of Excellence Hearing4all project. The openMHA platform includes:
- a software development kit (C/C++ SDK) including an extensive signal processing library for algorithm development and a set of Matlab and Octave tools to support development and off-line testing
- real-time runtime environments for standard PC platforms and mobile ARM platforms
- a set of baseline reference algorithms that forms a complete hearing aid system (multi-band dynamic compression and amplification, directional microphones, binaural beamformers and coherence filters, single-channel noise reduction, feedback control).
We have written a Python wrapper for the core openMHA system for ease of use within machine learning frameworks. We also developed a generic hearing aid configuration and translated the Camfit compressive fitting. This is the prescription that takes a listener’s audiogram and determines the right settings for the hearing aid; it is based on Moore et al. (1999) and encoded by openMHA.
Some aspects of modern digital hearing aids that we’ve decided to simulate are:
- differential microphones with a cardioid pattern, and
- a multiband compressor for dynamic compression.
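To make the second item more concrete, here is a minimal sketch of the static input/output rule that one band of a compressor applies. It is not the openMHA implementation: the function name and the threshold, ratio and gain values are illustrative assumptions only.

```python
import numpy as np

def compressor_gain_db(level_db, threshold_db=45.0, ratio=2.0, linear_gain_db=20.0):
    """Static gain in dB for one band of a hypothetical compressor.

    Below the threshold the band receives a fixed gain. Above it, every
    `ratio` dB of extra input produces only 1 dB of extra output.
    All default values are assumptions chosen for illustration.
    """
    level_db = np.asarray(level_db, dtype=float)
    excess_db = np.maximum(level_db - threshold_db, 0.0)
    return linear_gain_db - excess_db * (1.0 - 1.0 / ratio)

# Input levels in dB for a single band (illustrative values).
input_levels = np.array([30.0, 45.0, 65.0, 85.0])
gains = compressor_gain_db(input_levels)
output_levels = input_levels + gains
```

With these assumed settings, quiet inputs get the full 20 dB of gain while an 85 dB input gets none, so a wide range of input levels is squeezed into a narrower output range. A multiband compressor applies a rule like this separately in each frequency band, with settings that depend on the listener’s audiogram.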
We’ve decided not to simulate the following on the basis that all these tend to be implemented in proprietary forms, such that we can’t replicate them exactly in our open-source algorithm:
- coordination of gross processing parameters across ears,
- binaural processing involving some degree of signal exchange between left and right devices,
- gain changes influenced by speech-to-noise ratio estimators,
- frequency shifting or scaling, and
- dual or adaptive time-constant wide dynamic range compression.
We are using the Oldenburg Hearing Device (OlHeaD) Head Related Transfer Function (HRTF) Database (Denk et al. 2018) to replicate the signals that would be received by the front and rear microphones of the hearing aid and also at the eardrums of the wearer.
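As a rough sketch of how such a database is typically used, a source signal can be convolved with the impulse response measured for each receiver position. The impulse responses below are synthetic placeholders (a single delayed, scaled pulse), not data from the OlHeaD database, and all names and values are assumptions made for illustration.

```python
import numpy as np

fs = 44100  # sample rate in Hz (assumed)
rng = np.random.default_rng(0)
source = rng.standard_normal(fs // 10)  # 100 ms of noise standing in for a source signal

def toy_impulse_response(delay_samples, gain, length=64):
    """Placeholder impulse response: one scaled pulse after a delay."""
    ir = np.zeros(length)
    ir[delay_samples] = gain
    return ir

# Hypothetical responses for three receiver positions at one ear.
impulse_responses = {
    "front_mic": toy_impulse_response(10, 1.0),
    "rear_mic": toy_impulse_response(12, 0.9),
    "eardrum": toy_impulse_response(20, 0.7),
}

# Convolving the source with each response gives the signal at that position.
received = {name: np.convolve(source, ir) for name, ir in impulse_responses.items()}
```

With measured responses in place of the placeholders, the same convolution would yield the front microphone, rear microphone and eardrum signals for a given source direction.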
Audio examples of hearing aid processing
Here is an example of speech in noise processed by the simulated hearing aid for a moderate level of hearing loss. We can hear that the shape of the frequency spectrum has been modified to suit the listener’s specific pattern of hearing loss.
Information about our hearing loss model can be found here.
The target speech comes from our new 40-speaker British English speech database, while the speech interferer noise comes from the SLR83 database, which comprises recordings of male and female speakers of English from various parts of the UK and Ireland.
We are grateful to the developers of the openMHA platform for the use of their software. Special thanks are due to Hendrik Kayser and Tobias Herzke. We are also grateful to Brian Moore, Michael Stone and colleagues for the Camfit compressive prescription, and to the people involved in the preparation of the OlHeaD HRTF (particularly Florian Denk) and SLR83 databases. The feature image is taken from Denk et al. (2018).
Demirsahin, I., Kjartansson, O., Gutkin, A., & Rivera, C. E. (2020). Open-source Multi-speaker Corpora of the English Accents in the British Isles. Available at http://www.openslr.org/83/
Denk, F., Ernst, S. M., Ewert, S. D., & Kollmeier, B. (2018). Adapting hearing devices to the individual ear acoustics: Database and target response correction functions for various device styles. Trends in Hearing, 22, 2331216518779313.
Moore, B. C. J., Alcántara, J. I., Stone, M. A., & Glasberg, B. R. (1999). Use of a loudness model for hearing aid fitting: II. Hearing aids with multi-channel compression. British Journal of Audiology, 33(3), 157-170.
How hearing aids address the problem of speech-in-noise in noisy and quieter places. We’ll also discuss what machine learning techniques are often used for noise reduction, and some promising strategies for hearing aids.
In a previous blog, we set out the problem of using hearing aids to pick out speech in noisy places. When the signal-to-noise ratio (SNR) is low, hearing aids can only do so much to improve the intelligibility of the speech.
A solitary hearing aid has various ways of addressing everyday constant noises such as cars, vacuum cleaners and fans. The aids work best when the noise is not too intrusive and the SNR is relatively high. Problems arise when the noise level is high (low SNRs), because then the hearing aid processing can distort the sound too much. While hearing aids might have limited success in improving intelligibility in certain cases, they can still make the noise less annoying (e.g., Brons et al., 2014).
Using multiple microphones on each hearing aid can help in noisy conditions. The sound from the microphones is combined in a way that boosts the speech relative to the noise. This technology can be put into larger hearing aids, when there is enough spacing between the front and rear microphones.
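One common way of combining two microphones is delay-and-subtract: the rear microphone signal is delayed and then subtracted from the front one, which gives a cardioid pattern that rejects sound from behind. The sketch below computes the resulting sensitivity for a few directions. The 12 mm spacing, the function name and the far-field, free-field assumptions are ours for illustration and do not describe any particular device.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second

def cardioid_response(angle_deg, freq_hz, spacing_m=0.012):
    """Magnitude response of a delay-and-subtract microphone pair.

    The rear signal is delayed by the time sound takes to travel between
    the two microphones, then subtracted from the front signal.
    angle_deg = 0 is straight ahead and 180 is directly behind.
    """
    theta = np.deg2rad(angle_deg)
    internal_delay = spacing_m / SPEED_OF_SOUND
    # Extra travel time from the front to the rear microphone at this angle.
    acoustic_delay = spacing_m * np.cos(theta) / SPEED_OF_SOUND
    omega = 2.0 * np.pi * freq_hz
    return np.abs(1.0 - np.exp(-1j * omega * (internal_delay + acoustic_delay)))

front = cardioid_response(0.0, 1000.0)
side = cardioid_response(90.0, 1000.0)
rear = cardioid_response(180.0, 1000.0)
```

For sound arriving from directly behind, the two delays cancel and the subtraction removes the signal, while sound from the front passes through. The sensitivity at low frequencies shrinks as the spacing gets smaller, which is one reason why the spacing between the microphones matters.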
One of the reasons why our brains are really good at picking out speech from the hubbub of a restaurant is that they compare and contrast the sounds from both ears. Our hearing is binaural. Similarly, if you have hearing aids in both ears, they work better if they collaborate on reducing the noise.
Crucial to how our brains locate sound and pick out speech in noise are timing and level cues that come from comparing the sound at both ears. When sound comes from the side:
- interaural time differences occur because the sound arrives at one ear earlier than the other.
- interaural level differences occur because the sound has to bend around the head to reach the farther ear.
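To give a feel for the size of the time differences, the sketch below uses Woodworth’s classic spherical-head approximation. The head radius of 8.75 cm and the function name are assumptions for illustration. Level differences depend strongly on frequency and are not modelled here.

```python
import numpy as np

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference in seconds for a spherical head.

    azimuth_deg = 0 is straight ahead and 90 is directly to one side.
    The approximation is intended for angles between 0 and 90 degrees.
    """
    theta = np.deg2rad(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + np.sin(theta))

itd_front = woodworth_itd(0.0)   # no difference for a source straight ahead
itd_side = woodworth_itd(90.0)   # largest difference, roughly 0.66 ms here
```

Even for a source directly to one side, the difference is well under a millisecond, which is why noise reduction has to be careful not to disturb these cues.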
Binaural hearing aids communicate wirelessly and use noise reduction strategies that preserve these interaural time and level difference cues (e.g., Van den Bogaert et al., 2009). This allows the listener’s brain to better locate the speech and boost this compared to the noise.
In recent years, there has been increasing interest in what machine learning methods can do for hearing aids. Machine learning is a branch of artificial intelligence where computers learn directly from example data. One machine learning method is the neural network. This is an algorithm formed from layers of simple computational units connected to each other in a way that is inspired by connections between neurons in the brain. Deep (3+ layer) neural networks are able to learn complex, non-linear mapping functions, which makes them ideal candidates for noise reduction tasks.
We anticipate that machine learning can help tackle the challenge of speech in noise for hearing aids, providing a tailored solution for each individual and listening situation. For example, one thing machine learning could do is to sense the acoustic environment the listener is in, and choose the most suitable processing settings.
In recent years, a machine learning approach for noise reduction has become popular. Neural networks are used to estimate time-frequency masks (a set of gains for each time-frequency unit that, when multiplied by the signal, produce less noisy speech; see, e.g., Zhao et al., 2018).
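To illustrate what a time-frequency mask is, the sketch below computes a ratio mask directly from known speech and noise signals and applies it to their mixture. In a real system a neural network has to estimate the mask from the mixture alone; here the signals are synthetic stand-ins, and the function name, frame sizes and signal parameters are assumptions chosen for illustration.

```python
import numpy as np

def simple_stft(x, frame_len=256, hop=128):
    """Short-time Fourier transform with a Hann window (complex values)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)

# Synthetic stand-ins: amplitude-modulated tones for speech, white noise for noise.
speech = (np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)) * (
    0.5 + 0.5 * np.sin(2 * np.pi * 4 * t)
)
noise = 0.5 * rng.standard_normal(fs)

S = simple_stft(speech)
N = simple_stft(noise)
Y = simple_stft(speech + noise)

# Ratio mask: one gain between 0 and 1 for each time-frequency unit.
mask = np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-12)
enhanced = mask * Y

# Total squared error relative to the clean speech, before and after masking.
error_before = np.sum(np.abs(Y - S) ** 2)
error_after = np.sum(np.abs(enhanced - S) ** 2)
```

Units dominated by speech keep a gain close to 1 and units dominated by noise are turned down towards 0, so the masked mixture ends up closer to the clean speech than the unprocessed mixture.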
Machine learning systems for noise reduction are trained on artificially mixed speech and noise. Some operate on a single channel, i.e., using spectral cues, and some work with multiple channels using spatial cues. We expect that future hearing aids built on machine learning will perform best if they combine the left and right microphones to work binaurally.
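Artificial mixing usually means scaling the noise so that the mixture has a chosen SNR. The sketch below shows one way to do this, using random signals as placeholders for real recordings. The function name and the -3 dB target are assumptions for illustration.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + scaled_noise, scaled_noise

rng = np.random.default_rng(1)
speech = rng.standard_normal(16000)  # placeholder for a speech recording
noise = rng.standard_normal(16000)   # placeholder for a noise recording

mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=-3.0)

# Check the SNR that was actually achieved.
achieved_snr_db = 10.0 * np.log10(np.mean(speech ** 2) / np.mean(scaled_noise ** 2))
```

Because the clean speech is known for every mixture, it can serve as the training target for the noise reduction system.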
Most of these noise reduction systems have been designed and evaluated in an off-line mode where they process pre-recorded signals. This isn’t much use for hearing aids that need to work in real-time with low latency (i.e., short delays). One challenge for hearing aids is to redesign off-line approaches to work quickly enough without too much loss of performance.
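A simple lower bound on the delay comes from the frame length: a system that processes audio in frames cannot output a frame before it has received all of it. The sketch below computes this bound for two frame lengths. The values are illustrative, and computation time and any other buffering would add to the delay.

```python
def algorithmic_latency_ms(frame_len_samples, fs_hz):
    """Minimum delay in milliseconds imposed by waiting for one full frame."""
    return 1000.0 * frame_len_samples / fs_hz

latency_short = algorithmic_latency_ms(128, 16000)  # 8 ms
latency_long = algorithmic_latency_ms(512, 16000)   # 32 ms
```

Longer frames give finer frequency resolution but more delay, so adapting an off-line system often means trading some performance for shorter frames.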
The potential for machine learning to produce better approaches to hearing aid processing is what motivated the Clarity Project. If you’re interested in hearing more as the challenges develop, please sign up.
Brons, I., Houben, R., and Dreschler, W. A. (2014). Effects of noise reduction on speech intelligibility, perceived listening effort, and personal preference in hearing-impaired listeners. Trends in Hearing, 18, 1-10.
Van den Bogaert, T., Doclo, S., Wouters, J., and Moonen, M. (2009). Speech enhancement with multichannel Wiener filter techniques in multimicrophone binaural hearing aids. The Journal of the Acoustical Society of America, 125(1), 360-371.
Zhao, Y., Wang, D., Johnson, E. M., and Healy, E. W. (2018). A deep learning based segregation algorithm to increase speech intelligibility for hearing-impaired listeners in reverberant-noisy conditions. The Journal of the Acoustical Society of America, 144(3), 1627-1637.
Photograph of hearing aid wearer, copyright University of Nottingham.
Image of brain with overlaid circuitry made available by www.vpnsrus.com.
What our hearing loss algorithms simulate, with audio examples to illustrate hearing loss.
Our challenge entrants are going to use machine learning to develop better processing of speech in noise (SPIN) for hearing aids. For a machine learning algorithm to learn new ways of processing audio for the hearing impaired, it needs to estimate how the sound will be degraded by any hearing loss. Hence, we need an algorithm to simulate hearing loss for each of our listeners. The diagram below shows our draft baseline system that was detailed in a previous blog. The hearing loss simulation is part of the prediction model. The Enhancement Model to the left is effectively the hearing aid and the Prediction Model to the right estimates how someone will perceive the intelligibility of the speech in noise.
There are different causes of hearing loss, but we’re concentrating on the most common type that happens when you age (presbycusis). Action on Hearing Loss estimate that more than 40% of people over the age of 50 have a hearing loss, and this rises to 70% of people who are older than 70.
The aspects of hearing loss we’ve decided to simulate are:
- The loss of ability to sense the quietest sounds (increase in absolute threshold).
- How, as an audible sound increases in level, the perceived loudness grows faster than normal (loudness recruitment) (Moore et al. 1996).
- How the ear has a poorer ability to discriminate the frequency of sounds (spectral smearing).
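As a very rough illustration of the first item only, the sketch below attenuates each frequency by the hearing loss read off an audiogram. This is a crude stand-in and not the model we are using: it ignores loudness recruitment and spectral smearing, and the audiogram values, function names and linear interpolation between audiogram frequencies are all assumptions made for illustration.

```python
import numpy as np

def attenuate_by_audiogram(signal, fs, audiogram_freqs_hz, audiogram_loss_db):
    """Attenuate each frequency by the interpolated hearing loss in dB."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    loss_db = np.interp(freqs, audiogram_freqs_hz, audiogram_loss_db)
    return np.fft.irfft(spectrum * 10.0 ** (-loss_db / 20.0), n=len(signal))

def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(x ** 2))

fs = 16000
t = np.arange(fs) / fs
low_tone = np.sin(2 * np.pi * 500 * t)
high_tone = np.sin(2 * np.pi * 4000 * t)

# Hypothetical sloping audiogram: thresholds are raised more at high frequencies.
audiogram_freqs_hz = [250, 500, 1000, 2000, 4000, 8000]
audiogram_loss_db = [10, 10, 20, 35, 50, 60]

low_out = attenuate_by_audiogram(low_tone, fs, audiogram_freqs_hz, audiogram_loss_db)
high_out = attenuate_by_audiogram(high_tone, fs, audiogram_freqs_hz, audiogram_loss_db)

low_change_db = 20.0 * np.log10(rms(low_out) / rms(low_tone))
high_change_db = 20.0 * np.log10(rms(high_out) / rms(high_tone))
```

With this assumed audiogram, the 500 Hz tone drops by 10 dB and the 4000 Hz tone by 50 dB, which mimics the way high-frequency sounds are often the first to become inaudible.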
Audio examples of hearing loss
Here are two samples of speech in noise processed through the simulator. In each audio example there are three versions of the same sentence:
- Unimpaired hearing
- Mild hearing impairment
- Moderate to severe hearing impairment
And here is an example where the noise is louder:
The hearing loss model we’re using was generously supplied by Michael Stone at the University of Manchester as MATLAB code and translated by us into Python. (We’ll be making the code available.) The original code was written by members of the Auditory Perception Group at the University of Cambridge, ca. 1991-2013, including Michael Stone, Brian Moore, Brian Glasberg and Thomas Baer. There is no one paper that describes this model, but details can be found in Baer and Moore (1993 and 1994), Moore and Glasberg (1993), and Nejime and Moore (1998).
The original speech recordings come from the ARU corpus, University of Liverpool (Hopkins et al. 2019). This corpus is freely available at the link in the reference below.
Baer, T., & Moore, B. C. (1993). Effects of spectral smearing on the intelligibility of sentences in noise. The Journal of the Acoustical Society of America, 94(3), 1229-1241.
Baer, T., & Moore, B. C. (1994). Effects of spectral smearing on the intelligibility of sentences in the presence of interfering speech. The Journal of the Acoustical Society of America, 95(4), 2277-2280.
Hopkins, C., Graetzer, S., & Seiffert, G. (2019). ARU adult British English speaker corpus of IEEE sentences (ARU speech corpus) version 1.0 [data collection]. Acoustics Research Unit, School of Architecture, University of Liverpool, United Kingdom. DOI: 10.17638/datacat.liverpool.ac.uk/681. Retrieved from http://datacat.liverpool.ac.uk/681/.
Moore, B. C., & Glasberg, B. R. (1993). Simulation of the effects of loudness recruitment and threshold elevation on the intelligibility of speech in quiet and in a background of speech. The Journal of the Acoustical Society of America, 94(4), 2050-2062.
Moore, B. C., Glasberg, B. R., & Vickers, D. A. (1996). Factors influencing loudness perception in people with cochlear hearing loss. B. Kollmeier, World Scientific, Singapore, 7-18.
Nejime, Y., & Moore, B. C. (1998). Evaluation of the effect of speech-rate slowing on speech intelligibility in noise using a simulation of cochlear hearing loss. The Journal of the Acoustical Society of America, 103(1), 572-576.
People often have problems understanding speech in noise, and this is one of the main deficits of hearing aids that our machine learning challenges will address.
It’s common for us to hear sounds coming simultaneously from different sources. Our brains then need to separate out what we want to hear (the target speaker) from the other sounds. This is especially difficult when the competing sounds are speech. This has the quaint name, The Cocktail Party Problem (Cherry, 1953). We don’t go to many cocktail parties, but we encounter lots of situations where The Cocktail Party Problem is important. Hearing a conversation in a busy restaurant, trying to understand a loved one while the television is on, or hearing the radio in the kitchen when the kettle is boiling are just a few examples.
Difficulty in picking out speech in noise is really common if you have a hearing loss. Indeed, it’s often when people have problems doing this that they realise they have a hearing loss.
“Hearing aids don’t work when there is a lot of background noise. This is when you need them to work.” Statement from a hearing aid wearer (Kochkin, 2000)
Hearing aids are the most common form of treatment for hearing loss. However, surveys indicate that at least 40% of hearing aids are never or rarely used (Knudsen et al., 2010). A major reason for this is dissatisfaction with performance. Even the best hearing aids perform poorly for speech in noise. This is particularly the case when there are many people talking at the same time, and when the amount of noise is relatively high (i.e., the signal-to-noise ratio (SNR) is low). As hearing ability worsens with age, the ability to understand speech in background noise also reduces (e.g., Akeroyd, 2008).
When an audiologist assesses hearing loss, one thing they measure is the pure tone audiogram. This assesses the quietest sound someone can hear over a range of frequencies. However, an audiogram only partly explains your experience with speech in background noise (Heinrich et al. 2015), because it only measures the quietest sound you can hear. For example, picking out speech from noise is a complex task for the brain to perform, and this cognitive ability isn’t assessed by an audiogram. In addition, there are other factors that are important such as personality, motivation, attitude toward hearing aids and prior hearing aid experience.
Speech-in-noise tests get closer to the real-life problem a hearing aid is trying to solve. Listeners listen to speech in the presence of noise and write down the words they hear. If listeners get more words correct when wearing their hearing aid (aided) than when they are not (unaided), the aid has improved their ability to understand speech in that specific noisy situation. Of course, listening conditions in the clinic differ from real-life conditions.
Currently, while speech-in-noise test scores can be useful when fine-tuning a hearing aid, many users are still disappointed with the performance of their hearing aids. Through our challenges, we hope to improve this situation, whether you go to cocktail parties or not.
What’s your experience with speech in noise? Please comment below.
Akeroyd, M. A. (2008). Are individual differences in speech reception related to individual differences in cognitive ability? A survey of twenty experimental studies with normal and hearing-impaired adults. International Journal of Audiology, 47(sup2), S53-S71.
Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. The Journal of the Acoustical Society of America, 25(5), 975-979.
Heinrich, A., Henshaw, H., and Ferguson, M. A. (2015). The relationship of speech intelligibility with hearing sensitivity, cognition, and perceived hearing difficulties varies for different speech perception tests. Frontiers in Psychology, 6, 782.
Vestergaard Knudsen, L., Öberg, M., Nielsen, C., Naylor, G., and Kramer, S. E. (2010). Factors influencing help seeking, hearing aid uptake, hearing aid use and satisfaction with hearing aids: A review of the literature. Trends in Amplification, 14(3), 127-154.
Kochkin, S. (2000). MarkeTrak V: “Why my hearing aids are in the drawer” The consumers’ perspective. The Hearing Journal, 53(2), 34-36.
- Photo of Cocktail party by Ross CC BY-NC-SA 2.0
- Ronan, N., & Barrett, G. (2014). A 68 year old woman with deteriorating hearing. BMJ, 348, g2984. https://doi.org/10.1136/bmj.g2984