Hearing aid simulation

What our baseline hearing aid simulates, with examples.

Our challenge entrants are going to use machine learning to develop better hearing aid processing for listening to speech in noise (SPIN). We’ll provide a baseline hearing aid model for entrants to improve on. The figure below shows our baseline system, where the yellow box to the left is where the simulated hearing aid sits (labelled “Enhancement model”).

The draft baseline system (where SPIN is speech in noise, HL is hearing loss, SI is speech intelligibility, and L & R are Left and Right).

We decided to base our simulated hearing aid on the open Master Hearing Aid (openMHA), an open-source software platform for real-time audio signal processing. It was developed by the University of Oldenburg, HörTech gGmbH (Oldenburg, Germany) and the BatAndCat Corporation (USA). The original version was developed as one of the outcomes of the Cluster of Excellence Hearing4all project. The openMHA platform includes:

  • a software development kit (C/C++ SDK) including an extensive signal processing library for algorithm development and a set of Matlab and Octave tools to support development and off-line testing
  • real-time runtime environments for standard PC platforms and mobile ARM platforms
  • a set of baseline reference algorithms that forms a complete hearing aid system, including multi-band dynamic compression and amplification, directional microphones, binaural beamformers and coherence filters, single-channel noise reduction, and feedback control.

We have written a Python wrapper for the core openMHA system for ease of use within machine learning frameworks. We have also developed a simple, generic hearing aid configuration and translated the Camfit compressive fitting, the prescription that takes a listener's audiogram and determines the appropriate settings for the hearing aid, which is based on Moore et al. (1999) and encoded in openMHA.
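To give a rough flavour of what a fitting step does, here is a minimal sketch that maps a pure tone audiogram to per-band insertion gains using a simple fraction-of-the-loss rule. To be clear, the band frequencies, the 0.5 factor, and the gain cap below are placeholder assumptions for illustration only; they are not the Camfit prescription or our actual openMHA configuration.

```python
import numpy as np

# Illustrative audiogram frequencies (Hz); not the bands used in the baseline.
AUDIOGRAM_FREQS = [250, 500, 1000, 2000, 4000, 8000]

def fit_gains(hearing_levels_db, fraction=0.5, max_gain_db=40.0):
    """Toy fitting rule: prescribe a fixed fraction of the hearing loss as
    insertion gain in each band, capped at a maximum gain.
    This is a placeholder, not the Camfit compressive prescription."""
    gains = fraction * np.asarray(hearing_levels_db, dtype=float)
    return np.clip(gains, 0.0, max_gain_db)

# Example: a moderate, sloping hearing loss (dB HL at the frequencies above).
audiogram_db_hl = [20, 25, 35, 45, 55, 60]
print(dict(zip(AUDIOGRAM_FREQS, fit_gains(audiogram_db_hl).round(1))))
```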

Some aspects of modern digital hearing aids that we’ve decided to simulate are:

  • differential microphones, and
  • a multiband compressor for dynamic compression (a minimal sketch of such a compressor follows this list).
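To make the second item concrete, here is a minimal single-channel multiband dynamic range compressor in Python. It splits the signal into bands with Butterworth filters, follows each band's envelope with attack and release time constants, and applies a compressive gain above a threshold. The band edges, time constants, threshold, and ratio are illustrative assumptions, not the settings used in the openMHA baseline.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def band_split(x, fs, edges=(250, 1000, 4000)):
    """Split a single-channel signal into low/mid/high bands with Butterworth filters."""
    bands, lo = [], None
    for hi in edges:
        if lo is None:
            sos = butter(4, hi, btype="lowpass", fs=fs, output="sos")
        else:
            sos = butter(4, (lo, hi), btype="bandpass", fs=fs, output="sos")
        bands.append(sosfilt(sos, x))
        lo = hi
    bands.append(sosfilt(butter(4, lo, btype="highpass", fs=fs, output="sos"), x))
    return bands

def envelope(x, fs, attack_ms=5.0, release_ms=50.0):
    """One-pole attack/release envelope follower on the rectified signal."""
    att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    env, level = np.zeros_like(x), 0.0
    for i, v in enumerate(np.abs(x)):
        coeff = att if v > level else rel
        level = coeff * level + (1.0 - coeff) * v
        env[i] = level
    return env

def compress_band(x, fs, threshold_db=-40.0, ratio=3.0, makeup_db=0.0):
    """Above the threshold, the output level grows only 1/ratio as fast as the input."""
    level_db = 20.0 * np.log10(envelope(x, fs) + 1e-8)
    gain_db = -np.maximum(level_db - threshold_db, 0.0) * (1.0 - 1.0 / ratio) + makeup_db
    return x * 10.0 ** (gain_db / 20.0)

def multiband_compress(x, fs):
    """Compress each band independently, then sum the bands back together."""
    return sum(compress_band(band, fs) for band in band_split(x, fs))
```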

We've decided not to simulate the following features, because they tend to be implemented in proprietary forms that we can't replicate exactly in our open-source algorithm:

  • coordination of gross processing parameters across ears,
  • binaural processing involving some degree of signal exchange between left and right devices,
  • gain changes influenced by speech-to-noise ratio estimators,
  • frequency shifting or scaling, and
  • dual or adaptive time-constant wide dynamic range compression.

We are using the Oldenburg Hearing Device (OlHeaD) Head Related Transfer Function (HRTF) Database (Denk et al. 2018) to replicate the signals that would be received by the front and rear microphones of the hearing aid and also at the eardrums of the wearer.
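As a sketch of how such a database can be used, one can convolve a mono source signal with the measured impulse response for each receiver position to obtain the front microphone, rear microphone, and eardrum signals for each ear. The receiver names, the placeholder impulse responses, and the data layout below are illustrative assumptions; they do not reflect the actual OlHeaD-HRTF file format.

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_receivers(source, hrirs):
    """Convolve a mono source with one head-related impulse response (HRIR)
    per receiver position, returning one signal per receiver."""
    return {name: fftconvolve(source, h)[: len(source)] for name, h in hrirs.items()}

# Hypothetical usage with random placeholder HRIRs (the real ones would be
# loaded from the OlHeaD-HRTF database for a chosen device style and direction).
fs = 44100
source = np.random.randn(fs)  # one second of noise standing in for speech
receiver_names = ["front_left", "rear_left", "eardrum_left",
                  "front_right", "rear_right", "eardrum_right"]
hrirs = {name: 0.01 * np.random.randn(256) for name in receiver_names}
signals = simulate_receivers(source, hrirs)
```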

Audio examples of hearing aid processing

Here is an example of speech in noise processed by the simulated hearing aid for a moderate level of hearing loss. We can hear that the shape of the frequency spectrum has been modified to suit the listener’s specific pattern of hearing loss.

Here’s the original noisy signal where the noise is generated by a washing machine.
Here's the same signal processed by the simulated hearing aid for a listener with a moderate level of hearing loss (Pure Tone Average of 38 dB HL). For illustration purposes, this is presented here at an overall level that is similar to that of the original signal.
Here’s the noisy signal as it would be perceived by the listener wearing the hearing aid. Without the aid, the original noisy signal would be near inaudible.

Information about our hearing loss model can be found here.

The target speech comes from our new 40-speaker British English speech database, while the speech interferers come from the SLR83 database (Demirsahin et al., 2020), which comprises recordings of male and female speakers of English from various parts of the UK and Ireland.

Acknowledgements

We are grateful to the developers of the openMHA platform for the use of their software. Special thanks are due to Hendrik Kayser and Tobias Herzke. We are also grateful to Brian Moore, Michael Stone and colleagues for the Camfit compressive prescription, and to the people involved in preparing the OlHeaD HRTF database (particularly Florian Denk) and the SLR83 database. The featured image is taken from Denk et al. (2018).

References

Demirsahin, I., Kjartansson, O., Gutkin, A., & Rivera, C. E. (2020). Open-source Multi-speaker Corpora of the English Accents in the British Isles. Available at http://www.openslr.org/83/

Denk, F., Ernst, S. M., Ewert, S. D., & Kollmeier, B. (2018). Adapting hearing devices to the individual ear acoustics: Database and target response correction functions for various device styles. Trends in Hearing, 22, 2331216518779313.

Moore, B. C. J., Alcántara, J. I., Stone, M. A., & Glasberg, B. R. (1999). Use of a loudness model for hearing aid fitting: II. Hearing aids with multi-channel compression. British Journal of Audiology, 33(3), 157-170.

One approach to our enhancement challenge

A suggested approach for overcoming the non-differentiable loss function when improving hearing aid processing using DNNs.

The aim of our Enhancement Challenge is to get people producing new algorithms for processing speech signals through hearing aids. We expect most entries to replace the classic hearing aid processing of Dynamic Range Compressors (DRCs) with deep neural networks (DNN) (although all approaches are welcome!). The first round of the challenge is going to be all about improving speech intelligibility.

Setting up a DNN structure and training regime for the task is not as straightforward as it might first appear. Figure 1 shows an example of a naive training regime. An audio example of speech in noise (SPIN) is randomly created (audio sample generation, bottom left), and a listener with particular hearing loss characteristics is randomly selected (random artificial listener generation, top left). The DNN Enhancement model (the bright yellow box) then produces improved speech in noise. (Audio signals in pink are two-channel, left and right, because this is for binaural hearing aids.)

Figure 1

Next, the improved speech in noise is passed to the Prediction Model in the lime green box, which gives an estimate of the Speech Intelligibility (SI). Our baseline system will include algorithms for this. We've already blogged about the Hearing Loss Simulation. Our current thinking is that the intelligibility model will use a binaural form of the Short-Time Objective Intelligibility (STOI) measure [1]. The dashed line going back to the enhancement model shows that the DNN will be updated based on the reciprocal of the SI score: by minimising 1/SI, the enhancement model will be maximising intelligibility.
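To give a concrete (if simplified) picture of this objective, the sketch below scores an enhanced binaural signal against the clean reference using the monaural STOI implementation in the pystoi package, averages the two ears as a crude stand-in for a binaural measure, and turns the score into a 1/SI loss. This averaging and the function names are our illustrative assumptions, not the baseline prediction model. Note that the score comes from conventional signal processing, so it cannot simply be backpropagated through; that is exactly the difficulty discussed next.

```python
from pystoi import stoi  # pip install pystoi

def intelligibility_loss(clean_lr, enhanced_lr, fs):
    """Return 1/SI for a pair of binaural signals.

    clean_lr and enhanced_lr are (left, right) pairs of 1-D arrays.
    Monaural STOI is computed per ear and averaged, a crude stand-in
    for a true binaural measure such as Andersen et al.'s [1].
    """
    si_left = stoi(clean_lr[0], enhanced_lr[0], fs)
    si_right = stoi(clean_lr[1], enhanced_lr[1], fs)
    si = 0.5 * (si_left + si_right)
    return 1.0 / max(si, 1e-6)

# Because stoi() is an ordinary, non-differentiable function of the audio,
# this loss can score the enhancement model's output but cannot be used
# directly to backpropagate errors into the DNN.
```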

The difficulty here is that updating the Enhancement Model DNN during training requires the error to be known at the DNN's output (the point labelled "improved SPIN"). But we don't know this; we only know the error at the output of the prediction model at the far right of the diagram. This wouldn't be a problem if the prediction model could be inverted, because we could then run the 1/SI error backwards through the inverse model.

As the inverse of the prediction model isn't available, one solution is to train another DNN to mimic its behaviour (Figure 2). Because this new prediction model is itself a DNN, the 1/SI error can be passed backwards through it using standard neural network training formulations.

This DNN prediction model could be trained first using knowledge distillation (something I've previously done for a speech intelligibility model), and then its weights frozen while the Enhancement Model is trained. But there is a 'chicken and egg' problem here: generating the training data for the prediction model. Until you train the enhancement model, you won't have representative examples of "improved SPIN" with which to train the prediction model. But without the prediction model, you can't train the enhancement model.
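To make the 'train a DNN prediction model, then freeze it' idea concrete, here is a minimal PyTorch sketch. The architectures (plain fully connected layers on raw waveform excerpts), the excerpt length, and the training details are illustrative assumptions only, not the baseline system or the distillation setup mentioned above.

```python
import torch
import torch.nn as nn

N = 1024  # toy excerpt length in samples; a real system would use proper features

# Toy stand-ins for the two models in the diagram.
enhancer = nn.Sequential(nn.Linear(N, N))                      # "Enhancement model"
predictor = nn.Sequential(nn.Linear(N, 256), nn.ReLU(),
                          nn.Linear(256, 1), nn.Sigmoid())     # mimics the SI score

# Stage 1: fit the predictor to scores produced by the real, non-differentiable
# intelligibility model (e.g. hearing loss simulation followed by STOI).
pred_opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def train_predictor_step(signals, true_si):
    pred_opt.zero_grad()
    loss = nn.functional.mse_loss(predictor(signals).squeeze(-1), true_si)
    loss.backward()
    pred_opt.step()

# Stage 2: freeze the predictor and train the enhancer through it,
# minimising 1/SI (i.e. maximising predicted intelligibility).
for p in predictor.parameters():
    p.requires_grad_(False)
enh_opt = torch.optim.Adam(enhancer.parameters(), lr=1e-4)

def train_enhancer_step(noisy):
    enh_opt.zero_grad()
    si_hat = predictor(enhancer(noisy)).squeeze(-1)
    loss = (1.0 / si_hat.clamp(min=1e-3)).mean()
    loss.backward()
    enh_opt.step()
```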

One solution is to train the two DNNs in tandem, with an approach analogous to how pairs of networks are trained in a Generative Adversarial Network (GAN). iMetricGAN, developed by Li et al. [2], is an example of this being done for speech enhancement, although the authors weren't trying to include hearing loss simulation. They aren't the only ones looking at problems where a non-differentiable or black-box evaluation function is in the way of DNN training [3][4].
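Continuing the toy sketch above, a tandem schedule (only loosely analogous to a GAN, and not a reproduction of iMetricGAN) simply alternates the two update steps, so the prediction model keeps tracking the changing distribution of enhanced signals while the enhancement model is trained through it.

```python
# Tandem training, reusing the toy models and step functions sketched above.
# `true_si_fn` stands for the real, non-differentiable evaluation (hearing
# loss simulation plus intelligibility metric); it only ever labels data.
for p in predictor.parameters():
    p.requires_grad_(True)  # unlike the two-stage recipe, keep the predictor trainable

def train_tandem(batches, true_si_fn, predictor_steps=2):
    for noisy in batches:
        with torch.no_grad():
            enhanced = enhancer(noisy)           # current enhancer outputs
        for _ in range(predictor_steps):
            train_predictor_step(enhanced, true_si_fn(enhanced))
        train_enhancer_step(noisy)               # update enhancer through the predictor
```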

We hope our entrants will come up with lots of other ways of overcoming this problem. How would you tackle it?

References

[1] Andersen, A.H., Haan, J.M.D., Tan, Z.H. and Jensen, J., 2015. A binaural short time objective intelligibility measure for noisy and enhanced speech. In the Sixteenth Annual Conference of the International Speech Communication Association.

[2] Li, H., Fu, S.W., Tsao, Y. and Yamagishi, J., 2020. iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning. arXiv preprint arXiv:2004.00932.

[3] Gillhofer, M., Ramsauer, H., Brandstetter, J., Schäfl, B. and Hochreiter, S., 2019. A GAN based solver of black-box inverse problems. Proceedings of the NeurIPS 2019 Workshop.

[4] Kawanaka, M., Koizumi, Y., Miyazaki, R. and Yatabe, K., 2020, May. Stable training of DNN for speech enhancement based on perceptually-motivated black-box cost function. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7524-7528). IEEE.

The speech-in-noise problem

People often have problems understanding speech in noise, and this is one of the main shortcomings of current hearing aids that our machine learning challenges will address.

It's common for us to hear sounds coming simultaneously from different sources. Our brains then need to separate out what we want to hear (the target speaker) from the other sounds. This is especially difficult when the competing sounds are also speech, a situation that goes by the quaint name of the Cocktail Party Problem (Cherry, 1953). We don't go to many cocktail parties, but we encounter plenty of situations where the Cocktail Party Problem matters: hearing a conversation in a busy restaurant, trying to understand a loved one while the television is on, or hearing the radio in the kitchen while the kettle is boiling, to name just a few.

Difficulty in picking out speech in noise is really common if you have a hearing loss. Indeed, it’s often when people have problems doing this that they realise they have a hearing loss.

“Hearing aids don’t work when there is a lot of background noise. This is when you need them to work.”

Statement from a hearing aid wearer (Kochkin, 2000)

Hearing aids are the most common form of treatment for hearing loss. However, surveys indicate that at least 40% of hearing aids are never or rarely used (Knudsen et al., 2010). A major reason for this is dissatisfaction with performance. Even the best hearing aids perform poorly for speech in noise. This is particularly the case when there are many people talking at the same time, and when the amount of noise is relatively high (i.e., the signal-to-noise ratio (SNR) is low). As hearing ability worsens with age, the ability to understand speech in background noise also declines (e.g., Akeroyd, 2008).

When an audiologist assesses hearing loss, one thing they measure is the pure tone audiogram, which records the quietest sound someone can hear over a range of frequencies. However, an audiogram only partly explains your experience with speech in background noise (Heinrich et al., 2015), precisely because it only measures the quietest sound you can hear. Picking out speech from noise is a complex task for the brain to perform, and this cognitive ability isn't assessed by an audiogram. There are also other important factors, such as personality, motivation, attitude towards hearing aids, and prior hearing aid experience.

An audiogram displaying a “ski slope” pattern that is a sign of age-related hearing loss (source: Ronan and Barrett, BMJ, 2014).

Speech-in-noise tests get closer to the real-life problem a hearing aid is trying to solve. Listeners hear speech in the presence of noise and report the words they hear. A higher number of correct words indicates a better ability to understand speech in that noisy situation, and comparing scores when listeners wear their hearing aid (aided) with scores when they do not (unaided) shows the benefit the aid provides. Of course, listening conditions in the clinic differ from real-life conditions.

While speech-in-noise test scores can be useful when fine-tuning a hearing aid, many users are still disappointed with the performance of their hearing aids. Through our challenges, we hope to improve this situation, whether you go to cocktail parties or not.

What’s your experience with speech in noise? Please comment below.

References

Akeroyd, M. A. (2008). Are individual differences in speech reception related to individual differences in cognitive ability? A survey of twenty experimental studies with normal and hearing-impaired adults. International Journal of Audiology, 47(sup2), S53-S71.

Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. The Journal of the Acoustical Society of America, 25(5), 975-979.

Heinrich, A., Henshaw, H., and Ferguson, M. A. (2015). The relationship of speech intelligibility with hearing sensitivity, cognition, and perceived hearing difficulties varies for different speech perception tests. Frontiers in Psychology, 6, 782.

Vestergaard Knudsen, L., Öberg, M., Nielsen, C., Naylor, G., and Kramer, S. E. (2010). Factors influencing help seeking, hearing aid uptake, hearing aid use and satisfaction with hearing aids: A review of the literature. Trends in Amplification, 14(3), 127-154.

Kochkin, S. (2000). MarkeTrak V: "Why my hearing aids are in the drawer": The consumers' perspective. The Hearing Journal, 53(2), 34-36.

Credits