Our challenge entrants are going to use machine learning to develop better processing of speech in noise (SPIN) for hearing aids. For a machine learning algorithm to learn new ways of processing audio for the hearing impaired, it needs to estimate how the sound will be degraded by any hearing loss. Hence, we need an algorithm to simulate hearing loss for each of our listeners. The diagram belows shows our draft baseline system that was detailed in a previous blog. The hearing loss simulation is part of the prediction model. The Enhancement Model to the left is effectively the hearing aid and the Prediction Model to the right is estimating how someone will perceive the intelligibility of the speech in noise.
There are different causes of hearing loss, but we’re concentrating on the most common type that happens when you age (presbycusis). RNID (formerly Action on Hearing Loss) estimate that more than 40% of people over the age of 50 have a hearing loss, and this rises to 70% of people who are older than 70.
The aspects of hearing loss we’ve decided to simulate are
- The loss of ability to sense the quietest sounds (increase in absolute threshold).
- How as an audible sound increases in level, the perceived increase in loudness is greater than normal (loudness recruitment) (Moore et al. 1996).
- How the ear has a poorer ability to discriminate the frequency of sounds (impaired frequency selectivity).
Audio examples of hearing loss
Here are two samples of speech in noise processed through the simulator. In each audio example there are three versions of the same sentence:
- Unimpaired hearing
- Mild hearing impairment
- Moderate to severe hearing impairment
And here is an example where the noise is louder:
The hearing loss model we’re using was generously supplied by Michael Stone at the University of Manchester as MATLAB code and translated by us into Python. (We’ll be making the code available.) The original code was written by members of the Auditory Perception Group at the University of Cambridge, ca. 1991-2013, including Michael Stone, Brian Moore, Brian Glasberg and Thomas Baer. There is no one paper that describes this model, but details can be found in Baer and Moore (1993 and 1994), Moore and Glasberg (1993), and Nejime and Moore (1998).
The original speech recordings come from the ARU corpus, University of Liverpool (Hopkins et al. 2019). This corpus is freely available at the link in the reference below.
Baer, T., & Moore, B. C. (1993). Effects of spectral smearing on the intelligibility of sentences in noise. The Journal of the Acoustical Society of America, 94(3), 1229-1241.
Baer, T., & Moore, B. C. (1994). Effects of spectral smearing on the intelligibility of sentences in the presence of interfering speech. The Journal of the Acoustical Society of America, 95(4), 2277-2280.
Hopkins, C., Graetzer, S., & Seiffert, G. (2019). ARU adult British English speaker corpusof IEEE sentences(ARU speech corpus) version 1.0 [data collection]. Acoustics Research Unit, School of Architecture, University of Liverpool, United Kingdom. DOI: 10.17638/datacat.liverpool.ac.uk/681. Retrieved from http://datacat.liverpool.ac.uk/681/.
Moore, B. C., & Glasberg, B. R. (1993). Simulation of the effects of loudness recruitment and threshold elevation on the intelligibility of speech in quiet and in a background of speech. The Journal of the Acoustical Society of America, 94(4), 2050-2062.
Moore, B. C., Glasberg, B. R., & Vickers, D. A. (1996). Factors influencing loudness perception in people with cochlear hearing loss. B. Kollmeier, World Scientific, Singapore, 7-18.
Nejime, Y., & Moore, B. C. (1998). Evaluation of the effect of speech-rate slowing on speech intelligibility in noise using a simulation of cochlear hearing loss. The Journal of the Acoustical Society of America, 103(1), 572-576.