
Task 3 - Using real noise backgrounds

Motivation

Previous Clarity challenges placed target speakers among a small number of interfering sound sources. The signals were fully simulated, so the acoustic scenes were considerably simpler than those encountered in real life. For example, previous rounds used only two or three interferers, and these were always simulated as point sources fixed at the same location throughout the scene. In this task, we move closer to real-world conditions by using real background noise recordings.

The task will feature real audio backgrounds from everyday listening situations that are known to cause difficulty for hearing aid users: busy streets, train stations and social gatherings. The task will have a similar structure to previous Clarity challenges so that existing approaches can be readily applied. We are interested to see how well previously successful approaches cope with the increased complexity of the new data. We are also interested in what innovations are needed to meet the new challenges.

Figure 1. Newport train station, one of the recording locations that will feature in Task 3.

Task Description

The task considers the scenario of a hearing aid user listening to a target speaker in a complex noisy environment. Three environments are being considered: beside busy roads, on railway station platforms and at social gatherings. Participants will be provided with simulated hearing aid microphone inputs, created by artificially adding target speech to real noise backgrounds. Backgrounds and impulse responses have been recorded with an ambisonic microphone so that listener head rotations can also be modelled.
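To make the idea of "adding target speech to real noise backgrounds" concrete, here is a minimal sketch of one common way such a mixture can be formed: convolve dry speech with a multichannel impulse response, then scale it against the background to reach a chosen signal-to-noise ratio (SNR). This is an illustration under stated assumptions, not the Clarity simulation pipeline. The function name `mix_at_snr`, the SNR-based scaling and the array shapes are all hypothetical.

```python
import numpy as np


def mix_at_snr(target: np.ndarray, ir: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Hypothetical helper: spatialise dry speech and add it to a noise recording.

    target: dry speech, shape (n,)
    ir:     impulse response per microphone channel, shape (m, channels)
    noise:  real background recording, shape (>= n + m - 1, channels)
    snr_db: desired ratio of target power to noise power, in dB (an assumed
            mixing rule; the challenge may set levels differently)
    """
    n_out = len(target) + ir.shape[0] - 1
    if noise.shape[0] < n_out or noise.shape[1] != ir.shape[1]:
        raise ValueError("noise must be at least as long as the convolved target "
                         "and have the same number of channels as the impulse response")
    # Full convolution of the dry speech with each channel's impulse response.
    spatial = np.stack([np.convolve(target, ir[:, ch]) for ch in range(ir.shape[1])], axis=1)
    noise = noise[:n_out]
    # Choose the gain so that 10*log10(gain^2 * P_target / P_noise) == snr_db.
    p_target = np.mean(spatial ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_noise / p_target * 10.0 ** (snr_db / 10.0))
    return gain * spatial + noise
```

For example, with a 1,000-sample target, a 64-tap two-channel impulse response and a longer two-channel background, `mix_at_snr(target, ir, noise, 5.0)` returns a `(1063, 2)` array in which the speech component sits 5 dB above the noise. Measuring SNR over all channels jointly is one design choice; per-channel or better-ear definitions are equally plausible.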

The task design has been modelled closely on CEC2, with a 6,000-scene training set produced in the same way as the development and evaluation sets. Participants are asked to train systems using this data. They may also augment the training data using the simulation tools and simulated data from previous challenge rounds, or using any publicly available resources. We are also providing ground truth head motion data that can be used to improve system performance, and we are interested in how well systems can exploit this information.
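Because the backgrounds are ambisonic recordings, a head rotation can be expressed as a rotation of the recorded sound field. The sketch below shows the idea for the simplest case: a yaw (horizontal) rotation of a first-order B-format signal. Several assumptions apply. The channel order W, X, Y, Z is assumed (ACN ordering would be W, Y, Z, X), the challenge recordings may use a different or higher ambisonic order, and the function name `rotate_foa_yaw` is hypothetical. The `yaw_rad` argument may be a single angle or a per-sample array, such as a trajectory derived from head motion data.

```python
import numpy as np


def rotate_foa_yaw(bformat: np.ndarray, yaw_rad) -> np.ndarray:
    """Rotate a first-order B-format signal about the vertical axis (hypothetical helper).

    bformat: shape (n, 4), channels assumed ordered W, X, Y, Z
    yaw_rad: scalar or shape (n,) array; positive values rotate the sound field
             counter-clockwise seen from above. To simulate a listener turning
             their head by +theta, rotate the field by -theta.
    """
    w, x, y, z = bformat.T
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    # A source at azimuth phi has X ~ cos(phi), Y ~ sin(phi); after rotation by
    # alpha it sits at phi + alpha, which gives the 2-D rotation below.
    x_rot = c * x - s * y
    y_rot = s * x + c * y
    # W (omnidirectional) and Z (vertical) are unchanged by a yaw rotation.
    return np.stack([w, x_rot, y_rot, z], axis=1)
```

Passing a per-sample yaw array lets the rotation follow a time-varying head trajectory. A full treatment would also handle pitch and roll, and would use higher-order rotation matrices for higher-order recordings.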

Evaluation sets will contain data closely matched to the training set but recorded in different instances of each environment. Evaluation will initially use the HASPI objective metric, followed by a round of subjective listening tests with hearing-impaired listeners.

In the sections that follow, we describe the challenge data, the rules that all systems must follow, the evaluation metric and the performance of a baseline system. Near the submission date, we will publish a final evaluation set and instructions on how to submit your signals for evaluation.