
Task 2 - Using real hearing aid recordings


Previous Clarity challenges have used fully simulated signals to train and evaluate systems. Not only does this make the task easier because the training and evaluation data are closely matched, it has also led to data that are missing many of the complexities of real-world signals. In this task, we are moving closer to real-world conditions by using signals that have been recorded live over the real microphones of a hearing aid worn by a participant listening to Clarity scenes reproduced over loudspeakers. This captures aspects of the problem that were previously neglected, including complex head motions, real room acoustics, and real microphone characteristics. We are interested in how well systems can cope with this more challenging data.

Figure 1. Data is recorded over real microphones (left) and is accompanied by accurate head motion tracked using reflective markers (right)

Task Description

The scenario is a hearing aid user listening to a target speaker in a domestic living room while two or three interfering sounds are also active. Participants will be provided with signals recorded over left and right hearing aid shells, each with three microphones. These devices were worn by a participant who was seated in front of loudspeakers that were reproducing scenes similar to those used in previous Clarity challenges. Loudspeakers were moved between recording sessions in order to produce a large set of spatial configurations.
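As a rough illustration of the signal layout described above, the following sketch groups a six-channel recording into per-ear microphone channels. The channel ordering (three left microphones followed by three right microphones) and the function name `split_by_ear` are assumptions made for this example only; the actual file format is defined in the challenge data description.

```python
from typing import Dict, List, Sequence

N_EARS = 2
MICS_PER_EAR = 3  # from the task description: three microphones per shell


def split_by_ear(frames: Sequence[Sequence[float]]) -> Dict[str, List[List[float]]]:
    """Group a multichannel recording into per-ear microphone channels.

    `frames` holds one entry per sample, each with six values in the
    ASSUMED order [L1, L2, L3, R1, R2, R3]. The result maps "left" and
    "right" to a list of three channels, each a list of samples.
    """
    n_channels = N_EARS * MICS_PER_EAR
    for frame in frames:
        if len(frame) != n_channels:
            raise ValueError(f"expected {n_channels} channels, got {len(frame)}")
    # Transpose from sample-major to channel-major layout.
    channels = [list(ch) for ch in zip(*frames)]
    if not channels:
        channels = [[] for _ in range(n_channels)]
    return {"left": channels[:MICS_PER_EAR], "right": channels[MICS_PER_EAR:]}


# Two samples of placeholder values, purely to show the shapes involved.
example = split_by_ear([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]])
```

Keeping the two ears separate in this way makes it straightforward to process each shell's microphone array independently before combining them.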

Figure 2. A domestic scenario for Task 2: one talker, a listener who rotates their head, and at least two sources of unwanted sound.

The task design has been modelled closely on CEC2, with a 6,000-scene training set recorded in the same way as the development and evaluation sets. Participants are asked to train systems using this data. They may also augment the training data using the simulation tools and simulated data from previous challenge rounds, or using any publicly available resources. We are also providing ground truth head motion data that can be used to improve system performance. We are interested in how well systems can exploit this information.
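One way a system might exploit head motion data is to track where the target lies relative to the listener's head as the head turns, for example to steer a beamformer. The sketch below assumes the head motion is available as a sequence of yaw angles in degrees and that the target direction is known in room coordinates. Both are assumptions for illustration, and the function names are hypothetical; the actual head motion format is given in the challenge data description.

```python
from typing import List, Sequence


def relative_azimuth(source_azimuth_deg: float, head_yaw_deg: float) -> float:
    """Return the source direction relative to the head, wrapped to [-180, 180)."""
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


def relative_azimuth_track(
    source_azimuth_deg: float, head_yaw_track_deg: Sequence[float]
) -> List[float]:
    """Apply `relative_azimuth` to every head yaw sample in a track."""
    return [relative_azimuth(source_azimuth_deg, yaw) for yaw in head_yaw_track_deg]


# A source fixed at 30 degrees while the head turns from 0 to 40 degrees
# (placeholder values, not challenge data).
track = relative_azimuth_track(30.0, [0.0, 10.0, 20.0, 30.0, 40.0])
```

The wrap to [-180, 180) keeps the relative angle continuous when the head rotates past the rear of the listener.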

The evaluation sets used for ranking systems will contain data that are closely matched to the training sets. However, we will also publish some additional ‘surprise’ evaluation sets that are deliberately mismatched in a number of ways (listener head size, room T60 time, source location distribution). These will be used to study how well systems generalise outside of the training domain. Evaluation will initially be performed using the HASPI objective metric, but this will be followed by a round of subjective listening tests with hearing-impaired listeners.

Figure 3. Schematic of the challenge baseline.

In the sections that follow we provide a detailed description of the challenge data; the rules that all systems need to follow; the evaluation metric; and the performance of a baseline system. Near the submission date we will publish a final evaluation set and instructions on how to submit your signals for evaluation.