Task 1 Baseline
A baseline system for all of the tasks is provided as part of the latest release of the PyClarity toolkit available on GitHub. For this challenge round, we ask entrants to use the latest packaged release in the v0.5.x series.
The baseline takes the form of a pipeline of three Python scripts:

- enhance.py
- evaluate.py
- report_scores.py
The first of these can be replaced by your own enhancement system, while the other two scripts should remain fixed to match how we will evaluate final submissions. For the development set, the reference signals have been provided so you can run the complete pipeline and obtain scores. For the final evaluation set, we will not release the references; instead, you will send us your enhanced signals and we will score them remotely using the same scripts.
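As a concrete illustration of how a replacement for enhance.py can slot into this pipeline, the sketch below loops over the development-set scene-listener pairs. It assumes that scenes_listeners.dev.json maps each scene ID to a list of listener IDs; process_scene is a hypothetical placeholder for your own enhancement code, not part of the baseline.

```python
import json
from pathlib import Path


def process_scene(scene_id: str, listener_id: str) -> None:
    """Hypothetical placeholder: load the 6-channel scene signal, enhance it
    for this listener, and write a stereo signal to the amplified_signals
    folder."""
    print(f"Enhancing {scene_id} for listener {listener_id}")


# Assumed structure of the metadata file: {scene_id: [listener_id, ...], ...}
metadata_file = Path("clarity_data/metadata/scenes_listeners.dev.json")
scenes_listeners = json.loads(metadata_file.read_text(encoding="utf-8"))

for scene_id, listener_ids in scenes_listeners.items():
    for listener_id in listener_ids:
        process_scene(scene_id, listener_id)
```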
Enhancement
The enhance.py script performs the baseline enhancement. The baseline simply takes the 6-channel hearing aid input and reduces it to a stereo hearing aid output by passing through the 'front' microphone signal of the left and right ears.
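A minimal sketch of this channel selection step, assuming the multi-channel signal is held as a NumPy array with one column per microphone; the channel indices used below are an assumption for illustration, so check the CEC3 data documentation for the actual layout.

```python
import numpy as np


def front_mics_to_stereo(ha_input: np.ndarray) -> np.ndarray:
    """Reduce a 6-channel hearing aid input of shape (n_samples, 6) to a
    stereo output of shape (n_samples, 2) by keeping only the front
    microphone of each ear."""
    left_front = ha_input[:, 0]   # assumed index of the left-ear front mic
    right_front = ha_input[:, 1]  # assumed index of the right-ear front mic
    return np.stack([left_front, right_front], axis=1)
```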
The stereo pair is then passed through a hearing aid amplification stage consisting of a NAL-R [1] fitting and a simple automatic gain compressor. The amplification is determined by the listener audiograms associated with the scene-listener pairs defined in clarity_data/metadata/scenes_listeners.dev.json for the development set. After amplification, the evaluation stage calculates the better-ear HASPI [2] score.
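The sketch below shows the general shape of such an amplification stage: a prescription FIR filter per ear followed by a very simple compressor. It is a minimal illustration only, not the baseline implementation; the FIR coefficients (which the baseline derives from the listener's audiogram via the NAL-R fitting), the toy compressor design, and every parameter value here are assumptions.

```python
import numpy as np
from scipy.signal import lfilter


def simple_agc(signal: np.ndarray, sample_rate: float, threshold: float = 0.3,
               ratio: float = 3.0, attack_ms: float = 5.0,
               release_ms: float = 50.0) -> np.ndarray:
    """Toy automatic gain compressor: track a smoothed envelope and attenuate
    whatever exceeds the threshold by the given ratio (illustrative values)."""
    attack = np.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = np.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    envelope = 0.0
    output = np.empty_like(signal)
    for i, sample in enumerate(signal):
        magnitude = abs(sample)
        coeff = attack if magnitude > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * magnitude
        if envelope > threshold:
            gain = (threshold + (envelope - threshold) / ratio) / envelope
        else:
            gain = 1.0
        output[i] = sample * gain
    return output


def amplify_ear(signal: np.ndarray, prescription_fir: np.ndarray,
                sample_rate: float) -> np.ndarray:
    """Apply a precomputed prescription FIR (e.g. one built from a NAL-R
    fitting) and then the simple compressor to one ear's signal."""
    amplified = lfilter(prescription_fir, [1.0], signal)
    return simple_agc(amplified, sample_rate)
```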
Evaluation
Once the enhancement has been run, the evaluate.py script can compute the HASPI scores for the signals stored in the amplified_signals folder. The script will read the scene-listener pairs from the development set and calculate the HASPI score for each pair. The final score is the mean HASPI score across all pairs.
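The sketch below gives the general shape of that per-pair loop: compute a better-ear score (the higher of the two per-ear HASPI scores) for each scene-listener pair and append it to a CSV. compute_haspi is a stub standing in for the real HASPI v2 computation, and the CSV layout and paths are assumptions for illustration.

```python
import csv
import json
from pathlib import Path


def compute_haspi(scene_id: str, listener_id: str, ear: str) -> float:
    """Stub for the real HASPI v2 [2] computation, which would compare the
    amplified signal for one ear against the matching reference signal."""
    return 0.0  # placeholder value


def better_ear_haspi(scene_id: str, listener_id: str) -> float:
    """Better-ear HASPI: the higher of the two per-ear scores."""
    return max(compute_haspi(scene_id, listener_id, "left"),
               compute_haspi(scene_id, listener_id, "right"))


scenes_listeners = json.loads(
    Path("clarity_data/metadata/scenes_listeners.dev.json").read_text())

# Hypothetical CSV layout: one row per scene-listener pair.
with open("scores.csv", "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["scene", "listener", "haspi"])
    for scene_id, listener_ids in scenes_listeners.items():
        for listener_id in listener_ids:
            score = better_ear_haspi(scene_id, listener_id)
            writer.writerow([scene_id, listener_id, score])
```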
The results are stored in a CSV file, which is then read by the final report_scores.py script to generate a report. This two-step process makes it easy to run evaluate.py on multiple processors over subsets of the full evaluation set: each process will produce a separate CSV, and report_scores.py will collate the results, check their integrity and generate the final report.
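A minimal sketch of that collation step, assuming each evaluate.py run has written a CSV with (hypothetical) scene, listener and haspi columns into a results folder:

```python
import glob

import pandas as pd

# Gather the per-subset CSVs produced by separate evaluate.py runs.
frames = [pd.read_csv(path) for path in glob.glob("results/*.csv")]
results = pd.concat(frames, ignore_index=True)

# Integrity check: each scene-listener pair should appear exactly once.
duplicated = results.duplicated(subset=["scene", "listener"]).sum()
if duplicated:
    raise ValueError(f"{duplicated} duplicated scene-listener pairs found")

print(f"Evaluation set size: {len(results)}")
print(f"Mean HASPI score: {results['haspi'].mean()}")
```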
The scripts have been designed to run with minimal configuration, but with flexible options for performing partial runs, parallel processing, or running on a cluster. For full documentation and examples of how to run the scripts see the README.md file in the CEC3 baseline recipe of the PyClarity toolkit on GitHub.
Baseline performance for Task 1
Running all three scripts on Task 1 will produce the following results:
Evaluation set size: 7500
Mean HASPI score: 0.22178678134846783
SNR bin (dB)    Mean SNR (dB)    Mean HASPI
(-12, -9]          -10.498088      0.052545
(-9, -6]            -7.541468      0.080589
(-6, -3]            -4.477046      0.143096
(-3, 0]             -1.432494      0.239527
(0, 3]               1.470118      0.352110
(3, 6]               4.492380      0.477001
The mean HASPI score (0.222 for this baseline) is the metric that will be used for ranking. The table shows the mean HASPI score for each SNR range to help you understand the performance of your system.
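For reference, the per-SNR breakdown in the table above can be reproduced from the results CSV along these lines, assuming (hypothetically) that each row also records the scene SNR in dB in an snr column:

```python
import pandas as pd

results = pd.read_csv("results/scores.csv")  # hypothetical path and layout

# 3 dB wide SNR bins from -12 dB to 6 dB, matching the table above.
bins = list(range(-12, 7, 3))
results["SNR bin"] = pd.cut(results["snr"], bins=bins)

summary = results.groupby("SNR bin", observed=True)[["snr", "haspi"]].mean()
print(summary)
```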
References
- [1] Byrne, D. and Dillon, H., 1986. The National Acoustic Laboratories' (NAL) new procedure for selecting the gain and frequency response of a hearing aid. Ear and Hearing, 7(4), pp.257-265.
- [2] Kates, J.M. and Arehart, K.H., 2021. The hearing-aid speech perception index (HASPI) version 2. Speech Communication, 131, pp.35-46.