Obtaining the CPC1 dataset
The challenge data has been published and is available for download on Zenodo. On the Zenodo site you will find the following:
- The challenge training set data packaged as a single 14 GB file, clarity_CPC1_data.v1_1.tgz
- The evaluation set data packaged as a single 6.5 GB file, clarity_CPC1_data.test.v1.tgz. The evaluation data should be untarred into the same root as the training data.
- Evaluation set labels packaged as a small 130 KB file, clarity_CPC1_data.labels.v1_0.tar.gz. These were not made available to entrants during the challenge but have been released subsequently to allow self-evaluation.
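The unpacking step described above can be sketched in Python as follows. This is a minimal sketch rather than the official installation procedure: the function name `extract_all` and the idea of one target root directory are assumptions made for illustration. Only the archive file names come from the list above. The source states that the evaluation data belongs in the same root as the training data; placing the labels there as well is an assumption, so check the repository instructions first.

```python
import tarfile
from pathlib import Path

# Archive names as listed on Zenodo (taken from the text above).
ARCHIVES = [
    "clarity_CPC1_data.v1_1.tgz",            # training set
    "clarity_CPC1_data.test.v1.tgz",         # evaluation set
    "clarity_CPC1_data.labels.v1_0.tar.gz",  # evaluation labels
]


def extract_all(archive_paths, root):
    """Extract each archive that exists into the same root directory.

    Returns the names of the archives that were extracted.
    `root` is a hypothetical target directory chosen by the user.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in archive_paths:
        archive = Path(archive)
        if not archive.is_file():
            print(f"skipping {archive}: not found")
            continue
        # "r:*" lets tarfile detect the compression automatically.
        with tarfile.open(archive, "r:*") as tar:
            if hasattr(tarfile, "data_filter"):
                # Safer extraction on Python versions that support filters.
                tar.extractall(root, filter="data")
            else:
                tar.extractall(root)
        extracted.append(archive.name)
    return extracted
```

With the three archives downloaded to the current directory, calling `extract_all(ARCHIVES, "clarity_data")` would unpack them into one directory, where `clarity_data` is a placeholder name.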
The GitHub repository containing the baseline code is here. The repository contains code for all the Clarity enhancement and prediction challenges. There you will find all the necessary instructions for installing the data and setting up the baseline system, that is, running the MSBG hearing loss model and the MBSTOI intelligibility prediction stage. We will be making a further small release in early December to specify the final evaluation metrics that we will be using to rank entries.
The challenge is now closed, but the data is still available for anyone to use. If you use the data, please cite the following paper:
Jon Barker, Michael Akeroyd, Trevor J. Cox, John F. Culling, Jennifer Firth, Simone Graetzer, Holly Griffiths, Lara Harris, Graham Naylor, Zuzanna Podwinska, Eszter Porter, and Rhoddy Viveros Munoz, “The 1st Clarity Prediction Challenge: A machine learning challenge for hearing aid intelligibility prediction,” in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, Incheon, South Korea, 2022.