Obtaining the CPC3 dataset

The challenge data is now published and available for download on Zenodo. On the Zenodo site you will find the following.

  • The challenge training set data packaged as a single 7.5 GB file, clarity_CPC3_data.v1_1.tar.gz.
  • The development set data packaged as a single 752 MB file, clarity_CPC3_data.dev.v1_0.tar.gz.
  • The evaluation set data packaged as a single 6.1 GB file, clarity_CPC3_data.eval.v1_0.tar.gz.
  • The development and evaluation set labels packaged as a small 515 KB file, clarity_CPC3_data.labels.tar.gz. The evaluation set labels were not made available to entrants during the challenge but have since been released to allow self-evaluation.
  • A small 20 MB demo dataset for preview purposes, clarity_CPC3_demo_data.v1_0.tar.gz.

All packages should be unpacked under the same root.
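
As an illustration of unpacking everything under one root, here is a minimal Python sketch that uses only the standard library. It assumes the downloaded .tar.gz packages sit together in one directory. The function name `unpack_all` and the directory names in the usage note are hypothetical and do not come from the baseline repository, whose own installation instructions take precedence.

```python
import tarfile
from pathlib import Path


def unpack_all(archive_dir, root):
    """Extract every .tar.gz archive found in archive_dir into the same root.

    Returns the names of the archives that were unpacked, in sorted order.
    """
    archive_dir = Path(archive_dir)
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)

    unpacked = []
    for archive in sorted(archive_dir.glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            # Use the safer "data" extraction filter where this Python
            # version supports it; otherwise fall back to the default.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(root, filter="data")
            else:
                tar.extractall(root)
        unpacked.append(archive.name)
    return unpacked
```

For example, `unpack_all("downloads", "cpc3_root")` would extract each package found in a hypothetical `downloads` directory into a single `cpc3_root` directory.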

The GitHub repository containing the baseline code is here. The repository contains code for all the Clarity enhancement and prediction challenges. There you will find all the necessary instructions for installing the data and setting up the baseline system, i.e. producing the better-ear HASPI predictions.

The challenge is now closed, but the data is still available for anyone to use. If you use the data, please cite the Zenodo dataset:

Barker, J. (2025). 3rd Clarity Prediction Challenge (CPC3) dataset for hearing aid speech intelligibility prediction [Data set]. Zenodo. <https://doi.org/10.5281/zenodo.17039000>

We are preparing a paper describing the dataset and summarising the challenge outcomes, which will provide a more suitable reference. When this paper is available, we will update the citation details.