recipes.cad_icassp_2026.baseline package

Submodules

recipes.cad_icassp_2026.baseline.compute_stoi module

Compute the STOI scores.

recipes.cad_icassp_2026.baseline.compute_stoi.compute_single_stoi(reference: ndarray, processed: ndarray, fsamp: int, stoi_fsamp: int = 10000) → float

Compute the STOI score between a reference and processed signal.

Parameters:
  • reference (np.ndarray) – Reference signal.

  • processed (np.ndarray) – Processed signal.

  • fsamp (int) – Sampling frequency of the input signals, in Hz.

  • stoi_fsamp (int) – Sampling frequency for STOI computation. Default is 10000 Hz.

Returns:

STOI score.

Return type:

float
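The 10000 Hz default for `stoi_fsamp` matches the rate at which STOI is conventionally evaluated, so signals sampled at another `fsamp` presumably need resampling first. As a minimal sketch, assuming a polyphase resampler is used (the helper name `resample_factors` is hypothetical and not part of this package), the integer up/down factors could be derived like this:

```python
import math
from typing import Tuple


def resample_factors(fsamp: int, stoi_fsamp: int = 10000) -> Tuple[int, int]:
    """Return (up, down) such that fsamp * up / down == stoi_fsamp."""
    if fsamp <= 0 or stoi_fsamp <= 0:
        raise ValueError("sampling rates must be positive")
    common = math.gcd(fsamp, stoi_fsamp)
    return stoi_fsamp // common, fsamp // common


up, down = resample_factors(44100)
# These factors could be handed to a polyphase resampler such as
# scipy.signal.resample_poly(signal, up, down). Whether the baseline
# resamples this way is an assumption made for illustration.
print(up, down)
```

Reducing the ratio by the greatest common divisor keeps the polyphase filter as short as possible.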

recipes.cad_icassp_2026.baseline.compute_stoi.compute_stoi_for_signal(cfg: DictConfig, record: dict, data_root: str, estimated_vocals: ndarray) → float

Compute the STOI score for a given signal.

Parameters:
  • cfg (DictConfig) – Configuration object.

  • record (dict) – Metadata dict for the signal.

  • data_root (str) – Root path to the dataset.

  • estimated_vocals (np.ndarray) – Estimated vocals signal.

Returns:

STOI score.

Return type:

float

recipes.cad_icassp_2026.baseline.compute_stoi.run_compute_stoi(cfg: DictConfig) → None

Run the STOI score computation.

recipes.cad_icassp_2026.baseline.compute_whisper module

recipes.cad_icassp_2026.baseline.evaluate module

Evaluate the predictions against the ground truth correctness values.

recipes.cad_icassp_2026.baseline.evaluate.compute_scores(predictions, labels) → dict

Compute the scores for the predictions.

recipes.cad_icassp_2026.baseline.evaluate.evaluate(cfg: DictConfig) → None

Evaluate the predictions against the ground truth correctness values.

recipes.cad_icassp_2026.baseline.evaluate.kt_score(x: ndarray, y: ndarray) → float

Compute the Kendall’s tau correlation between two arrays.
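To make the rank-correlation idea concrete, here is a minimal pure-Python sketch of Kendall's tau-a, which counts concordant and discordant pairs. The function name `kendall_tau_a` is hypothetical, and this is an illustration rather than this module's implementation. Library routines such as `scipy.stats.kendalltau` compute the tau-b variant by default, which additionally corrects for ties.

```python
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # The sign of the product says whether the pair is ordered
            # the same way in both sequences.
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

A value of 1 means the two arrays rank every pair identically, and -1 means the rankings are fully reversed.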

recipes.cad_icassp_2026.baseline.evaluate.ncc_score(x: ndarray, y: ndarray) → float

Compute the normalized cross correlation between two arrays.

recipes.cad_icassp_2026.baseline.evaluate.rmse_score(x: ndarray, y: ndarray) → float

Compute the root mean squared error between two arrays.

recipes.cad_icassp_2026.baseline.evaluate.std_err(x: ndarray, y: ndarray) → float

Compute the standard error between two arrays.
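The NCC, RMSE and standard-error metrics above can be sketched in a few lines of pure Python. Two assumptions are made here and are not confirmed by this page: the normalized cross correlation is taken to be the Pearson correlation coefficient, and the standard error is taken to be the standard deviation of the differences divided by the square root of the sample count. The function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    """Root mean squared error between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def ncc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation (assumed meaning of 'normalized cross correlation')."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def std_err_of_differences(x: Sequence[float], y: Sequence[float]) -> float:
    """Population std of (x - y) divided by sqrt(n); an assumed definition."""
    diffs = [a - b for a, b in zip(x, y)]
    mean_d = sum(diffs) / len(diffs)
    std = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / len(diffs))
    return std / math.sqrt(len(diffs))


predictions = [2.0, 4.0, 6.0, 8.0]
labels = [1.0, 2.0, 3.0, 4.0]
print(rmse(predictions, labels), ncc(predictions, labels))
```

Note that `ncc` is undefined when either input is constant, because the denominator becomes zero.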

recipes.cad_icassp_2026.baseline.predict module

Make intelligibility predictions from the baseline scores (STOI or Whisper).

recipes.cad_icassp_2026.baseline.predict.predict_dev(cfg: DictConfig)

Predict intelligibility for baselines.

Set config.baseline to `stoi`, `whisper_mixture`, or `whisper_vocals`, depending on which baseline you want to run.

recipes.cad_icassp_2026.baseline.shared_predict_utils module

Shared utilities for STOI baseline prediction experiments.

class recipes.cad_icassp_2026.baseline.shared_predict_utils.LogisticModel

Bases: object

Class to represent a logistic mapping.

Fits a logistic mapping from input values x to output values y.

fit(x, y)

Fit a mapping from x values to y values.

params: np.ndarray | None = None

predict(x)

Predict y values given x.

Raises:

TypeError – If the predict() method is called before fit().
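A minimal sketch of such a mapping follows, assuming a two-parameter logistic curve scaled to the range 0 to 100. The functional form, the class name `SketchLogisticModel`, and the coarse grid search (standing in for a proper least-squares optimizer) are illustrative assumptions, not this package's implementation. The sketch does mirror the documented behaviour that `predict()` raises `TypeError` when called before `fit()`.

```python
import math
from typing import List, Optional, Sequence, Tuple


def logistic(x: float, x0: float, k: float, scale: float = 100.0) -> float:
    """Logistic curve with midpoint x0 and slope k (assumed form)."""
    return scale / (1.0 + math.exp(-k * (x - x0)))


class SketchLogisticModel:
    """Hypothetical stand-in for a logistic mapping from x to y."""

    params: Optional[Tuple[float, float]] = None

    def fit(self, x: Sequence[float], y: Sequence[float]) -> None:
        """Pick (x0, k) from a coarse grid by least squared error."""
        best_err = float("inf")
        for i in range(0, 101):          # x0 in 0.00 .. 1.00
            x0 = i / 100
            for j in range(1, 201):      # k in 0.1 .. 20.0
                k = j / 10
                err = sum((logistic(a, x0, k) - b) ** 2 for a, b in zip(x, y))
                if err < best_err:
                    best_err, self.params = err, (x0, k)

    def predict(self, x: Sequence[float]) -> List[float]:
        """Map x values through the fitted curve."""
        if self.params is None:
            raise TypeError("predict() called before fit()")
        x0, k = self.params
        return [logistic(a, x0, k) for a in x]


scores = [0.1, 0.3, 0.5, 0.7, 0.9]
targets = [logistic(s, 0.5, 8.0) for s in scores]
model = SketchLogisticModel()
model.fit(scores, targets)
print(model.params)
```

The grid assumes inputs roughly in the range 0 to 1, as STOI scores are; other score ranges would need a different search range or a real optimizer.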

recipes.cad_icassp_2026.baseline.shared_predict_utils.estimate_vocals(signal: ndarray, sample_rate: int, model: Module, device: str = 'cpu') → ndarray

Estimate vocals from the input signal using the pre-trained model.

Parameters:
  • signal (torch.Tensor | np.ndarray) – Input audio signal.

  • sample_rate (int) – Sample rate of the input signal.

  • model (torch.nn.Module) – Pre-trained source separation model.

  • device (str) – Device to run the model on (‘cpu’ or ‘cuda’).

Returns:

Estimated vocals.

Return type:

np.ndarray

recipes.cad_icassp_2026.baseline.shared_predict_utils.input_align(reference: ndarray, processed: ndarray, fsamp: float = 10000) → tuple[ndarray, ndarray]

Align the processed signal to the reference signal. The code is based on evaluator.haspi.eb but supports a variable sampling rate.
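The alignment idea can be illustrated with a small cross-correlation search. This is a sketch under stated assumptions, not the package's implementation: the helper names `best_lag` and `align`, the 0.1 s search window, and the truncation to a common length are all hypothetical.

```python
from typing import List, Sequence, Tuple


def best_lag(reference: Sequence[float], processed: Sequence[float], max_lag: int) -> int:
    """Return the delay of processed relative to reference, in samples."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, ref in enumerate(reference):
            m = n + lag
            if 0 <= m < len(processed):
                score += ref * processed[m]
        # Keep the lag with the largest cross-correlation value.
        if score > best_score:
            best, best_score = lag, score
    return best


def align(
    reference: Sequence[float],
    processed: Sequence[float],
    fsamp: float = 10000.0,
    max_delay_s: float = 0.1,
) -> Tuple[List[float], List[float]]:
    """Shift processed onto reference and trim both to a common length."""
    max_lag = int(round(max_delay_s * fsamp))
    lag = best_lag(reference, processed, max_lag)
    if lag >= 0:
        shifted = list(processed[lag:])              # drop the leading delay
    else:
        shifted = [0.0] * (-lag) + list(processed)   # pad an early signal
    n = min(len(reference), len(shifted))
    return list(reference[:n]), shifted[:n]


ref = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
delayed = [0, 0, 0] + ref
aligned_ref, aligned_proc = align(ref, delayed, fsamp=100)
print(aligned_proc == aligned_ref)
```

Expressing the search window in seconds and converting it with `fsamp` is what lets the same routine work at any sampling rate. The brute-force loop is quadratic, so real signals would call for an FFT-based correlation.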

recipes.cad_icassp_2026.baseline.shared_predict_utils.load_dataset_with_score(cfg, split: str) → pandas.DataFrame

Load dataset and add prediction scores.

Parameters:
  • cfg (DictConfig) – Configuration object.

  • split (str) – Dataset split to load (‘train’ or ‘valid’).

Returns:

DataFrame containing dataset records with added scores.

Return type:

pd.DataFrame

recipes.cad_icassp_2026.baseline.shared_predict_utils.load_mixture(dataroot: Path, record: dict, cfg: DictConfig) → tuple[ndarray, float]

Load the mixture signal for a given record.

Parameters:
  • dataroot (Path) – Root path to the dataset.

  • record (dict) – Record containing signal metadata.

  • cfg (DictConfig) – Configuration object.

Returns:

Mixture signal and its sample rate.

Return type:

tuple[np.ndarray, int]

recipes.cad_icassp_2026.baseline.shared_predict_utils.load_vocals(dataroot: Path, record: dict, cfg: DictConfig, separation_model, device='cpu') → ndarray

Load or compute estimated vocals for a given record.

Parameters:
  • dataroot (Path) – Root path to the dataset.

  • record (dict) – Record containing signal metadata.

  • cfg (DictConfig) – Configuration object.

  • separation_model – Pre-trained source separation model.

  • device (str) – Device to run the model on (‘cpu’ or ‘cuda’).

Returns:

Estimated vocals signal.

Return type:

np.ndarray

recipes.cad_icassp_2026.baseline.transcription_scorer module

Module contents