recipes.cad2.task1.baseline package

Submodules

recipes.cad2.task1.baseline.enhance module

Baseline enhancement for CAD2 task1.

recipes.cad2.task1.baseline.enhance.downmix_signal(vocals: ndarray, accompaniment: ndarray, beta: float) ndarray[source]

Downmix the vocals and accompaniment to stereo.

Parameters:
  • vocals (np.ndarray) – Vocal signal.

  • accompaniment (np.ndarray) – Accompaniment signal.

  • beta (float) – Downmix parameter.

Returns:

Downmixed signal.

Return type:

np.ndarray

Notes

When beta is 0, the downmix is the accompaniment. When beta is 1, the downmix is the vocals.
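
A minimal usage sketch, assuming stereo arrays with shape (channels, samples); the array layout is an assumption, not something the docstring specifies.

    import numpy as np

    from recipes.cad2.task1.baseline.enhance import downmix_signal

    # Hypothetical stereo signals, assumed shape (channels, samples).
    vocals = np.random.randn(2, 44100)
    accompaniment = np.random.randn(2, 44100)

    # beta=0.0 keeps only the accompaniment, beta=1.0 keeps only the vocals;
    # intermediate values blend the two sources.
    downmix = downmix_signal(vocals, accompaniment, beta=0.5)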

recipes.cad2.task1.baseline.enhance.enhance(config: DictConfig) None[source]

Run the music enhancement. The system decomposes the music into vocals and accompaniment. The vocals are then enhanced according to the alpha values. Finally, the music is amplified according to the listener's hearing loss and downmixed to stereo.

Parameters:

config (DictConfig) – Configuration options for enhancing music.
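
The steps described above can be sketched with the other helpers in this module. This is a simplified illustration, not the recipe's actual code path: the alpha-based vocal processing and the hearing-loss amplification are omitted, and the source ordering returned by the separator is an assumption.

    import torch

    from recipes.cad2.task1.baseline.enhance import (
        downmix_signal,
        get_device,
        load_separation_model,
        separate_sources,
    )

    # Resolve the device and load the separation model.
    device, _ = get_device("cpu")
    model = load_separation_model(causality="causal", device=device)

    # Hypothetical 10-second stereo mixture, shape (batch, channels, time).
    mix = torch.randn(1, 2, 44100 * 10)

    # Decompose the music into vocals and accompaniment.
    sources = separate_sources(
        model, mix, sample_rate=44100, number_sources=2, device=device
    )
    vocals = sources[0, 0].cpu().numpy()          # source ordering assumed
    accompaniment = sources[0, 1].cpu().numpy()

    # ... enhance the vocals and apply the amplification here ...
    enhanced = downmix_signal(vocals, accompaniment, beta=0.5)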

recipes.cad2.task1.baseline.enhance.get_device(device: str) tuple[source]

Get the Torch device.

Parameters:

device (str) – device type, e.g. “cpu”, “gpu0”, “gpu1”, etc.

Returns:

torch.device: device appropriate to the available hardware. str: device type selected, e.g. “cpu”, “cuda”.

Return type:

tuple[torch.device, str]
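
A usage sketch; per the signature and the description above, the function returns both the torch.device and the selected device-type string.

    from recipes.cad2.task1.baseline.enhance import get_device

    # Request the CPU explicitly; strings such as "gpu0" select CUDA when available.
    device, device_type = get_device("cpu")
    print(device, device_type)  # cpu cpu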

recipes.cad2.task1.baseline.enhance.load_separation_model(causality: str, device: device) ConvTasNetStereo[source]

Load the separation model.

Parameters:
  • causality (str) – Causality of the model (causal or noncausal).

  • device (torch.device) – Device to load the model on.

Returns:

Separation model.

Return type:

ConvTasNetStereo
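
A hedged loading sketch; the "causal" / "noncausal" strings follow the parameter description above, and the returned ConvTasNetStereo is treated as a regular torch.nn.Module.

    from recipes.cad2.task1.baseline.enhance import get_device, load_separation_model

    device, _ = get_device("cpu")
    model = load_separation_model(causality="noncausal", device=device)
    model.eval()  # switch to inference mode before separating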

recipes.cad2.task1.baseline.enhance.separate_sources(model: Module, mix: Tensor | ndarray, sample_rate: int, segment: float = 10.0, overlap: float = 0.1, number_sources: int = 4, device: device | str | None = None)[source]

Apply the model to a given mixture. The mixture is processed in overlapping segments with fades, and the segments are added together to reconstruct the full-length output.

Parameters:
  • model (torch.nn.Module) – model to use for separation

  • mix (torch.Tensor) – mixture to separate, shape (batch, channels, time)

  • sample_rate (int) – sampling rate of the mixture

  • segment (float) – segment length in seconds

  • overlap (float) – overlap between segments, between 0 and 1

  • number_sources (int) – number of sources to separate

  • device (torch.device, str, or None) – if provided, device on which to execute the computation, otherwise mix.device is assumed. When device is different from mix.device, only local computations will be on device, while the entire tracks will be stored on mix.device.

Returns:

estimated sources

Return type:

torch.Tensor

Based on https://pytorch.org/audio/main/tutorials/hybrid_demucs_tutorial.html
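
A usage sketch with a synthetic mixture. The number of sources and the shape of the returned tensor are assumptions here; only the argument names and the windowed, faded processing come from the documentation above.

    import torch

    from recipes.cad2.task1.baseline.enhance import load_separation_model, separate_sources

    device = torch.device("cpu")
    model = load_separation_model(causality="noncausal", device=device)

    # Hypothetical 30-second stereo mixture, shape (batch, channels, time).
    mix = torch.randn(1, 2, 44100 * 30)

    # Long tracks are split into 10-second windows with 10% overlap,
    # processed segment by segment, and faded back together.
    sources = separate_sources(
        model,
        mix,
        sample_rate=44100,
        segment=10.0,
        overlap=0.1,
        number_sources=2,  # assumed: vocals and accompaniment
        device=device,
    )
    print(sources.shape)  # torch.Tensor of estimated sources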

recipes.cad2.task1.baseline.evaluate module

Evaluate the enhanced signals using HAAQI and Whisper

recipes.cad2.task1.baseline.evaluate.compute_intelligibility(enhanced_signal: ndarray, segment_metadata: dict, scorer: Module, listener: Listener, sample_rate: int, save_intermediate: bool = False, path_intermediate: str | Path | None = None, equiv_0db_spl: float = 100) tuple[float, float, dict][source]

Compute the Intelligibility score for the enhanced signal using the Whisper model.

The MSBG hearing loss model is applied to the enhanced signal before it is transcribed with Whisper.

Parameters:
  • enhanced_signal – The enhanced signal

  • segment_metadata – The metadata of the segment

  • scorer – The Whisper model

  • listener – The listener

  • sample_rate – The sample rate of the signal

  • save_intermediate – Save the intermediate signal

  • path_intermediate – The path to save the intermediate signal

  • equiv_0db_spl – The equivalent 0 dB SPL

Returns:

The intelligibility scores for the left and right channels, plus a dict of additional results
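
A hedged calling sketch wrapped in a helper so it stays self-contained; the Whisper scorer, listener, segment metadata, sample rate, and the Listener import path are assumptions taken from the parameter descriptions above and the pyclarity utilities, not from this module.

    import numpy as np
    from torch import nn

    from clarity.utils.audiogram import Listener  # assumed import path
    from recipes.cad2.task1.baseline.evaluate import compute_intelligibility

    def score_segment(
        enhanced_signal: np.ndarray,
        segment_metadata: dict,
        whisper_scorer: nn.Module,
        listener: Listener,
        sample_rate: int = 44100,
    ) -> tuple[float, float]:
        """Score one enhanced segment with the Whisper-based scorer."""
        left_score, right_score, _extra = compute_intelligibility(
            enhanced_signal=enhanced_signal,
            segment_metadata=segment_metadata,
            scorer=whisper_scorer,
            listener=listener,
            sample_rate=sample_rate,
            save_intermediate=False,
            equiv_0db_spl=100.0,
        )
        return left_score, right_score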

recipes.cad2.task1.baseline.evaluate.compute_quality(reference_signal: ndarray, enhanced_signal: ndarray, listener: Listener, config: DictConfig) tuple[float, float][source]

Compute the HAAQI score for the left and right channels

recipes.cad2.task1.baseline.evaluate.load_reference_signal(path: str | Path, start_sample: int | None, end_sample: int | None, level_luft: float = -40.0) ndarray[source]

Load the reference signal

recipes.cad2.task1.baseline.evaluate.make_scene_listener_list(scenes_listeners: dict, small_test: bool = False) list[source]

Make the list of scene-listener pairings to process

Parameters:
  • scenes_listeners (dict) – Dictionary of scenes and listeners.

  • small_test (bool) – Whether to use a small test set.

Returns:

List of scene-listener pairings.

Return type:

list
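
A small sketch with made-up scene and listener IDs; the exact element format of the returned pairings is an assumption, shown only in the trailing comment.

    from recipes.cad2.task1.baseline.evaluate import make_scene_listener_list

    # Hypothetical scene-to-listeners mapping in the style of the challenge metadata.
    scenes_listeners = {
        "S0001": ["L0001", "L0002"],
        "S0002": ["L0001"],
    }

    pairs = make_scene_listener_list(scenes_listeners, small_test=False)
    # Expected: a flat list of scene-listener pairings,
    # e.g. [("S0001", "L0001"), ("S0001", "L0002"), ("S0002", "L0001")]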

recipes.cad2.task1.baseline.evaluate.normalise_luft(signal: ndarray, sample_rate: float, target_luft: float = -40.0) ndarray[source]

Normalise the signal to a target loudness level.

Parameters:
  • signal – input signal to normalise

  • sample_rate – sample rate of the signal

  • target_luft – target loudness level in LUFS

Returns:

normalised signal

Return type:

np.ndarray
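
A usage sketch; the (samples, channels) layout and the 44.1 kHz rate are assumptions, not part of the documented signature.

    import numpy as np

    from recipes.cad2.task1.baseline.evaluate import normalise_luft

    # Hypothetical 5-second stereo signal, assumed shape (samples, channels).
    signal = 0.1 * np.random.randn(44100 * 5, 2)

    normalised = normalise_luft(signal, sample_rate=44100, target_luft=-40.0)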

recipes.cad2.task1.baseline.evaluate.run_compute_scores(config: DictConfig) None[source]

Compute the scores for the enhanced signals

Module contents