recipes.cad1.task2.baseline package

Submodules

recipes.cad1.task2.baseline.audio_manager module

A utility class for managing audio files.

class recipes.cad1.task2.baseline.audio_manager.AudioManager(sample_rate: int = 44100, output_audio_path: str | Path = '', soft_clip: bool = False)[source]

Bases: object

A utility class for managing audio files.

add_audios_to_save(file_name: str, waveform: ndarray) None[source]

Add a waveform to the list of audios to save.

Parameters:
  • file_name (str) – The name of the track.

  • waveform (np.ndarray) – The track to save.

clip_audio(signal: ndarray, min_val: float = -1, max_val: float = 1) tuple[int, ndarray][source]

Clip a signal to the given range.

Parameters:
  • signal (np.ndarray) – The signal to clip.

  • min_val (float) – The minimum value to clip to. Defaults to -1.

  • max_val (float) – The maximum value to clip to. Defaults to 1.

Returns:

Number of samples clipped and the clipped signal.

Return type:

Tuple[int, np.ndarray]

get_lufs_level(signal: ndarray) float[source]

Get the LUFS level of the signal.

Parameters:

signal (np.ndarray) – The signal to get the LUFS level of.

Returns:

The LUFS level of the signal.

Return type:

float

save_audios() None[source]

Save the stored audios to the output_audio_path that was given when the AudioManager was constructed.

scale_to_lufs(signal: ndarray, target_lufs: float) ndarray[source]

Scale the signal to the given LUFS level.

Parameters:
  • signal (np.ndarray) – The signal to scale.

  • target_lufs (float) – The target LUFS level.

Returns:

The scaled signal.

Return type:

np.ndarray
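
A minimal usage sketch for AudioManager (file names and paths are hypothetical, and whether the methods accept mono arrays is an assumption):

    import numpy as np
    from recipes.cad1.task2.baseline.audio_manager import AudioManager

    manager = AudioManager(
        sample_rate=44100, output_audio_path="exp/enhanced_signals", soft_clip=True
    )

    # One second of test audio, deliberately exceeding [-1, 1].
    signal = 1.2 * np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)

    signal = manager.scale_to_lufs(signal, target_lufs=-14.0)
    print(manager.get_lufs_level(signal))           # approximately -14.0

    n_clipped, signal = manager.clip_audio(signal)  # force into [-1, 1]
    manager.add_audios_to_save("track_left", signal)
    manager.save_audios()                           # writes files under exp/enhanced_signals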

recipes.cad1.task2.baseline.baseline_utils module

Utility functions for the baseline model.

recipes.cad1.task2.baseline.baseline_utils.load_hrtf(config: DictConfig) dict[source]

Load the HRTF file.

Parameters:

config (DictConfig) – A dictionary-like object containing various configuration parameters for the evaluation. This includes the path to the HRTF files.

Returns:

A dictionary containing the HRTF data for the dataset.

Return type:

hrtf_data (dict)
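
A sketch of a call to load_hrtf; the config key below is hypothetical and stands in for whatever the recipe's YAML config actually defines:

    from omegaconf import OmegaConf
    from recipes.cad1.task2.baseline.baseline_utils import load_hrtf

    # "path.hrtf_dir" is a placeholder key -- check the recipe config for the real one.
    config = OmegaConf.create({"path": {"hrtf_dir": "data/eBrird"}})
    hrtf_data = load_hrtf(config)  # HRTF data for the dataset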

recipes.cad1.task2.baseline.baseline_utils.load_listeners_and_scenes(config: DictConfig) tuple[dict, dict[str, Listener], dict][source]

Load listener and scene data.

Parameters:

config (DictConfig) – A dictionary-like object containing various configuration parameters for the evaluation. This includes the path to the scenes file, the path to the listeners train file, and the path to the listeners valid file.

Returns:

A tuple containing the scene data, the listener data, and the scene-listener pairs.

Return type:

Tuple[dict, dict[str, Listener], dict]

recipes.cad1.task2.baseline.baseline_utils.make_scene_listener_list(scenes_listeners, small_test=False)[source]

Make the list of scene-listener pairings to process.
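
Together with load_listeners_and_scenes, a typical call looks like this (a sketch; the exact shape of the returned pairs is an assumption):

    scenes, listeners, scenes_listeners = load_listeners_and_scenes(config)
    scene_listener_pairs = make_scene_listener_list(scenes_listeners, small_test=False)
    # e.g. [("S5001", "L5040"), ...]; small_test=True keeps only a small
    # subset for quick runs.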

recipes.cad1.task2.baseline.baseline_utils.read_mp3(file_path: str | Path, sample_rate: float | None = None) tuple[np.ndarray, float][source]

Read an MP3 file and return its signal.

Parameters:
  • file_path (str | Path) – The path to the MP3 file.

  • sample_rate (float) – The sampling frequency of the MP3 file.

Returns:

signal (np.ndarray): The signal of the MP3 file. sample_rate (float): The sampling frequency of the MP3 file.

Return type:

Tuple[np.ndarray, float]
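
Usage sketch (the file path is hypothetical, and whether a non-None sample_rate triggers resampling is an assumption based on the signature):

    from recipes.cad1.task2.baseline.baseline_utils import read_mp3

    signal, sample_rate = read_mp3("data/music/track.mp3")
    signal_44k, sr = read_mp3("data/music/track.mp3", sample_rate=44100)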

recipes.cad1.task2.baseline.car_scene_acoustics module

A class for the car acoustic environment.

class recipes.cad1.task2.baseline.car_scene_acoustics.CarSceneAcoustics(track_duration: int, sample_rate: int, hrtf_dir: str, config_nalr: dict, config_compressor: dict, extend_noise: float = 0.2)[source]

Bases: object

A class for the car acoustic environment.

Constants:

ANECHOIC_HRTF_FOR_NOISE (dict): A dictionary containing the names of the anechoic BRIRs for the following directions:

0 degrees (front):
  • 000_left: The left channel of the BRIR for 0 degrees.

  • 000_right: The right channel of the BRIR for 0 degrees.

-90 degrees (left):
  • m90_left: The left channel of the BRIR for -90 degrees.

  • m90_right: The right channel of the BRIR for -90 degrees.

90 degrees (right):
  • p90_left: The left channel of the BRIR for 90 degrees.

  • p90_right: The right channel of the BRIR for 90 degrees.

ANECHOIC_HRTF_FOR_NOISE = {'000_left': 'HR36_E02_CH1_Left.wav', '000_right': 'HR36_E02_CH1_Right.wav', 'm90_left': 'HR0_E02_CH1_Left.wav', 'm90_right': 'HR0_E02_CH1_Right.wav', 'p90_left': 'HR72_E02_CH1_Left.wav', 'p90_right': 'HR72_E02_CH1_Right.wav'}
add_anechoic_hrtf_to_noise(noise_signal: ndarray) ndarray[source]

Adds the Anechoic HRTF to the noise signal.

Parameters:

noise_signal (np.ndarray) – A numpy array representing the different components of the car noise signal.

Returns:

The noise signal with the Anechoic HRTF applied.

Return type:

np.ndarray

add_hrtf_to_stereo_signal(signal: ndarray, hrir: dict, hrtf_type: str) ndarray[source]

Add a head rotation transfer function using a binaural room impulse response (BRIR) from eBrird.

Parameters:
  • signal (np.ndarray) – a numpy array of shape (2, n_samples) containing the stereo audio signal.

  • hrir (dict) – a dictionary containing the HRIR (head-related impulse response) filenames.

  • hrtf_type (str) – the type of HRTF to use. Can be either “Anechoic” or “Car”.

Returns:

A numpy array of shape (2, n_samples) containing the stereo audio signal with the BRIR added.
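
A hedged call sketch (car_acoustics is an already-initialised CarSceneAcoustics instance, and the contents of the hrir dictionary are assumptions; see load_hrtf above):

    spatialised = car_acoustics.add_hrtf_to_stereo_signal(
        signal=stereo_signal,  # np.ndarray of shape (2, n_samples)
        hrir=hrtf,             # HRIR filename dictionary from load_hrtf
        hrtf_type="Car",       # or "Anechoic"
    )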

static add_two_signals(signal1: ndarray, signal2: ndarray) ndarray[source]

Adds two signals together.

Parameters:
  • signal1 (np.ndarray) – The first signal.

  • signal2 (np.ndarray) – The second signal.

Returns:

The sum of the two signals.

Return type:

np.ndarray
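
Because the method is static, it can be called on the class directly; a minimal sketch with equal-length signals:

    import numpy as np
    from recipes.cad1.task2.baseline.car_scene_acoustics import CarSceneAcoustics

    noise = np.random.default_rng(0).normal(0.0, 0.01, (2, 44100))
    music = np.zeros((2, 44100))
    mixed = CarSceneAcoustics.add_two_signals(noise, music)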

apply_car_acoustics_to_signal(enh_signal: ndarray, scene: dict, listener: Listener, hrtf: dict, audio_manager: AudioManager, config: DictConfig) ndarray[source]

Applies the car acoustics to the enhanced signal.

Parameters:
  • enh_signal (np.ndarray) – The enhanced signal to apply the car acoustics to.

  • scene (dict) – The scene dictionary with the acoustics parameters.

  • listener (Listener) – The listener characteristics.

  • hrtf (dict) – A dictionary containing the head-related transfer functions (HRTFs) for the listener being evaluated. This includes the left and right HRTFs for the car and the anechoic room.

  • audio_manager (AudioManager) – The audio manager object.

  • config (DictConfig) – The config object.

Returns:

The enhanced signal with the car acoustics applied. np.ndarray: The reference signal normalised to the level of the enhanced signal.

Return type:

np.ndarray

apply_hearing_aid(signal: ndarray, audiogram: Audiogram) ndarray[source]

Applies the hearing aid: a NAL-R prescription followed by a compressor.

Parameters:
  • signal (np.ndarray) – The audio signal to be enhanced.

  • audiogram (Audiogram) – The audiogram of the listener.

Returns:

The enhanced audio signal.

Return type:

np.ndarray
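
A sketch assuming pyclarity's Audiogram dataclass (the hearing levels below are illustrative, and car_acoustics is an already-initialised CarSceneAcoustics instance):

    import numpy as np
    from clarity.utils.audiogram import Audiogram

    audiogram = Audiogram(
        levels=np.array([20, 30, 35, 45, 50, 60, 65, 60]),
        frequencies=np.array([250, 500, 1000, 2000, 3000, 4000, 6000, 8000]),
    )
    signal_left = np.zeros(44100)  # placeholder mono signal
    aided = car_acoustics.apply_hearing_aid(signal_left, audiogram)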

equalise_level(signal: ndarray, reference_signal: ndarray, max_level: float = 20) ndarray[source]

Equalises the level of the target signal to the reference signal.

Parameters:
  • signal (np.ndarray) – The target signal to equalise.

  • reference_signal (np.ndarray) – The reference signal.

  • max_level (float) – The maximum level of the target signal, to prevent clipping.

Returns:

The equalised target signal.

Return type:

np.ndarray
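
For example, to bring a processed signal back to the loudness of its unprocessed reference (a sketch; both arrays are prepared earlier):

    equalised = car_acoustics.equalise_level(
        signal=processed, reference_signal=reference, max_level=20
    )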

get_car_noise(car_noise_params: dict) ndarray[source]

Generates car noise.

Parameters:

car_noise_params (dict) – Car noise parameters, as generated by the CarNoiseParameterGenerator class.

Returns:

A numpy array representing the different components of the car noise signal.

Return type:

numpy.ndarray
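
A hedged sketch; the parameter dictionary comes from elsewhere in the pipeline, and the scene key used below is hypothetical:

    car_noise_params = current_scene["car_noise_parameters"]  # hypothetical key
    car_noise = car_acoustics.get_car_noise(car_noise_params)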

preload_anechoic_hrtf(hrtf_dir: str) None[source]

Loads the Anechoic BRIRs from the eBrird database for the following directions:

0 degrees: front; -90 degrees: left; 90 degrees: right.

Parameters:

hrtf_dir (str) – The path to the directory containing the BRIR files.

scale_signal_to_snr(signal: ndarray, reference_signal: ndarray, snr: float = 0.0) ndarray[source]

Scales the target signal to the desired SNR. Channels are transposed because pyloudnorm operates on arrays of shape [n_samples, n_channels].

Parameters:
  • signal (np.ndarray) – The target signal to scale.

  • reference_signal (np.ndarray) – The reference signal.

  • snr (float) – The desired SNR gain in dB.

Returns:

The scaled target signal.

Return type:

np.ndarray
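
For example, to place the car noise at 0 dB SNR relative to the enhanced signal (a sketch):

    scaled_noise = car_acoustics.scale_signal_to_snr(
        signal=car_noise, reference_signal=enhanced_signal, snr=0.0
    )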

recipes.cad1.task2.baseline.enhance module

Run the dummy enhancement.

recipes.cad1.task2.baseline.enhance.compute_average_hearing_loss(listener: Listener) float[source]

Compute the average hearing loss of a listener.

Parameters:

listener (Listener) – The listener.

Returns:

The average hearing loss of the listener.

Return type:

average_hearing_loss (float)

recipes.cad1.task2.baseline.enhance.enhance(config: DictConfig) None[source]

Run the music enhancement. The baseline system is a dummy processor that returns the input signal.

Parameters:

config (dict) – Dictionary of configuration options for enhancing music.

recipes.cad1.task2.baseline.enhance.enhance_song(waveform: ndarray, listener: Listener, config: DictConfig) tuple[ndarray, ndarray][source]

Enhance a single song for a listener.

Baseline enhancement returns the signal with a loudness of -14 LUFS if the average hearing loss is below 50 dB HL, and -11 LUFS otherwise.

Parameters:
  • waveform (np.ndarray) – The waveform of the song.

  • listener (Listener) – The listener.

  • config (dict) – Dictionary of configuration options for enhancing music.

Returns:

out_left (np.ndarray): The enhanced left channel. out_right (np.ndarray): The enhanced right channel.

Return type:

Tuple[np.ndarray, np.ndarray]
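
The LUFS target selection described above can be made explicit as follows (enhance_song applies this rule internally; the snippet only restates it):

    from recipes.cad1.task2.baseline.enhance import (
        compute_average_hearing_loss,
        enhance_song,
    )

    avg_loss = compute_average_hearing_loss(listener)
    target_lufs = -14.0 if avg_loss < 50 else -11.0  # rule from the docstring

    out_left, out_right = enhance_song(waveform, listener, config)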

recipes.cad1.task2.baseline.evaluate module

Evaluate the enhanced signals using the HAAQI metric.

recipes.cad1.task2.baseline.evaluate.evaluate_scene(ref_signal: ndarray, enh_signal: ndarray, sample_rate: int, scene_id: str, current_scene: dict, listener: Listener, car_scene_acoustic: CarSceneAcoustics, hrtf: dict, config: DictConfig) tuple[float, float][source]

Evaluate a single scene and return HAAQI scores for the left and right ears.

Parameters:
  • ref_signal (np.ndarray) – A numpy array of shape (2, n_samples) containing the reference signal.

  • enh_signal (np.ndarray) – A numpy array of shape (2, n_samples) containing the enhanced signal.

  • sample_rate (int) – The sampling frequency of the reference and enhanced signals.

  • scene_id (str) – A string identifier for the scene being evaluated.

  • current_scene (dict) – A dictionary containing information about the scene being evaluated, including the song ID, the listener ID, the car noise type, and the split.

  • listener (Listener) – the listener to use

  • car_scene_acoustic (CarSceneAcoustics) – An instance of the CarSceneAcoustics class, which is used to generate car noise and add binaural room impulse responses (BRIRs) to the enhanced signal.

  • hrtf (dict) – A dictionary containing the head-related transfer functions (HRTFs) for the listener being evaluated. This includes the left and right HRTFs for the car and the anechoic room.

  • config (DictConfig) – A dictionary-like object containing various configuration parameters for the evaluation. This includes the path to the enhanced signal folder, the path to the music directory, and a flag indicating whether to set a random seed.

Returns:

A tuple containing HAAQI scores for left and right ears.

Return type:

Tuple[float, float]
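
A call sketch with all arguments named; every object must be prepared beforehand as described in the parameter list:

    left_score, right_score = evaluate_scene(
        ref_signal=ref_signal,
        enh_signal=enh_signal,
        sample_rate=44100,
        scene_id=scene_id,
        current_scene=current_scene,
        listener=listener,
        car_scene_acoustic=car_scene_acoustics,
        hrtf=hrtf,
        config=config,
    )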

recipes.cad1.task2.baseline.evaluate.run_calculate_audio_quality(config: DictConfig) None[source]

Evaluate the enhanced signals using the HAAQI metric.

recipes.cad1.task2.baseline.evaluate.set_scene_seed(scene: str)[source]

Set a seed that is unique to the given scene, based on the last 8 characters of the MD5 hex digest of the scene string.
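
The seeding scheme can be sketched as follows (the exact conversion from the hex digest to an integer seed is an assumption):

    import hashlib

    scene = "S5001"  # hypothetical scene ID
    seed = int(hashlib.md5(scene.encode()).hexdigest()[-8:], 16)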

recipes.cad1.task2.baseline.merge_batches_results module

Join batch scores into a single file.

recipes.cad1.task2.baseline.merge_batches_results.join_batches(config: DictConfig) None[source]

Join batch scores into a single file.

recipes.cad1.task2.baseline.test module

Run the dummy enhancement.

recipes.cad1.task2.baseline.test.enhance(config: DictConfig) None[source]

Run the music enhancement. The baseline system is a dummy processor that returns the input signal.

Parameters:

config (dict) – Dictionary of configuration options for enhancing music.

recipes.cad1.task2.baseline.test.pack_submission(team_id: str, root_dir: str | Path, base_dir: str | Path = '.') None[source]

Pack the submission files into an archive file.

Parameters:
  • team_id (str) – Team ID.

  • root_dir (str | Path) – Root directory of the archived file.

  • base_dir (str | Path) – Base directory to archive. Defaults to “.”.
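
Usage sketch (team ID and paths are hypothetical):

    from recipes.cad1.task2.baseline.test import pack_submission

    pack_submission(team_id="T001", root_dir="submission", base_dir=".")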

Module contents