recipes.cad1.task2.baseline package¶
Submodules¶
recipes.cad1.task2.baseline.audio_manager module¶
A utility class for managing audio files.
- class recipes.cad1.task2.baseline.audio_manager.AudioManager(sample_rate: int = 44100, output_audio_path: str | Path = '', soft_clip: bool = False)[source]¶
Bases:
object
A utility class for managing audio files.
- add_audios_to_save(file_name: str, waveform: ndarray) None [source]¶
Add a waveform to the list of audios to save.
- Parameters:
file_name (str) – The name of the track.
waveform (np.ndarray) – The track to save.
- clip_audio(signal: ndarray, min_val: float = -1, max_val: float = 1) tuple[int, ndarray] [source]¶
Clip a signal to the given range.
- Parameters:
signal (np.ndarray) – The signal to clip.
min_val (float) – The minimum value to clip to. Defaults to -1.
max_val (float) – The maximum value to clip to. Defaults to 1.
- Returns:
Number of samples clipped and the clipped signal.
- Return type:
Tuple[int, np.ndarray]
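The clipping step can be sketched in plain NumPy (a hypothetical stand-in for the method, not the class's actual implementation):

```python
import numpy as np

def clip_audio(signal: np.ndarray, min_val: float = -1.0, max_val: float = 1.0):
    """Clip a signal to [min_val, max_val] and report how many samples were affected."""
    n_clipped = int(np.sum((signal < min_val) | (signal > max_val)))
    return n_clipped, np.clip(signal, min_val, max_val)

# Two samples fall outside [-1, 1] and get pinned to the bounds.
n_clipped, clipped = clip_audio(np.array([-1.5, -0.5, 0.0, 0.5, 1.5]))
```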
- get_lufs_level(signal: ndarray) float [source]¶
Get the LUFS level of the signal.
- Parameters:
signal (np.ndarray) – The signal to get the LUFS level of.
- Returns:
The LUFS level of the signal.
- Return type:
float
recipes.cad1.task2.baseline.baseline_utils module¶
Utility functions for the baseline model.
- recipes.cad1.task2.baseline.baseline_utils.load_hrtf(config: DictConfig) dict [source]¶
Load the HRTF file.
- Parameters:
config (DictConfig) – A dictionary-like object containing various configuration parameters for the evaluation. This includes the path to the HRTF files.
- Returns:
A dictionary containing the HRTF data for the dataset.
- Return type:
hrtf_data (dict)
- recipes.cad1.task2.baseline.baseline_utils.load_listeners_and_scenes(config: DictConfig) tuple[dict, dict[str, Listener], dict] [source]¶
Load listener and scene data.
- Parameters:
config (DictConfig) – A dictionary-like object containing various configuration parameters for the evaluation. This includes the path to the scenes file, the path to the listeners train file, and the path to the listeners valid file.
- Returns:
A tuple containing the scene data, the listener data, and the scene-listener pairs.
- Return type:
Tuple[dict, dict[str, Listener], dict]
- recipes.cad1.task2.baseline.baseline_utils.make_scene_listener_list(scenes_listeners, small_test=False)[source]¶
Make the list of scene-listener pairings to process.
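A minimal sketch of the pairing logic, assuming scenes_listeners maps scene IDs to lists of listener IDs (the truncation size used for small_test is an assumption):

```python
def make_scene_listener_list(scenes_listeners: dict, small_test: bool = False) -> list:
    """Flatten a {scene_id: [listener_id, ...]} mapping into (scene, listener) pairs."""
    pairs = [
        (scene, listener)
        for scene, listeners in scenes_listeners.items()
        for listener in listeners
    ]
    # In a small test run, process only the first few pairings (size is illustrative).
    return pairs[:5] if small_test else pairs

pairs = make_scene_listener_list({"S1": ["L01", "L02"], "S2": ["L03"]})
```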
- recipes.cad1.task2.baseline.baseline_utils.read_mp3(file_path: str | Path, sample_rate: float | None = None) tuple[np.ndarray, float] [source]¶
Read an MP3 file and return its signal.
- Parameters:
file_path (str, Path) – The path to the MP3 file.
sample_rate (float, optional) – The sampling frequency of the MP3 file.
- Returns:
The signal of the MP3 file and its sampling frequency.
- Return type:
Tuple[np.ndarray, float]
recipes.cad1.task2.baseline.car_scene_acoustics module¶
A class for the car acoustic environment.
- class recipes.cad1.task2.baseline.car_scene_acoustics.CarSceneAcoustics(track_duration: int, sample_rate: int, hrtf_dir: str, config_nalr: dict, config_compressor: dict, extend_noise: float = 0.2)[source]¶
Bases:
object
A class for the car acoustic environment.
- Constants:
ANECHOIC_HRTF_FOR_NOISE (dict): A dictionary containing the names of the anechoic BRIRs for the following directions:
- 0 degrees: front
000_left: The left channel of the BRIR for 0 degrees.
000_right: The right channel of the BRIR for 0 degrees.
- -90 degrees: left
m90_left: The left channel of the BRIR for -90 degrees.
m90_right: The right channel of the BRIR for -90 degrees.
- 90 degrees: right
p90_left: The left channel of the BRIR for 90 degrees.
p90_right: The right channel of the BRIR for 90 degrees.
- ANECHOIC_HRTF_FOR_NOISE = {'000_left': 'HR36_E02_CH1_Left.wav', '000_right': 'HR36_E02_CH1_Right.wav', 'm90_left': 'HR0_E02_CH1_Left.wav', 'm90_right': 'HR0_E02_CH1_Right.wav', 'p90_left': 'HR72_E02_CH1_Left.wav', 'p90_right': 'HR72_E02_CH1_Right.wav'}¶
- add_anechoic_hrtf_to_noise(noise_signal: ndarray) ndarray [source]¶
Adds the Anechoic HRTF to the noise signal.
- Parameters:
noise_signal (np.ndarray) – A numpy array representing the different components of the car noise signal.
- Returns:
The noise signal with the Anechoic HRTF applied.
- Return type:
np.ndarray
- add_hrtf_to_stereo_signal(signal: ndarray, hrir: dict, hrtf_type: str) ndarray [source]¶
Add a head-related transfer function using a binaural room impulse response (BRIR) from eBrird.
- Parameters:
signal (np.ndarray) – a numpy array of shape (2, n_samples) containing the stereo audio signal.
hrir – a dictionary containing the HRIR (head-related impulse response) filenames.
hrtf_type – the type of HRTF to use. Can be either “Anechoic” or “Car”.
- Returns:
A numpy array of shape (2, n_samples) containing the stereo audio signal with the BRIR added.
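A simplified sketch of BRIR application: each channel is convolved with the corresponding ear's impulse response. The real method may combine left- and right-ear contributions differently, and the helper name here is hypothetical:

```python
import numpy as np

def apply_brir(signal: np.ndarray, brir_left: np.ndarray, brir_right: np.ndarray) -> np.ndarray:
    """Convolve each channel of a (2, n_samples) stereo signal with an ear's BRIR."""
    left = np.convolve(signal[0], brir_left)
    right = np.convolve(signal[1], brir_right)
    return np.stack([left, right])

stereo = np.vstack([np.ones(4), np.ones(4)])
unit_impulse = np.array([1.0])  # identity response: output equals input
out = apply_brir(stereo, unit_impulse, unit_impulse)
```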
- static add_two_signals(signal1: ndarray, signal2: ndarray) ndarray [source]¶
Adds two signals together.
- Parameters:
signal1 (np.ndarray) – The first signal.
signal2 (np.ndarray) – The second signal.
- Returns:
The sum of the two signals.
- Return type:
np.ndarray
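The summation can be sketched with zero-padding so signals of unequal length line up (how the method handles length mismatches is an assumption):

```python
import numpy as np

def add_two_signals(signal1: np.ndarray, signal2: np.ndarray) -> np.ndarray:
    """Sum two 1-D signals, zero-padding the shorter one to the longer length."""
    length = max(len(signal1), len(signal2))
    padded1 = np.pad(signal1, (0, length - len(signal1)))
    padded2 = np.pad(signal2, (0, length - len(signal2)))
    return padded1 + padded2

mixed = add_two_signals(np.array([1.0, 2.0, 3.0]), np.array([1.0, 1.0]))
```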
- apply_car_acoustics_to_signal(enh_signal: ndarray, scene: dict, listener: Listener, hrtf: dict, audio_manager: AudioManager, config: DictConfig) ndarray [source]¶
Applies the car acoustics to the enhanced signal.
- Parameters:
enh_signal (np.ndarray) – The enhanced signal to apply the car acoustics to.
scene (dict) – The scene dictionary with the acoustics parameters.
listener (Listener) – The listener characteristics.
hrtf (dict) – A dictionary containing the head-related transfer functions (HRTFs) for the listener being evaluated. This includes the left and right HRTFs for the car and the anechoic room.
audio_manager (AudioManager) – The audio manager object.
config (DictConfig) – The config object.
- Returns:
The enhanced signal with the car acoustics applied, and the reference signal normalised to the enhanced level.
- Return type:
np.ndarray
- apply_hearing_aid(signal: ndarray, audiogram: Audiogram) ndarray [source]¶
Applies the hearing aid: it consists of an NALR prescription and a compressor.
- Parameters:
signal (np.ndarray) – The audio signal to be enhanced.
audiogram (Audiogram) – The audiogram of the listener.
- Returns:
The enhanced audio signal.
- Return type:
np.ndarray
- equalise_level(signal: ndarray, reference_signal: ndarray, max_level: float = 20) ndarray [source]¶
Equalises the level of the target signal to the reference signal.
- Parameters:
signal (np.ndarray) – The target signal to equalise.
reference_signal (np.ndarray) – The reference signal.
max_level (float) – The maximum level of the target signal. This is to prevent clipping.
- Returns:
The equalised target signal.
- Return type:
np.ndarray
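The level matching can be sketched as a capped dB gain. RMS is used here as a simple stand-in for the loudness measurement, so this is an illustrative approximation rather than the baseline's implementation:

```python
import numpy as np

def equalise_level(signal, reference_signal, max_level=20.0):
    """Match the RMS level of `signal` to `reference_signal`, capping the
    applied gain at `max_level` dB to prevent clipping."""
    gain_db = 20 * np.log10(
        np.sqrt(np.mean(reference_signal**2)) / np.sqrt(np.mean(signal**2))
    )
    gain_db = min(gain_db, max_level)  # cap the gain
    return signal * 10 ** (gain_db / 20)

# A quiet constant signal is raised to the reference level (gain ~12 dB, under the cap).
matched = equalise_level(0.1 * np.ones(8), 0.4 * np.ones(8))
```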
- get_car_noise(car_noise_params: dict) ndarray [source]¶
Generates car noise.
- Parameters:
car_noise_params (dict) – Car noise parameters as generated by the CarNoiseParameterGenerator class.
- Returns:
A numpy array representing the different components of the car noise signal.
- Return type:
np.ndarray
- preload_anechoic_hrtf(hrtf_dir: str) None [source]¶
Loads the Anechoic BRIRs from the eBrird database for the following directions:
0 degrees: front
-90 degrees: left
90 degrees: right
- Parameters:
hrtf_dir (str) – The path to the directory containing the BRIR files.
- scale_signal_to_snr(signal: ndarray, reference_signal: ndarray, snr: float = 0.0) ndarray [source]¶
Scales the target signal to the desired SNR. Channels are transposed because pyloudnorm operates on arrays of shape [n_samples, n_channels].
- Parameters:
signal (np.ndarray) – The target signal to scale.
reference_signal (np.ndarray) – The reference signal.
snr (float) – The desired SNR gain in dB.
- Returns:
The scaled target signal.
- Return type:
np.ndarray
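The SNR scaling can be sketched as follows, again with RMS standing in for the pyloudnorm LUFS measurement used by the real method:

```python
import numpy as np

def scale_signal_to_snr(signal, reference_signal, snr=0.0):
    """Scale `signal` so its level sits `snr` dB above the reference level."""
    rms = np.sqrt(np.mean(signal**2))
    ref_rms = np.sqrt(np.mean(reference_signal**2))
    return signal * (ref_rms * 10 ** (snr / 20) / rms)

# At snr=0 the signal is brought down to the reference level.
noise = scale_signal_to_snr(0.5 * np.ones(8), 0.25 * np.ones(8), snr=0.0)
```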
recipes.cad1.task2.baseline.enhance module¶
Run the dummy enhancement.
- recipes.cad1.task2.baseline.enhance.compute_average_hearing_loss(listener: Listener) float [source]¶
Compute the average hearing loss of a listener.
- Parameters:
listener (Listener) – The listener.
- Returns:
The average hearing loss of the listener.
- Return type:
average_hearing_loss (float)
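A sketch of the averaging, assuming the loss is the mean of the audiogram levels across both ears. The level values and the set of frequencies are hypothetical; the real function reads them from the Listener's left and right Audiogram objects:

```python
import numpy as np

# Hypothetical audiogram levels in dB HL for each ear at a few frequencies.
levels_left = np.array([20.0, 30.0, 40.0, 50.0])
levels_right = np.array([25.0, 35.0, 45.0, 55.0])

def compute_average_hearing_loss(left: np.ndarray, right: np.ndarray) -> float:
    """Average the hearing-loss levels across both ears and all frequencies."""
    return float(np.mean(np.concatenate([left, right])))

average_hearing_loss = compute_average_hearing_loss(levels_left, levels_right)
```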
- recipes.cad1.task2.baseline.enhance.enhance(config: DictConfig) None [source]¶
Run the music enhancement. The baseline system is a dummy processor that returns the input signal.
- Parameters:
config (dict) – Dictionary of configuration options for enhancing music.
- recipes.cad1.task2.baseline.enhance.enhance_song(waveform: ndarray, listener: Listener, config: DictConfig) tuple[ndarray, ndarray] [source]¶
Enhance a single song for a listener.
Baseline enhancement returns the signal with a loudness of -14 LUFS if the average hearing loss is below 50 dB HL, and -11 LUFS otherwise.
- Parameters:
waveform (np.ndarray) – The waveform of the song.
listener (Listener) – The listener.
config (dict) – Dictionary of configuration options for enhancing music.
- Returns:
out_left (np.ndarray) – The enhanced left channel. out_right (np.ndarray) – The enhanced right channel.
- Return type:
Tuple[np.ndarray, np.ndarray]
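The loudness rule above can be sketched as follows. The gain step assumes a pre-measured LUFS value is available; the real implementation measures loudness itself:

```python
import numpy as np

def target_level_lufs(average_hearing_loss: float) -> float:
    """Baseline rule: -14 LUFS below 50 dB HL average loss, -11 LUFS otherwise."""
    return -14.0 if average_hearing_loss < 50.0 else -11.0

def apply_gain_to_target(signal: np.ndarray, measured_lufs: float, target_lufs: float) -> np.ndarray:
    """Apply the dB gain that moves a measured loudness onto the target level."""
    return signal * 10 ** ((target_lufs - measured_lufs) / 20)

# A mild loss (30 dB HL) selects the -14 LUFS target; the song gets +6 dB of gain.
enhanced = apply_gain_to_target(0.5 * np.ones(8), measured_lufs=-20.0,
                                target_lufs=target_level_lufs(30.0))
```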
recipes.cad1.task2.baseline.evaluate module¶
Evaluate the enhanced signals using the HAAQI metric.
- recipes.cad1.task2.baseline.evaluate.evaluate_scene(ref_signal: ndarray, enh_signal: ndarray, sample_rate: int, scene_id: str, current_scene: dict, listener: Listener, car_scene_acoustic: CarSceneAcoustics, hrtf: dict, config: DictConfig) tuple[float, float] [source]¶
Evaluate a single scene and return HAAQI scores for the left and right ears.
- Parameters:
ref_signal (np.ndarray) – A numpy array of shape (2, n_samples) containing the reference signal.
enh_signal (np.ndarray) – A numpy array of shape (2, n_samples) containing the enhanced signal.
sample_rate (int) – The sampling frequency of the reference and enhanced signals.
scene_id (str) – A string identifier for the scene being evaluated.
current_scene (dict) – A dictionary containing information about the scene being evaluated, including the song ID, the listener ID, the car noise type, and the split.
listener (Listener) – the listener to use
car_scene_acoustic (CarSceneAcoustics) – An instance of the CarSceneAcoustics class, which is used to generate car noise and add binaural room impulse responses (BRIRs) to the enhanced signal.
hrtf (dict) – A dictionary containing the head-related transfer functions (HRTFs) for the listener being evaluated. This includes the left and right HRTFs for the car and the anechoic room.
config (DictConfig) – A dictionary-like object containing various configuration parameters for the evaluation. This includes the path to the enhanced signal folder, the path to the music directory, and a flag indicating whether to set a random seed.
- Returns:
A tuple containing HAAQI scores for left and right ears.
- Return type:
Tuple[float, float]
recipes.cad1.task2.baseline.merge_batches_results module¶
Join batches scores into a single file.
recipes.cad1.task2.baseline.test module¶
Run the dummy enhancement.
- recipes.cad1.task2.baseline.test.enhance(config: DictConfig) None [source]¶
Run the music enhancement. The baseline system is a dummy processor that returns the input signal.
- Parameters:
config (dict) – Dictionary of configuration options for enhancing music.
- recipes.cad1.task2.baseline.test.pack_submission(team_id: str, root_dir: str | Path, base_dir: str | Path = '.') None [source]¶
Pack the submission files into an archive file.
- Parameters:
team_id (str) – Team ID.
root_dir (str | Path) – Root directory of the archived file.
base_dir (str | Path) – Base directory to archive. Defaults to “.”.
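The packing step can be sketched with shutil.make_archive; the archive name, the zip format, and the output location are assumptions, as the recipe may use a different layout:

```python
import shutil
import tempfile
from pathlib import Path

def pack_submission(team_id: str, root_dir, base_dir=".") -> Path:
    """Archive `base_dir` (relative to `root_dir`) into submission_<team_id>.zip
    in the current working directory, and return the archive path."""
    archive = shutil.make_archive(f"submission_{team_id}", "zip",
                                  root_dir=root_dir, base_dir=base_dir)
    return Path(archive)

# Usage: pack a scratch directory containing one dummy output file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "enhanced.txt").write_text("demo")
    archive_path = pack_submission("T001", root_dir=tmp)
```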