clarity.evaluator.mbstoi package

Submodules

clarity.evaluator.mbstoi.mbstoi module

Modified Binaural Short-Time Objective Intelligibility (MBSTOI) Measure

clarity.evaluator.mbstoi.mbstoi.mbstoi(left_ear_clean: ndarray, right_ear_clean: ndarray, left_ear_noisy: ndarray, right_ear_noisy: ndarray, sr_signal: float, gridcoarseness: int = 1, sample_rate: float = 10000.0, n_frame: int = 256, fft_size_in_samples: int = 512, n_third_octave_bands: int = 15, centre_freq_first_third_octave_hz: int = 150, n_frames: int = 30, dyn_range: int = 40, tau_min: float = -0.001, tau_max: float = 0.001, gamma_min: int = -20, gamma_max: int = 20, sigma_delta_0: float = 6.5e-05, sigma_epsilon_0: float = 1.5, alpha_0_db: int = 13, tau_0: float = 0.0016, level_shift_deviation: float = 1.6) → float[source]

The Modified Binaural Short-Time Objective Intelligibility (MBSTOI) measure.

Parameters:

left_ear_clean (ndarray) – Clean speech signal from left ear.
right_ear_clean (ndarray) – Clean speech signal from right ear.
left_ear_noisy (ndarray) – Noisy/processed speech signal from left ear.
right_ear_noisy (ndarray) – Noisy/processed speech signal from right ear.
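sr_signal (float) – Sample rate of the input signals.
gridcoarseness (int) – Grid coarseness, used as the denominator of n_taus and n_gammas. Defaults to 1.
sample_rate (float) – Sample rate at which the measure is computed. Defaults to 10000.0.
n_frame (int) – Frame (window) length in samples. Defaults to 256.
fft_size_in_samples (int) – FFT size in samples. Defaults to 512.
n_third_octave_bands (int) – Number of one-third octave bands. Defaults to 15.
centre_freq_first_third_octave_hz (int) – Centre frequency of the first one-third octave band, in Hz. Defaults to 150.
n_frames (int) – Number of frames for the intermediate intelligibility measure. Defaults to 30.
dyn_range (int) – Energy range used to decide which frames are silent. Defaults to 40.
tau_min (float) – Minimum interaural delay (tau) in the search grid. Defaults to -0.001.
tau_max (float) – Maximum interaural delay (tau) in the search grid. Defaults to 0.001.
gamma_min (int) – Minimum interaural level difference (gamma) in the search grid, in dB. Defaults to -20.
gamma_max (int) – Maximum interaural level difference (gamma) in the search grid, in dB. Defaults to 20.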
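sigma_delta_0 (float) – Constant used to derive the jitter for taus (sigma_delta). Defaults to 6.5e-05.
sigma_epsilon_0 (float) – Constant used to derive the jitter for gammas (sigma_epsilon). Defaults to 1.5.
alpha_0_db (int) – ??? Defaults to 13.
tau_0 (float) – ??? Defaults to 0.0016.
level_shift_deviation (float) – ??? Defaults to 1.6.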

Returns:

MBSTOI index d.

Return type:

float

Notes

All title, copyrights and pending patents pertaining to MBSTOI [1] in and to the original Matlab software are owned by Oticon A/S and/or Aalborg University. Please see http://ah-andersen.net/code/
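
As a rough illustration of how gridcoarseness, tau_min/tau_max and gamma_min/gamma_max might combine into the search grid for the EC stage, the sketch below builds evenly spaced tau and gamma values. The base grid sizes (100 taus and 40 gammas) and the helper names `linspace` and `search_grid` are assumptions made for this example; they are not taken from the documentation above.

```python
import math


def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop, inclusive."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def search_grid(
    gridcoarseness=1,
    tau_min=-0.001,
    tau_max=0.001,
    gamma_min=-20,
    gamma_max=20,
    base_n_taus=100,   # hypothetical base grid size (assumption)
    base_n_gammas=40,  # hypothetical base grid size (assumption)
):
    """Build tau and gamma grids; a larger gridcoarseness gives fewer points."""
    n_taus = math.ceil(base_n_taus / gridcoarseness)
    n_gammas = math.ceil(base_n_gammas / gridcoarseness)
    taus = linspace(tau_min, tau_max, n_taus)
    gammas = linspace(gamma_min, gamma_max, n_gammas)
    return taus, gammas


taus, gammas = search_grid(gridcoarseness=1)
coarse_taus, coarse_gammas = search_grid(gridcoarseness=4)
print(len(taus), len(gammas))                # 100 40
print(len(coarse_taus), len(coarse_gammas))  # 25 10
```

Under these assumptions, a larger gridcoarseness shrinks both grids and therefore the number of (tau, gamma) pairs the EC stage has to evaluate, at the cost of a coarser search.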

clarity.evaluator.mbstoi.mbstoi_utils module

Utilities for MBSTOI processing.

clarity.evaluator.mbstoi.mbstoi_utils.equalisation_cancellation(left_ear_clean_hat: ndarray, right_ear_clean_hat: ndarray, left_ear_noisy_hat: ndarray, right_ear_noisy_hat: ndarray, n_third_octave_bands: int, n_frames: int, frequency_band_edges_indices: ndarray, centre_frequencies: ndarray, taus: ndarray, n_taus: int, gammas: ndarray, n_gammas: int, intermediate_intelligibility_measure_grid: ndarray, p_ec_max: ndarray, sigma_epsilon: ndarray, sigma_delta: ndarray) → tuple[ndarray, ndarray][source]

Run the equalisation-cancellation (EC) stage of the MBSTOI metric.

The EC loop evaluates one large equation in every iteration (see the referenced notes for details). The left and right ear signals are level-adjusted by gamma (in dB) and time-shifted by tau relative to one another, and are then subtracted. The processed signals are treated in the same way. To obtain performance similar to that of humans, the EC stage adds jitter. The search looks for the level and time adjustments that maximise the intermediate correlation coefficients d. The location of the source and interferer could be used to reduce the search space.

Parameters:

left_ear_clean_hat (np.ndarray) – Clean left ear short-time DFT coefficients (single-sided) per frequency bin and frame.
right_ear_clean_hat (np.ndarray) – Clean right ear short-time DFT coefficients (single-sided) per frequency bin and frame.
left_ear_noisy_hat (np.ndarray) – Noisy/processed left ear short-time DFT coefficients (single-sided) per frequency bin and frame.
right_ear_noisy_hat (np.ndarray) – Noisy/processed right ear short-time DFT coefficients (single-sided) per frequency bin and frame.
n_third_octave_bands (int) – Number of one-third octave bands.
n_frames (int) – Number of frames for intermediate intelligibility measure.
frequency_band_edges_indices (np.ndarray) – Indices of frequency band edges.
centre_frequencies (np.ndarray) – Centre frequencies.
taus (np.ndarray) – Interaural delay (tau) values.
n_taus (int) – Number of tau values.
gammas (np.ndarray) – Interaural level difference (gamma) values.
n_gammas (int) – Number of gamma values.
intermediate_intelligibility_measure_grid (np.ndarray) – Grid for intermediate intelligibility measure.
p_ec_max (np.ndarray) – Empty grid for maximum values.
sigma_epsilon (np.ndarray) – Jitter for gammas.
sigma_delta (np.ndarray) – Jitter for taus.

Returns:

intermediate_intelligibility_measure_grid (np.ndarray): Updated grid for the intermediate intelligibility measure.
p_ec_max (np.ndarray): Grid containing maximum values.

Return type:

tuple[ndarray, ndarray]
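
The equalise-then-cancel idea described for this function can be illustrated on a single frequency bin. The sketch below is a simplified illustration only: it applies the whole level and time adjustment to the left ear (rather than splitting it between the ears), omits the jitter, and does not compute the correlation coefficients d. The function name `ec_combine` and the test values are assumptions made for this example.

```python
import cmath
import math


def ec_combine(left, right, freq_hz, tau, gamma_db):
    """Equalise the left bin relative to the right bin, then subtract.

    left, right: complex DFT coefficients of one frequency bin.
    tau: time shift in seconds; gamma_db: level adjustment in dB.
    """
    gain = 10 ** (gamma_db / 20)
    phase = cmath.exp(-2j * math.pi * freq_hz * tau)
    return left * gain * phase - right


# A component that reaches the right ear 0.5 ms later and 6 dB louder.
freq_hz = 500.0
itd, ild_db = 0.0005, 6.0
left = 1 + 0.5j
right = left * 10 ** (ild_db / 20) * cmath.exp(-2j * math.pi * freq_hz * itd)

matched = ec_combine(left, right, freq_hz, tau=itd, gamma_db=ild_db)
unmatched = ec_combine(left, right, freq_hz, tau=0.0, gamma_db=0.0)
print(abs(matched) < 1e-12)  # True: the component is cancelled
print(abs(unmatched) > 0.1)  # True: without equalisation it remains
```

When tau and gamma match the interaural differences of a component, that component cancels; this is why the choice of (tau, gamma) matters when searching the grid.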

clarity.evaluator.mbstoi.mbstoi_utils.find_delay_impulse(ddf: ndarray, initial_value: int = 22050) → ndarray[source]

Find binaural delay in signal ddf.

Finds delay given initial location of unit impulse, initial_value.

Parameters:

ddf (np.ndarray) – Signal in which to find the delay.
initial_value (int) – Initial location of the unit impulse. Defaults to 22050.

Returns:

delay (np.ndarray): Binaural delay.

Return type:

ndarray
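
One plausible reading of "delay given the initial location of the unit impulse" is the offset between the peak of each channel and initial_value. The sketch below illustrates that reading on plain Python lists. The function name `peak_delay` and the use of the largest-magnitude sample as the peak are assumptions; the actual implementation may differ.

```python
def peak_delay(channel, initial_value=22050):
    """Offset in samples of the largest-magnitude sample from initial_value."""
    peak_index = max(range(len(channel)), key=lambda i: abs(channel[i]))
    return peak_index - initial_value


# A unit impulse originally at sample 10 arrives 3 samples late on the left
# and 1 sample early on the right.
left = [0.0] * 32
right = [0.0] * 32
left[13] = 1.0
right[9] = 1.0

delays = [peak_delay(left, initial_value=10), peak_delay(right, initial_value=10)]
print(delays)  # [3, -1]
```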

clarity.evaluator.mbstoi.mbstoi_utils.remove_silent_frames(left_ear_clean: ndarray, right_ear_clean: ndarray, left_ear_noisy: ndarray, right_ear_noisy: ndarray, dynamic_range: int = 40, frame_length: int = 256, hop: int | float = 128) → tuple[ndarray, ndarray, ndarray, ndarray][source]

Remove silent frames from the clean and noisy signals, based on the clean signal.

A frame is excluded if its energy is lower than max(energy) - dynamic_range. The frame exclusion is based solely on the clean speech signal. Based on mpariente/pystoi/utils.py.

Parameters:

left_ear_clean (np.ndarray) – Clean input signal left channel.
right_ear_clean (np.ndarray) – Clean input signal right channel.
left_ear_noisy (np.ndarray) – Degraded/processed signal left channel.
right_ear_noisy (np.ndarray) – Degraded/processed signal right channel.
dynamic_range (int) – Energy range used to determine which frames are silent. Defaults to 40.
frame_length (int) – Window size for energy evaluation. Defaults to 256.
hop (int | float) – Hop size for energy evaluation. Defaults to 128.

Returns:

xl_sil (np.ndarray): left_ear_clean without the silent frames.
xr_sil (np.ndarray): right_ear_clean without the silent frames.
yl_sil (np.ndarray): left_ear_noisy without the silent frames in xl_sil.
yr_sil (np.ndarray): right_ear_noisy without the silent frames in xr_sil.

Return type:

tuple[ndarray, ndarray, ndarray, ndarray]
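
The exclusion rule (drop a frame when its energy is lower than max(energy) minus the dynamic range) can be sketched as a per-frame mask. This is a simplified illustration: it works on a single clean channel, applies no window, and only computes the mask instead of rebuilding the signals. The helper names `frame_energies_db` and `active_frame_mask` are assumptions made for this example.

```python
import math


def frame_energies_db(signal, frame_length=256, hop=128):
    """Energy in dB of each frame (no window is applied in this sketch)."""
    eps = 1e-12  # avoids log10(0) for all-zero frames
    energies = []
    for start in range(0, len(signal) - frame_length + 1, hop):
        frame = signal[start:start + frame_length]
        norm = math.sqrt(sum(x * x for x in frame))
        energies.append(20 * math.log10(norm + eps))
    return energies


def active_frame_mask(clean, dynamic_range=40, frame_length=256, hop=128):
    """True for frames to keep, False for frames treated as silent."""
    energies = frame_energies_db(clean, frame_length, hop)
    threshold = max(energies) - dynamic_range
    return [energy >= threshold for energy in energies]


# 1024 samples of a tone followed by 1024 samples that are 80 dB quieter.
loud = [math.sin(2 * math.pi * 0.05 * n) for n in range(1024)]
quiet = [1e-4 * math.sin(2 * math.pi * 0.05 * n) for n in range(1024)]
mask = active_frame_mask(loud + quiet)
print(len(mask), sum(mask))  # 15 8
```

The same mask, computed from the clean signal, would then be applied to the noisy signals so that all four outputs stay aligned frame by frame.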

clarity.evaluator.mbstoi.mbstoi_utils.stft(signal: ndarray, win_size: int, fft_size: int) → ndarray[source]

Short-time Fourier transform based on MBSTOI Matlab code.

Parameters:

signal (np.ndarray) – Input signal
win_size (int) – The size of the window and the signal frames.
fft_size (int) – The size of the FFT in samples (with or without zero-padding).

Returns:

stft_out (np.ndarray): The short-time Fourier transform of signal.

Return type:

ndarray
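
To make the roles of win_size and fft_size concrete, here is a dependency-free sketch of a single-sided short-time Fourier transform. The 50% overlap, the Hann window without zero end points, and the bins-by-frames output layout are assumptions made for this example, not details confirmed by the documentation above; the naive DFT is used only to keep the sketch self-contained.

```python
import cmath
import math


def hann(length):
    """Hann window without the zero end points (assumed variant)."""
    return [
        0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (length + 1))
        for n in range(length)
    ]


def stft_sketch(signal, win_size, fft_size):
    """Single-sided STFT; returns a list of bins, each a list over frames."""
    hop = win_size // 2  # 50% overlap (assumption)
    window = hann(win_size)
    n_bins = fft_size // 2 + 1
    frames = []
    for start in range(0, len(signal) - win_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(win_size)]
        # Naive DFT of length fft_size; samples past win_size are zero-padding.
        spectrum = [
            sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / fft_size)
                for n in range(win_size)
            )
            for k in range(n_bins)
        ]
        frames.append(spectrum)
    return [[frame[k] for frame in frames] for k in range(n_bins)]


# A cosine with a period of 4 samples falls on bin 8 of a 32-point FFT.
signal = [math.cos(2 * math.pi * n / 4) for n in range(64)]
spec = stft_sketch(signal, win_size=16, fft_size=32)
peak_bins = [
    max(range(len(spec)), key=lambda k: abs(spec[k][t]))
    for t in range(len(spec[0]))
]
print(len(spec), len(spec[0]))  # 17 7
print(peak_bins)                # [8, 8, 8, 8, 8, 8, 8]
```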

clarity.evaluator.mbstoi.mbstoi_utils.thirdoct(sample_rate: float, nfft: int, num_bands: int, min_freq: int) → tuple[ndarray, ndarray, ndarray, ndarray, ndarray][source]

Returns the 1/3 octave band matrix and its center frequencies based on mpariente/pystoi.

Parameters:

sample_rate (float) – Sampling rate.
nfft (int) – Number of FFT points.
num_bands (int) – Number of one-third octave bands.
min_freq (int) – Centre frequency of the lowest one-third octave band.

Returns:

octave_band_matrix (np.ndarray): One-third octave band matrix.
centre_frequencies (np.ndarray): Centre frequencies.
frequency_band_edges_indices (np.ndarray): Indices of frequency band edges.
freq_low (np.ndarray): Lowest frequency.
freq_high (np.ndarray): Highest frequency.

Return type:

tuple[ndarray, ndarray, ndarray, ndarray, ndarray]
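
For reference, one-third octave bands are conventionally spaced by a factor of 2^(1/3) between centre frequencies, with band edges a factor of 2^(1/6) either side of each centre. The sketch below computes these frequencies for the default 15 bands starting at 150 Hz. It covers only the frequencies, not the band matrix or the FFT bin indices, and the function name `third_octave_bands` is an assumption made for this example.

```python
def third_octave_bands(num_bands=15, min_freq=150):
    """Centre, lower-edge and upper-edge frequencies of 1/3 octave bands."""
    centre = [min_freq * 2 ** (k / 3) for k in range(num_bands)]
    low = [min_freq * 2 ** ((2 * k - 1) / 6) for k in range(num_bands)]
    high = [min_freq * 2 ** ((2 * k + 1) / 6) for k in range(num_bands)]
    return centre, low, high


centre, low, high = third_octave_bands()
print(round(centre[0]), round(centre[3]), round(centre[-1]))  # 150 300 3810
```

Every third band doubles the centre frequency, and the upper edge of each band coincides with the lower edge of the next, so the bands tile the frequency axis without gaps.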

Module contents

Modified Binaural Short-Time Objective Intelligibility Evaluator

clarity.evaluator.mbstoi.mbstoi(left_ear_clean: ndarray, right_ear_clean: ndarray, left_ear_noisy: ndarray, right_ear_noisy: ndarray, sr_signal: float, gridcoarseness: int = 1, sample_rate: float = 10000.0, n_frame: int = 256, fft_size_in_samples: int = 512, n_third_octave_bands: int = 15, centre_freq_first_third_octave_hz: int = 150, n_frames: int = 30, dyn_range: int = 40, tau_min: float = -0.001, tau_max: float = 0.001, gamma_min: int = -20, gamma_max: int = 20, sigma_delta_0: float = 6.5e-05, sigma_epsilon_0: float = 1.5, alpha_0_db: int = 13, tau_0: float = 0.0016, level_shift_deviation: float = 1.6) → float[source]

The Modified Binaural Short-Time Objective Intelligibility (MBSTOI) measure.

Parameters:

left_ear_clean (ndarray) – Clean speech signal from left ear.
right_ear_clean (ndarray) – Clean speech signal from right ear.
left_ear_noisy (ndarray) – Noisy/processed speech signal from left ear.
right_ear_noisy (ndarray) – Noisy/processed speech signal from right ear.
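sr_signal (float) – Sample rate of the input signals.
gridcoarseness (int) – Grid coarseness, used as the denominator of n_taus and n_gammas. Defaults to 1.
sample_rate (float) – Sample rate at which the measure is computed. Defaults to 10000.0.
n_frame (int) – Frame (window) length in samples. Defaults to 256.
fft_size_in_samples (int) – FFT size in samples. Defaults to 512.
n_third_octave_bands (int) – Number of one-third octave bands. Defaults to 15.
centre_freq_first_third_octave_hz (int) – Centre frequency of the first one-third octave band, in Hz. Defaults to 150.
n_frames (int) – Number of frames for the intermediate intelligibility measure. Defaults to 30.
dyn_range (int) – Energy range used to decide which frames are silent. Defaults to 40.
tau_min (float) – Minimum interaural delay (tau) in the search grid. Defaults to -0.001.
tau_max (float) – Maximum interaural delay (tau) in the search grid. Defaults to 0.001.
gamma_min (int) – Minimum interaural level difference (gamma) in the search grid, in dB. Defaults to -20.
gamma_max (int) – Maximum interaural level difference (gamma) in the search grid, in dB. Defaults to 20.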
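sigma_delta_0 (float) – Constant used to derive the jitter for taus (sigma_delta). Defaults to 6.5e-05.
sigma_epsilon_0 (float) – Constant used to derive the jitter for gammas (sigma_epsilon). Defaults to 1.5.
alpha_0_db (int) – ??? Defaults to 13.
tau_0 (float) – ??? Defaults to 0.0016.
level_shift_deviation (float) – ??? Defaults to 1.6.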

Returns:

MBSTOI index d.

Return type:

float

Notes

All title, copyrights and pending patents pertaining to MBSTOI [1] in and to the original Matlab software are owned by Oticon A/S and/or Aalborg University. Please see http://ah-andersen.net/code/