clarity.evaluator.mbstoi.mbstoi_utils module

Utilities for MBSTOI processing.

clarity.evaluator.mbstoi.mbstoi_utils.equalisation_cancellation(left_ear_clean_hat: ndarray, right_ear_clean_hat: ndarray, left_ear_noisy_hat: ndarray, right_ear_noisy_hat: ndarray, n_third_octave_bands: int, n_frames: int, frequency_band_edges_indices: ndarray, centre_frequencies: ndarray, taus: ndarray, n_taus: int, gammas: ndarray, n_gammas: int, intermediate_intelligibility_measure_grid: ndarray, p_ec_max: ndarray, sigma_epsilon: ndarray, sigma_delta: ndarray) tuple[ndarray, ndarray][source]

Run the equalisation-cancellation (EC) stage of the MBSTOI metric.

Each iteration of the EC loop evaluates one large expression (see the referenced notes for details). The left and right ear signals are level adjusted by gamma (in dB) and time shifted by tau relative to one another, and are then subtracted. The processed signals are treated in the same way. To obtain performance similar to that of humans, the EC stage adds jitter. The search looks for the level and time adjustments that maximise the intermediate correlation coefficients d. A possible extension is to use the locations of the source and interferer to reduce the search space.

Parameters:
  • left_ear_clean_hat (np.ndarray) – Clean left ear short-time DFT coefficients (single-sided) per frequency bin and frame.

  • right_ear_clean_hat (np.ndarray) – Clean right ear short-time DFT coefficients (single-sided) per frequency bin and frame.

  • left_ear_noisy_hat (np.ndarray) – Noisy/processed left ear short-time DFT coefficients (single-sided) per frequency bin and frame.

  • right_ear_noisy_hat (np.ndarray) – Noisy/processed right ear short-time DFT coefficients (single-sided) per frequency bin and frame.

  • n_third_octave_bands (int) – Number of one-third octave bands.

  • n_frames (int) – Number of frames for intermediate intelligibility measure.

  • frequency_band_edges_indices (np.ndarray) – Indices of frequency band edges.

  • centre_frequencies (np.ndarray) – Centre frequencies.

  • taus (np.ndarray) – Interaural delay (tau) values.

  • n_taus (int) – Number of tau values.

  • gammas (np.ndarray) – Interaural level difference (gamma) values.

  • n_gammas (int) – Number of gamma values.

  • intermediate_intelligibility_measure_grid (np.ndarray) – Grid for intermediate intelligibility measure.

  • p_ec_max (np.ndarray) – Empty grid for maximum values.

  • sigma_epsilon (np.ndarray) – Jitter for gammas.

  • sigma_delta (np.ndarray) – Jitter for taus.

Returns:

intermediate_intelligibility_measure_grid (np.ndarray) : Updated grid for the intermediate intelligibility measure. p_ec_max (np.ndarray) : Grid containing maximum values.

Return type:

tuple[np.ndarray, np.ndarray]
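The search described above can be illustrated with a heavily simplified sketch. It works on a single frequency, omits the jitter terms, and uses a plain correlation of power envelopes in place of the MBSTOI intermediate measure. The helper names (ec_output, envelope_correlation, ec_grid_search), the sign convention, and the even split of gamma and tau between the two ears are assumptions made for illustration, not the library's implementation.

```python
import numpy as np


def ec_output(left, right, omega, tau, gamma_db):
    """Equalise two ear signals (level gamma in dB, delay tau in s) and subtract."""
    gain = 10.0 ** (gamma_db / 40.0)        # half of the level adjustment per ear
    shift = np.exp(1j * omega * tau / 2.0)  # half of the time shift per ear
    return left * gain * shift - right / (gain * shift)


def envelope_correlation(clean, noisy):
    """Correlation across frames between the power envelopes of two signals."""
    x = np.abs(clean) ** 2
    y = np.abs(noisy) ** 2
    x = x - x.mean()
    y = y - y.mean()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    # Guard against a fully cancelled (all-zero) signal.
    return float(x @ y / denom) if denom > 1e-12 else 0.0


def ec_grid_search(lc, rc, ln, rn, omega, taus, gammas):
    """Evaluate the correlation d for every (tau, gamma) pair and find the maximum."""
    d = np.zeros((len(taus), len(gammas)))
    for i, tau in enumerate(taus):
        for j, gamma in enumerate(gammas):
            clean = ec_output(lc, rc, omega, tau, gamma)
            noisy = ec_output(ln, rn, omega, tau, gamma)
            d[i, j] = envelope_correlation(clean, noisy)
    best = np.unravel_index(np.argmax(d), d.shape)
    return d, best


# Toy data: one complex coefficient per frame at 500 Hz.
rng = np.random.default_rng(0)
n_frames = 200
omega = 2 * np.pi * 500.0
taus = np.linspace(-0.8e-3, 0.8e-3, 9)
gammas = np.linspace(-8.0, 8.0, 9)

target = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
noise = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)

# The target is identical at both ears; the interferer has an interaural
# offset equal to one grid point, so the EC stage can cancel it exactly.
true_tau, true_gamma = taus[6], gammas[6]
noise_right = noise * 10.0 ** (true_gamma / 20.0) * np.exp(1j * omega * true_tau)

d, best = ec_grid_search(
    target, target, target + noise, target + noise_right, omega, taus, gammas
)
```

In this toy setup the maximum of d falls on the grid point that matches the interferer's interaural offset, because that is where the interferer is cancelled and the processed output matches the clean one.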

clarity.evaluator.mbstoi.mbstoi_utils.find_delay_impulse(ddf: ndarray, initial_value: int = 22050) ndarray[source]

Find binaural delay in signal ddf.

Finds delay given initial location of unit impulse, initial_value.

Parameters:
  • ddf (np.ndarray) – Signal in which to find the binaural delay.

  • initial_value (int) – Initial location of the unit impulse (default: 22050).

Returns:

Binaural delay.

Return type:

delay (np.ndarray)
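A minimal sketch of the idea follows, assuming that ddf holds one column per ear and that the delay is the offset of each channel's largest sample from initial_value. The function name find_delay_impulse_sketch and both assumptions are illustrative. The library function may differ in peak detection and output shape.

```python
import numpy as np


def find_delay_impulse_sketch(ddf, initial_value=22050):
    """Offset (in samples) of each channel's largest sample from initial_value.

    Assumes ddf has shape (n_samples, 2), one column per ear.
    """
    peak_positions = np.argmax(ddf, axis=0)
    return peak_positions - initial_value


# Toy input: unit impulses displaced from sample 100 by +10 and -5 samples.
ddf = np.zeros((200, 2))
ddf[110, 0] = 1.0
ddf[95, 1] = 1.0
delay = find_delay_impulse_sketch(ddf, initial_value=100)
```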

clarity.evaluator.mbstoi.mbstoi_utils.remove_silent_frames(left_ear_clean: np.ndarray, right_ear_clean: np.ndarray, left_ear_noisy: np.ndarray, right_ear_noisy: np.ndarray, dynamic_range: int = 40, frame_length: int = 256, hop: int | float = 128) tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray][source]

Remove silent frames from the clean and noisy signals, based on the clean signals.

A frame is excluded if its energy is lower than max(energy) - dynamic_range. The frame exclusion is based solely on the clean speech signal. Based on mpariente/pystoi/utils.py.

Parameters:
  • left_ear_clean (np.ndarray) – Clean input signal left channel.

  • right_ear_clean (np.ndarray) – Clean input signal right channel.

  • left_ear_noisy (np.ndarray) – Degraded/processed signal left channel.

  • right_ear_noisy (np.ndarray) – Degraded/processed signal right channel.

  • dynamic_range (int) – Energy range used to determine which frames are silent (default: 40).

  • frame_length (int) – Window size for energy evaluation (default: 256).

  • hop (int | float) – Hop size for energy evaluation (default: 128).

Returns:

xl_sil (np.ndarray) : left_ear_clean without the silent frames. xr_sil (np.ndarray) : right_ear_clean without the silent frames. yl_sil (np.ndarray) : left_ear_noisy without the silent frames in xl_sil. yr_sil (np.ndarray) : right_ear_noisy without the silent frames in xr_sil.
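The exclusion rule can be sketched as a frame mask. This sketch only computes which frames would be kept and does not rebuild the signals. The Hann window, the energies in dB, and the rule that a frame is kept if it is active in either clean channel are assumptions borrowed from the pystoi approach. The names frame_energies_db and active_frame_mask are hypothetical.

```python
import numpy as np

EPS = np.finfo(float).eps


def frame_energies_db(signal, frame_length=256, hop=128):
    """Energy in dB of each Hann-windowed frame."""
    window = np.hanning(frame_length + 2)[1:-1]
    starts = range(0, len(signal) - frame_length, hop)
    frames = np.array([window * signal[s : s + frame_length] for s in starts])
    return 20.0 * np.log10(np.linalg.norm(frames, axis=1) + EPS)


def active_frame_mask(left_clean, right_clean, dynamic_range=40,
                      frame_length=256, hop=128):
    """True for frames within dynamic_range dB of the loudest clean frame."""
    left_db = frame_energies_db(left_clean, frame_length, hop)
    right_db = frame_energies_db(right_clean, frame_length, hop)
    keep_left = left_db > np.max(left_db) - dynamic_range
    keep_right = right_db > np.max(right_db) - dynamic_range
    # Assumption: keep a frame if it is active in either clean channel.
    return keep_left | keep_right


# Toy signals: 1024 samples of silence followed by 1024 samples of noise.
rng = np.random.default_rng(0)
left = np.concatenate([np.zeros(1024), rng.standard_normal(1024)])
right = np.concatenate([np.zeros(1024), 0.5 * rng.standard_normal(1024)])
mask = active_frame_mask(left, right)
```

With these toy signals the frames that lie entirely in the silent first half are masked out, and every frame that overlaps the noise is kept.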

clarity.evaluator.mbstoi.mbstoi_utils.stft(signal: ndarray, win_size: int, fft_size: int) ndarray[source]

Short-time Fourier transform based on MBSTOI Matlab code.

Parameters:
  • signal (np.ndarray) – Input signal

  • win_size (int) – The size of the window and the signal frames.

  • fft_size (int) – The size of the FFT in samples. Frames are zero-padded when this is larger than win_size.

Returns:

The short-time Fourier transform of signal.

Return type:

stft_out (np.ndarray)
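A minimal sketch of a frame-by-frame transform of this kind is shown below. The Hann window, the hop of half a window, and the (frames, bins) output layout are assumptions, since none of them is stated above. The name stft_sketch is hypothetical.

```python
import numpy as np


def stft_sketch(signal, win_size, fft_size):
    """Windowed FFT of overlapping frames; returns shape (n_frames, fft_size)."""
    hop = win_size // 2                      # assumed 50% overlap
    window = np.hanning(win_size + 2)[1:-1]  # assumed Hann window without zero ends
    starts = range(0, len(signal) - win_size, hop)
    return np.array(
        [np.fft.fft(window * signal[s : s + win_size], n=fft_size) for s in starts]
    )


# Toy input: a tone that falls exactly on bin 32 of a 512-point FFT.
n = np.arange(1024)
tone = np.cos(2 * np.pi * 32 * n / 512)
spec = stft_sketch(tone, win_size=256, fft_size=512)
peak_bins = np.argmax(np.abs(spec[:, :257]), axis=1)
```

Here fft_size is twice win_size, so each 256-sample frame is zero-padded to 512 points before the FFT.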

clarity.evaluator.mbstoi.mbstoi_utils.thirdoct(sample_rate: float, nfft: int, num_bands: int, min_freq: int) tuple[ndarray, ndarray, ndarray, ndarray, ndarray][source]

Returns the 1/3 octave band matrix and its center frequencies based on mpariente/pystoi.

Parameters:
  • sample_rate (float) – Frequency sampling rate.

  • nfft (int) – FFT size (number of FFT points).

  • num_bands (int) – Number of one-third octave bands.

  • min_freq (int) – Centre frequency of the lowest one-third octave band.

Returns:

octave_band_matrix (np.ndarray) : One-third octave band matrix. centre_frequencies (np.ndarray) : Centre frequencies. frequency_band_edges_indices (np.ndarray) : Indices of frequency band edges. freq_low (np.ndarray) : Lowest frequency. freq_high (np.ndarray) : Highest frequency.

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]
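The band layout can be sketched as follows, in the style of the pystoi construction: centre frequencies are spaced by a factor of 2^(1/3), and each band extends 1/6 octave to either side of its centre, snapped to the nearest FFT bin. The name third_octave_bands_sketch is hypothetical, and this sketch returns only the band matrix and centre frequencies, not the full five-element tuple.

```python
import numpy as np


def third_octave_bands_sketch(sample_rate, nfft, num_bands, min_freq):
    """Binary band matrix (num_bands x bins) and centre frequencies."""
    # Single-sided frequency axis, spacing sample_rate / nfft.
    freqs = np.linspace(0, sample_rate, nfft + 1)[: nfft // 2 + 1]
    k = np.arange(num_bands, dtype=float)
    centre = min_freq * 2.0 ** (k / 3.0)
    low = min_freq * 2.0 ** ((2 * k - 1) / 6.0)
    high = min_freq * 2.0 ** ((2 * k + 1) / 6.0)

    band_matrix = np.zeros((num_bands, len(freqs)))
    for i in range(num_bands):
        lo_bin = np.argmin(np.abs(freqs - low[i]))   # nearest bin to lower edge
        hi_bin = np.argmin(np.abs(freqs - high[i]))  # nearest bin to upper edge
        band_matrix[i, lo_bin:hi_bin] = 1.0
    return band_matrix, centre


band_matrix, centre = third_octave_bands_sketch(
    sample_rate=10000, nfft=512, num_bands=15, min_freq=150
)
```

Multiplying band_matrix by a single-sided power spectrum sums the bins of each band, which is how a matrix of this kind is typically applied.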