The Modified Binaural Short-Time Objective Intelligibility (mbstoi) measure.
Parameters:
left_ear_clean (ndarray) – Clean speech signal from left ear.
right_ear_clean (ndarray) – Clean speech signal from right ear.
left_ear_noisy (ndarray) – Noisy/processed speech signal from left ear.
right_ear_noisy (ndarray) – Noisy/processed speech signal from right ear.
fs_signal (int) – Sampling frequency of the input signals.
gridcoarseness (int) – Grid coarseness as denominator of ntaus and ngammas.
Defaults to 1.
sample_rate (int) – Sample rate.
n_frame (int) – Number of frames.
fft_size_in_samples (int) – FFT size in samples.
n_third_octave_bands (int) – Number of one-third octave bands.
centre_freq_first_third_octave_hz (int) – Centre frequency of the first one-third octave band, in Hz (e.g. 150).
n_frames (int) – Number of frames for the intermediate intelligibility measure.
dyn_range (int) – Dynamic Range.
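tau_min (float) – Minimum value of the time shift tau searched in the EC stage.
tau_max (float) – Maximum value of the time shift tau searched in the EC stage.
gamma_min (int) – Minimum value of the level adjustment gamma (in dB) searched in the EC stage.
gamma_max (int) – Maximum value of the level adjustment gamma (in dB) searched in the EC stage.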
sigma_delta_0 (float) –
???
sigma_epsilon_0 (float) –
???
alpha_0_db (int) –
???
tau_0 (float) –
???
level_shift_deviation (float) –
???
Returns:
mbstoi index d.
Return type:
float
Notes
All title, copyrights and pending patents pertaining to mbstoi [1] in and to the
original Matlab software are owned by Oticon A/S and/or Aalborg University.
Please see http://ah-andersen.net/code/
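The gridcoarseness parameter is described as a denominator of ntaus and ngammas, the sizes of the search grids over the time shift tau and the level adjustment gamma. The following is a minimal sketch of how such a grid could be built. The helper name build_search_grid and the base grid sizes (base_ntaus, base_ngammas) are assumptions made for illustration and are not taken from this reference.

```python
import math

import numpy as np


def build_search_grid(tau_min, tau_max, gamma_min, gamma_max,
                      gridcoarseness=1, base_ntaus=100, base_ngammas=40):
    """Return candidate tau and gamma values (hypothetical helper).

    base_ntaus and base_ngammas are illustrative grid sizes, not
    values documented in this reference.
    """
    if gridcoarseness < 1:
        raise ValueError("gridcoarseness must be at least 1")
    # A coarser grid divides the number of candidates in each dimension.
    ntaus = math.ceil(base_ntaus / gridcoarseness)
    ngammas = math.ceil(base_ngammas / gridcoarseness)
    taus = np.linspace(tau_min, tau_max, ntaus)
    gammas = np.linspace(gamma_min, gamma_max, ngammas)
    return taus, gammas


# Illustrative limits only; doubling the coarseness halves both grid sizes.
taus, gammas = build_search_grid(-0.001, 0.001, -20, 20, gridcoarseness=2)
```

A larger gridcoarseness trades search resolution for speed, since the number of (tau, gamma) pairs shrinks roughly with its square.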
Run the equalisation-cancellation (EC) stage of the MBSTOI metric.
The EC loop evaluates one large equation in every iteration (see the referenced notes
for details). The left and right ear signals are level adjusted by gamma (in dB) and
time shifted by tau relative to one another, and are then subtracted. The
processed signals are treated in the same way. To obtain performance similar to that of
humans, the EC stage adds jitter. The search looks for the level and time adjustments
that maximise the intermediate correlation coefficients d. A possible extension is to
use the locations of the source and interferer to reduce the search space.
Parameters:
left_ear_clean_hat (np.ndarray) – Clean left ear short-time DFT coefficients
(single-sided) per frequency bin and frame.
right_ear_clean_hat (np.ndarray) – Clean right ear short-time DFT coefficients
(single-sided) per frequency bin and frame.
left_ear_noisy_hat (np.ndarray) – Noisy/processed left ear short-time DFT
coefficients (single-sided) per frequency bin and frame.
right_ear_noisy_hat (np.ndarray) – Noisy/processed right ear short-time DFT
coefficients (single-sided) per frequency bin and frame.
n_third_octave_bands (int) – Number of one-third octave bands.
n_frames (int) – Number of frames for intermediate intelligibility measure.
fids (np.ndarray) – Indices of frequency band edges.
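The core of the EC step described above (level adjust by gamma, time shift by tau, then subtract) can be sketched for a single (tau, gamma) candidate as follows. This is a simplified illustration, not the implementation documented here: it omits the jitter and the search over the correlation coefficients, and the function name ec_combine, the freqs_rad argument and the symmetric split of the adjustment between the two ears are assumptions.

```python
import numpy as np


def ec_combine(left_hat, right_hat, freqs_rad, tau, gamma_db):
    """Level-adjust, time-shift and subtract two single-sided spectra.

    left_hat, right_hat: complex arrays of shape (n_bins, n_frames).
    freqs_rad: angular frequency of each bin in rad/s, shape (n_bins,).
    tau: relative time shift in seconds (assumed unit).
    gamma_db: relative level adjustment in dB.
    """
    # Split the adjustment symmetrically: each ear gets half of gamma
    # (in amplitude terms) and half of the time shift, in opposite directions.
    gain = 10.0 ** (gamma_db / 40.0)
    phase = np.exp(1j * freqs_rad * tau / 2.0)[:, np.newaxis]
    return gain * phase * left_hat - (1.0 / gain) * np.conj(phase) * right_hat
```

With this convention, a right-ear spectrum that equals the left-ear spectrum scaled by gamma dB and shifted by tau cancels exactly, which is the condition the search tries to approach for the interfering component.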
A frame is excluded if its energy is lower than max(energy) - dyn_range.
The frame exclusion is based solely on x, the clean speech signal.
Based on mpariente/pystoi/utils.py.
Parameters:
left_ear_clean (np.ndarray) – Clean input signal left channel.
right_ear_clean (np.ndarray) – Clean input signal right channel.
left_ear_noisy (np.ndarray) – Degraded/processed signal left channel.
right_ear_noisy (np.ndarray) – Degraded/processed signal right channel.
dyn_range (np.ndarray) – Energy range used to determine which frames are silent.
Default is 40.
framelen (int) – Window size for energy evaluation (default : 256).
hop (int) – Hop size for energy evaluation (default : 128).
Returns:
xl_sil (np.ndarray): left_ear_clean without the silent frames.
xr_sil (np.ndarray): right_ear_clean without the silent frames.
yl_sil (np.ndarray): left_ear_noisy without the silent frames in xl_sil.
yr_sil (np.ndarray): right_ear_noisy without the silent frames in xr_sil.
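The exclusion rule above (drop a frame when its energy is lower than max(energy) - dyn_range) can be illustrated for a single clean channel with the sketch below. It only computes the keep/drop mask. The function name silent_frame_mask, the Hann window and the dB energy definition are assumptions, and how the two ears are combined and how the kept frames are reassembled are not covered.

```python
import numpy as np


def silent_frame_mask(clean, dyn_range=40, framelen=256, hop=128):
    """Return a boolean array that is True for frames to keep.

    The decision uses the clean signal only, as stated above.
    """
    clean = np.asarray(clean, dtype=float)
    if len(clean) < framelen:
        raise ValueError("signal is shorter than one frame")
    eps = np.finfo(float).eps
    # Hann window without its zero end points (assumed window choice).
    window = np.hanning(framelen + 2)[1:-1]
    starts = range(0, len(clean) - framelen + 1, hop)
    frames = np.array([window * clean[i:i + framelen] for i in starts])
    # Frame energy in dB; eps avoids log10(0) for all-zero frames.
    energies_db = 20.0 * np.log10(np.linalg.norm(frames, axis=1) + eps)
    return energies_db > np.max(energies_db) - dyn_range
```

The same mask would then be applied to the clean and the noisy/processed signals alike, so that all four outputs stay aligned frame by frame.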
Returns the 1/3 octave band matrix and its centre frequencies.
Based on mpariente/pystoi.
Parameters:
sample_rate (float) – Sampling rate.
n_fft (int) – FFT size, i.e. the number of points in the fast Fourier transform (FFT).
num_bands (int) – Number of one-third octave bands.
min_freq (int) – Centre frequency of the lowest one-third octave band.
Returns:
centre_frequencies (np.ndarray) : Centre frequencies.
frequency_band_edges_indices (np.ndarray) : Indices of frequency band edges.
freq_low (float) : Lowest frequency.
freq_high (float) : Highest frequency.
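As an illustration of the quantities listed above, the sketch below derives one-third octave centre frequencies from min_freq and maps each band's edges to the nearest FFT bin. It is a sketch under stated assumptions rather than the documented routine: the function name third_octave_edges, the nearest-bin rounding and the returned shape are illustrative, and the band matrix itself is not built.

```python
import numpy as np


def third_octave_edges(sample_rate, n_fft, num_bands, min_freq):
    """Return centre frequencies and (low, high) FFT bin indices per band."""
    # Frequencies of the single-sided FFT bins, from 0 to sample_rate / 2.
    bin_freqs = np.linspace(0, sample_rate, n_fft + 1)[: n_fft // 2 + 1]
    k = np.arange(num_bands)
    # Successive one-third octave bands are a factor 2**(1/3) apart.
    centre = min_freq * 2.0 ** (k / 3.0)
    low = centre * 2.0 ** (-1.0 / 6.0)
    high = centre * 2.0 ** (1.0 / 6.0)
    # Snap each band edge to the closest FFT bin.
    low_idx = np.abs(bin_freqs[np.newaxis, :] - low[:, np.newaxis]).argmin(axis=1)
    high_idx = np.abs(bin_freqs[np.newaxis, :] - high[:, np.newaxis]).argmin(axis=1)
    return centre, np.column_stack((low_idx, high_idx))


# Illustrative values only.
centre, edges = third_octave_edges(10000, 512, 15, 150)
```

Because the bands are logarithmically spaced, the centre frequency doubles every three bands, so with these values the fourth band is centred on 300 Hz.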