Compute the average cross-covariance between the reference and processed
signals in each auditory band.
The silent time-frequency tiles are removed from consideration. The
cross-covariance is computed for each segment in each frequency band. The
values are weighted by 1 for inclusion or 0 if the tile is below
threshold. The sum of the covariance values across time and frequency is
then divided by the total number of tiles above threshold. The calculation
is a modification of Tan et al. [1]_. The cross-covariance is also output
with a frequency weighting that reflects the loss of IHC synchronization at high
frequencies (Johnson [2]_).
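A minimal NumPy sketch of the thresholded average described above. It assumes the per-tile level in dB SL is already available as an [nchan, nseg] array (`level_db`, a hypothetical input; the documented function derives its inclusion test from `reference_signal_mean_square`). The function name is illustrative only.

```python
import numpy as np


def thresholded_average_covariance(cross_cov, level_db, threshold_db):
    """Average cross-covariance over the tiles whose level exceeds threshold_db.

    cross_cov, level_db: arrays of shape [nchan, nseg] (level_db is an assumed input).
    Each tile gets weight 1 if above threshold, else 0 (silent tile).
    """
    weights = (np.asarray(level_db) > threshold_db).astype(float)
    n_tiles = weights.sum()  # number of tiles above threshold
    if n_tiles == 0:
        return 0.0  # nothing above threshold: nothing to average
    return float(np.sum(np.asarray(cross_cov) * weights) / n_tiles)


cov = np.array([[0.8, 0.2], [0.6, 0.4]])
lvl = np.array([[10.0, -5.0], [20.0, 0.0]])
print(thresholded_average_covariance(cov, lvl, threshold_db=2.5))  # mean of 0.8 and 0.6
```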
Parameters:
signal_cross_covariance (np.array) – [nchan,nseg] of cross-covariance values
reference_signal_mean_square (np.array) – [nchan,nseg] of reference signal MS
values
threshold_db – threshold in dB SL for including a segment, averaged over frequency, in the
average
lp_filter (list) – LP filter order
freq_cutoff (list) – Cutoff frequencies in Hz
Returns:
average_covariance (): cross-covariance in segments averaged over time and
frequency
ihc_sync_covariance (): cross-covariance array, 6 different weightings for loss
of IHC synchronization at high frequencies:

  LP filter order   Cutoff freq, kHz
  1                 1.5
  3                 2.0
  5                 2.5, 3.0, 3.5, 4.0

Return type:
tuple
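To illustrate how the six (order, cutoff) rows in the table could weight the bands, the sketch below assumes a Butterworth-style lowpass magnitude |H(f)| = 1/sqrt(1 + (f/fc)^(2n)) evaluated at each band center frequency. The weighting shape, the names `sync_weights` and `weighted_covariances`, and the use of center frequencies are all assumptions for illustration, not the documented computation.

```python
import numpy as np

# (order, cutoff in Hz) pairs taken from the table above
SYNC_SETTINGS = [(1, 1500.0), (3, 2000.0), (5, 2500.0),
                 (5, 3000.0), (5, 3500.0), (5, 4000.0)]


def sync_weights(center_freqs_hz, order, cutoff_hz):
    """Hypothetical per-band weight for loss of IHC synchronization.

    Assumes a Butterworth-style lowpass magnitude |H(f)| = 1 / sqrt(1 + (f/fc)**(2n)).
    """
    ratio = np.asarray(center_freqs_hz, dtype=float) / cutoff_hz
    return 1.0 / np.sqrt(1.0 + ratio ** (2 * order))


def weighted_covariances(cross_cov, mask, center_freqs_hz):
    """One frequency-weighted average covariance per (order, cutoff) setting.

    cross_cov, mask: [nchan, nseg]; mask is 1 for included tiles and 0 otherwise.
    """
    n_tiles = mask.sum()
    results = []
    for order, cutoff in SYNC_SETTINGS:
        w = sync_weights(center_freqs_hz, order, cutoff)[:, np.newaxis]  # per band
        results.append(float(np.sum(w * cross_cov * mask) / n_tiles) if n_tiles else 0.0)
    return np.array(results)
```

At the cutoff frequency each weight is 1/sqrt(2); higher orders make the roll-off above the cutoff steeper.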
References:
Updates:
James M. Kates, 28 August 2012.
Adjusted for BM vibration in dB SL, 30 October 2012.
Threshold for including time-freq tile modified, 30 January 2013.
Version for different sync loss, 15 February 2013.
Translated from MATLAB to Python by Gerardo Roa Dabike, September 2022.
Compute the increase in auditory filter bandwidth in response to high signal
levels. The RMS of the control signal, a scalar, is used to set the
bandwidth for the entire signal.
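A sketch of a level-dependent bandwidth rule, under stated assumptions: the control RMS is converted to dB SPL using level1, and the bandwidth is interpolated linearly in dB between `bw_min` and `bw_max`. The 50 and 100 dB SPL breakpoints and the function name are assumptions of this sketch; the text above does not specify them.

```python
import numpy as np


def bandwidth_adjust_sketch(control, bw_min, bw_max, level1,
                            low_db=50.0, high_db=100.0):
    """Interpolate the filter bandwidth from the RMS level of the control signal.

    low_db and high_db are assumed breakpoints: at or below low_db the bandwidth
    is bw_min, at or above high_db it is bw_max, and it is linear in dB between.
    """
    control = np.asarray(control, dtype=float)
    c_rms = np.sqrt(np.mean(control ** 2))   # one scalar for the entire signal
    c_db = 20.0 * np.log10(c_rms) + level1    # RMS = 1 maps to level1 dB SPL
    if c_db <= low_db:
        return bw_min
    if c_db >= high_db:
        return bw_max
    frac = (c_db - low_db) / (high_db - low_db)
    return bw_min + frac * (bw_max - bw_min)
```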
Parameters:
() (level1) – envelope output in the control filter band
() – auditory filter bandwidth computed for the loss (or NH)
() – auditory filter bandwidth at maximum OHC damage
() – RMS=1 corresponds to Level1 dB SPL
Returns:
filter bandwidth increased for high signal levels
Return type:
bandwidth ()
Updates:
James M. Kates, 21 June 2011.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Compute the cross-covariance (normalized cross-correlation) between the reference
and processed signals in each auditory band. The signals are divided into segments
having 50% overlap.
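The segment-wise normalized cross-correlation can be sketched as follows. This is a simplified illustration that assumes a Hann window, per-segment mean removal, and zero lag only; the function name `segment_covariance` is hypothetical.

```python
import numpy as np


def segment_covariance(ref, proc, segment_size_ms, freq_sample):
    """Zero-lag normalized cross-covariance in 50%-overlapping segments.

    ref, proc: [nchan, nsamp] arrays. Returns an [nchan, nseg] array.
    Assumptions: Hann window, mean removed per segment, zero lag only.
    """
    nwin = int(round(segment_size_ms * 1e-3 * freq_sample))
    hop = nwin // 2                                  # 50% overlap
    window = np.hanning(nwin)
    nchan, nsamp = ref.shape
    starts = range(0, nsamp - nwin + 1, hop)
    out = np.zeros((nchan, len(starts)))
    for k, s in enumerate(starts):
        x = ref[:, s:s + nwin] * window
        y = proc[:, s:s + nwin] * window
        x = x - x.mean(axis=1, keepdims=True)
        y = y - y.mean(axis=1, keepdims=True)
        num = np.sum(x * y, axis=1)
        den = np.sqrt(np.sum(x ** 2, axis=1) * np.sum(y ** 2, axis=1))
        # Silent segments (den == 0) are reported as 0 covariance.
        out[:, k] = np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)
    return out
```

Because the result is normalized, a processed signal that is a scaled copy of the reference gives values of 1 in every segment.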
Parameters:
() (segment_size) – Basilar Membrane movement, reference signal
() – Basilar Membrane movement, processed signal
() – signal segment size, msec
freq_sample (int) – sampling rate in Hz
Returns:
[nchan,nseg] of cross-covariance values
reference_mean_square (np.array) : [nchan,nseg] of MS input signal energy values
processed_mean_square (np.array) : [nchan,nseg] of MS processed signal energy
values
Return type:
signal_cross_covariance (np.array)
Updates:
James M. Kates, 28 August 2012.
Output amplitude adjustment added, 30 October 2012.
Translated from MATLAB to Python by Gerardo Roa Dabike, September 2022.
Compute the Equivalent Rectangular Bandwidth (ERB) [1]_ frequency spacing for the
gammatone filter bank. The equation comes from Malcolm Slaney [2]_.
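A minimal sketch of ERB-rate spacing in the style of Slaney's Auditory Toolbox: center frequencies are uniform in log(f + ear_q * min_bw), so adjacent filters are separated by a constant fraction of an ERB. The defaults ear_q = 9.26449 and min_bw = 24.7 are the common Glasberg and Moore constants used by Slaney; including both endpoints, ordering low to high, and omitting the Greenwood shift are assumptions of this sketch, and `erb_space` is a hypothetical name.

```python
import numpy as np


def erb_space(nchan, low_freq=80.0, high_freq=8000.0, ear_q=9.26449, min_bw=24.7):
    """Center frequencies equally spaced on an ERB-rate scale, low to high.

    Points are uniform in log(f + ear_q * min_bw); both endpoints are included.
    The basilar-membrane shift option is not modelled here.
    """
    offset = ear_q * min_bw
    log_lo = np.log(low_freq + offset)
    log_hi = np.log(high_freq + offset)
    return np.exp(np.linspace(log_lo, log_hi, nchan)) - offset


cf = erb_space(32)  # 32 bands from 80 Hz to 8000 Hz
```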
Parameters:
nchan (int) – number of filters in the filter bank
low_freq (int) – lowest frequency of the filter bank, Hz
high_freq (int) – highest frequency of the filter bank, Hz
shift – optional frequency shift of the filter bank specified as a fractional
shift in distance along the BM. A positive shift is an increase in frequency
(basal shift), and negative is a decrease in frequency (apical shift). The
total length of the BM is normalized to 1. The frequency-to-distance map is
from D.D. Greenwood [3]_.
ear_q (float)
min_bw (float)
Returns:
References:
.. [1] Moore BCJ, Glasberg BR (1983) Suggested formulae for calculating
auditory-filter bandwidths and excitation patterns. J Acoustical
Soc America 74:750-753. Available at
<https://doi.org/10.1121/1.389861>
Updates:
James M. Kates, 25 January 2007.
Frequency shift added 22 August 2008.
Lower and upper frequencies fixed at 80 and 8000 Hz, 19 June 2012.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Convert the Root Mean Square average output of the gammatone filter bank
into dB SL. The gain is linear below the lower threshold, compressive
with a compression ratio of CR:1 between the lower and upper thresholds,
and reverts to linear above the upper threshold. The compressor
assumes that auditory threshold is 0 dB SPL.
Parameters:
() – analytic signal envelope (magnitude) returned by the gammatone filter
bank (RMS average level)
() – control signal envelope
() – OHC attenuation at the input to the compressor
() – kneepoint for the low-level linear amplification
() – compression ratio
() – IHC attenuation at the input to the synapse
level1 – dB reference level: a signal having an RMS value of 1 is
assigned to Level1 dB SPL.
threshold_high (int)
small (float)
Returns:
reference_db (): compressed output in dB above the impaired threshold
Updates:
James M. Kates, 6 August 2007.
Version for two-tone suppression, 29 August 2008.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Function that implements a cochlear model that includes the middle ear,
auditory filter bank, Outer Hair Cell (OHC) dynamic-range compression,
and Inner Hair Cell (IHC) attenuation.
The inputs are the reference and processed signals that are to be
compared. The reference is at the reference intensity (e.g. 65 dB SPL
or with NAL-R amplification) and has no other processing. The processed
signal is the hearing-aid output, and is assumed to have the same or
greater group delay compared to the reference.
The function outputs the envelopes of the signals after OHC compression
and IHC loss attenuation.
Parameters:
reference (np.ndarray) – reference signal: should be adjusted to 65 dB SPL
(itype=0 or 1) or to 65 dB SPL plus NAL-R gain (itype=2)
reference_freq (int) – sampling rate for the reference signal, Hz
processed (np.ndarray) – processed signal (e.g. hearing-aid output) includes
HA gain
processed_freq (int) – sampling rate for the processed signal, Hz
hearing_loss (np.ndarray) – audiogram giving the hearing loss in dB at 6
audiometric frequencies: [250, 500, 1000, 2000, 4000, 6000] Hz
itype (int) –
purpose for the calculation:
0=intelligibility: reference is normal hearing and must not
include NAL-R EQ
1=quality: reference does not include NAL-R EQ
2=quality: reference already has NAL-R EQ applied
level1 – level calibration: signal RMS=1 corresponds to Level1 dB SPL
nchan (int) – number of auditory frequency bands
m_delay (int) – compensate for the gammatone group delay
shift (float) – Basal shift of the basilar membrane length
Returns:
reference_db (): envelope for the reference in each band
reference_basilar_membrane (): BM motion for the reference in each band
processed_db (): envelope for the processed signal in each band
processed_basilar_membrane (): BM motion for the processed signal in each band
reference_sl (): compressed RMS average reference in each band converted
to dB SL
processed_sl (): compressed RMS average output in each band converted to dB SL
freq_sample (): sampling rate in Hz for the model outputs
Return type:
tuple
Updates:
James M. Kates, 27 October 2011.
Basilar Membrane added 30 Dec 2011.
Revised 19 June 2012.
Remove match of reference RMS level to processed 29 August 2012.
IHC adaptation added 1 October 2012.
Basilar Membrane envelope converted to dB SL, 2 Oct 2012.
Filterbank group delay corrected, 14 Dec 2012.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Updated by Gerardo Roa Dabike, September 2022.
Compute the cochlear compression in one auditory filter band. The gain is linear
below the lower threshold, compressive with a compression ratio of CR:1 between the
lower and upper thresholds, and reverts to linear above the upper threshold. The
compressor assumes that auditory threshold is 0 dB SPL.
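The piecewise input/output rule described above can be written as a gain in dB. This is a sketch consistent with that description: the control level is clipped to the range between the two kneepoints, so the gain is constant (linear I/O) below `threshold_low`, falls by (1 - 1/CR) dB per input dB between the kneepoints, and is constant again above `threshold_high`. The function name and the 100 dB default for `threshold_high` are assumptions.

```python
import numpy as np


def compressor_gain_db(level_db, attn_ohc, threshold_low, compression_ratio,
                       threshold_high=100.0):
    """Gain in dB for a linear / CR:1 / linear compressor.

    level_db: control-signal level in dB SPL (auditory threshold assumed at 0 dB SPL).
    attn_ohc: OHC attenuation in dB applied at the compressor input.
    """
    clipped = np.clip(level_db, threshold_low, threshold_high)
    # Between the kneepoints each extra input dB yields only 1/CR dB of output.
    return -attn_ohc - (clipped - threshold_low) * (1.0 - 1.0 / compression_ratio)


# Output level in dB is the input level plus the gain, e.g. with CR = 2:1:
levels = np.array([20.0, 70.0, 120.0])
output = levels + compressor_gain_db(levels, 0.0, 30.0, 2.0)
```

With CR = 2:1 and kneepoints at 30 and 100 dB, a 70 dB input receives 20 dB of gain reduction, and any input above 100 dB receives a fixed 35 dB reduction.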
Parameters:
() – analytic signal envelope (magnitude) returned by the
gammatone filter bank
() – BM motion output by the filter bank
() – analytic control envelope returned by the wide control
path filter bank
() – OHC attenuation at the input to the compressor
() – kneepoint for the low-level linear amplification
() – compression ratio
() – sampling rate in Hz
() – dB reference level: a signal having an RMS value of 1 is
assigned to Level1 dB SPL.
small – ???
threshold_high – kneepoint for the high-level linear amplification
Returns:
compressed_signal (): compressed version of the signal envelope
compressed_basilar_membrane (): compressed version of the BM motion
Return type:
tuple
Updates:
James M. Kates, 19 January 2007.
LP filter added 15 Feb 2007 (Ref: Zhang et al., 2001)
Version to compress the envelope, 20 Feb 2007.
Change in the OHC I/O function, 9 March 2007.
Two-tone suppression added 22 August 2008.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Function to smooth the envelope returned by the cochlear model. The
envelope is divided into segments having a 50% overlap. Each segment is
windowed, summed, and divided by the window sum to produce the average.
A raised cosine window is used. The envelope sub-sampling frequency is
2*(1000/segment_size) Hz, with segment_size in msec (e.g. 250 Hz for 8-msec segments).
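A minimal sketch of the windowed segment averaging: each 50%-overlapping segment is weighted by a raised cosine (Hann) window and divided by the window sum, so the output rate is freq_sample / hop = 2*(1000/segment_size) Hz. The function name is hypothetical, and this sketch drops the final partial segment, which the documented function adds back.

```python
import numpy as np


def env_smooth_sketch(envelopes, segment_size_ms, freq_sample):
    """Raised-cosine windowed averages over 50%-overlapping segments.

    envelopes: [nchan, nsamp]. Returns [nchan, nseg].
    Simplification: the final partial segment is dropped.
    """
    nwin = int(round(segment_size_ms * 1e-3 * freq_sample))
    nwin += nwin % 2                      # even length so the hop is exactly half
    hop = nwin // 2
    window = np.hanning(nwin)             # raised cosine window
    starts = range(0, envelopes.shape[1] - nwin + 1, hop)
    smooth = np.empty((envelopes.shape[0], len(starts)))
    for k, s in enumerate(starts):
        segment = envelopes[:, s:s + nwin]
        smooth[:, k] = segment @ window / window.sum()  # window-weighted average
    return smooth
```

For 8-msec segments at 24 kHz the hop is 96 samples, giving the 250 Hz sub-sampling rate quoted above.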
Parameters:
envelopes (np.ndarray) – matrix of envelopes in each of the auditory bands
segment_size – averaging segment size in msec
freq_sample (int) – input envelope sampling rate in Hz
Returns:
smooth: matrix of subsampled windowed averages in each band
Updates:
James M. Kates, 26 January 2007.
Final half segment added 27 August 2012.
Translated from MATLAB to Python by Gerardo Roa Dabike, September 2022.
Align the envelope of the processed signal to that of the reference signal.
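A sketch of cross-correlation alignment, assuming 1-D inputs: the lag of the cross-correlation peak is searched within ±corr_range msec and the output is shifted by that lag, with vacated samples zero-filled. The function name and the zero-fill choice are assumptions of this sketch.

```python
import numpy as np


def align_by_xcorr(reference, output, freq_sample, corr_range_ms):
    """Shift `output` to match `reference` using the cross-correlation peak
    searched within +/- corr_range_ms. Returns (aligned_output, delay_samples).
    """
    n = min(len(reference), len(output))
    ref = np.asarray(reference[:n], dtype=float)
    out = np.asarray(output[:n], dtype=float)
    max_lag = min(int(round(corr_range_ms * 1e-3 * freq_sample)), n - 1)
    lags = np.arange(-max_lag, max_lag + 1)
    full = np.correlate(out, ref, mode="full")   # index n - 1 is zero lag
    delay = int(lags[np.argmax(full[n - 1 + lags])])  # > 0: output lags reference
    aligned = np.zeros(n)
    if delay >= 0:
        aligned[:n - delay] = out[delay:]
    else:
        aligned[-delay:] = out[:n + delay]
    return aligned, delay
```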
Parameters:
() – envelope or BM motion of the reference signal
output – envelope or BM motion of the output signal
freq_sample (int) – sampling rate in Hz
corr_range (int) – range in msec for the correlation
Returns:
y (): shifted output envelope to match the input
Updates:
James M. Kates, 28 October 2011.
Absolute value of the cross-correlation peak removed, 22 June 2012.
Cross-correlation range reduced, 13 August 2013.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Convert the compressed envelope returned by cochlear_envcomp to dB SL.
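A hypothetical sketch of the conversion, assuming that RMS = 1 corresponds to level1 dB SPL, that the IHC attenuation is subtracted in dB, and that negative dB SL values are floored at 0. The BM vibration is rescaled sample by sample so that its envelope follows the dB SL envelope while keeping its sign and fine structure. The function name and the flooring rule are assumptions.

```python
import numpy as np


def envelope_to_db_sl(envelope, bm, attn_ihc, level1, small=1e-30):
    """Linear envelope -> dB SL, plus BM motion rescaled to the dB SL envelope.

    Assumptions: auditory threshold at 0 dB SPL; negative dB SL floored at 0.
    """
    env = np.maximum(np.asarray(envelope, dtype=float), small)  # avoid log10(0)
    env_db = level1 + 20.0 * np.log10(env) - attn_ihc
    env_db = np.maximum(env_db, 0.0)
    bm_db = np.asarray(bm, dtype=float) * env_db / env   # same sign, new envelope
    return env_db, bm_db
```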
Parameters:
() – linear envelope after compression
() – linear Basilar Membrane vibration after compression
() – IHC attenuation at the input to the synapse
level1 – level in dB SPL corresponding to 1 RMS
small (float) – ???
Returns:
_reference (): reference envelope in dB SL
_basilar_membrane (): Basilar Membrane vibration with envelope converted to
dB SL
Return type:
tuple
Updates:
James M. Kates, 20 Feb 07.
IHC attenuation added 9 March 2007.
Basilar membrane vibration conversion added 2 October 2012.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
4th-order gammatone auditory filter. This implementation is based on the C program
published online by Ning Ma, University of Sheffield, UK [1]_, which gives an implementation of
the Martin Cooke filters [2]_: an impulse-invariant transformation of the gammatone
filter. The signal is demodulated down to baseband using a complex exponential,
and then passed through a cascade of four one-pole low-pass filters.
This version filters two signals that have the same sampling rate and the same
gammatone filter center frequencies. The lengths of the two signals should match;
if they don’t, the signals are truncated to the shorter of the two lengths.
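The demodulate-then-lowpass structure can be sketched for a single signal as follows. The input is multiplied by exp(-j 2 pi cf n / fs) to move the center frequency to 0 Hz, passed through four identical one-pole lowpass sections with feedback coefficient exp(-2 pi bw / fs), and shifted back up to form a BM-like output. The unity-DC-gain normalization, the direct use of `bw_hz` in Hz, and the function name are assumptions of this sketch.

```python
import numpy as np


def gammatone_baseband(x, fs, cf, bw_hz):
    """Approximate 4th-order gammatone filter via complex demodulation.

    x: real input; fs: sampling rate (Hz); cf: center frequency (Hz);
    bw_hz: bandwidth parameter of each one-pole section (Hz).
    Returns (envelope, bm): baseband envelope magnitude and real filter output.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    carrier = np.exp(-2j * np.pi * cf * n / fs)   # moves cf down to 0 Hz
    a = np.exp(-2.0 * np.pi * bw_hz / fs)         # one-pole feedback coefficient
    y = x * carrier
    for _ in range(4):                            # cascade of four one-pole LPFs
        out = np.empty_like(y)
        state = 0j
        for i, v in enumerate(y):
            state = (1.0 - a) * v + a * state     # unity gain at 0 Hz
            out[i] = state
        y = out
    # A real cosine at cf demodulates to 0.5 at baseband, hence the factor 2.
    envelope = 2.0 * np.abs(y)
    bm = 2.0 * np.real(y * np.conj(carrier))      # shift back up to cf
    return envelope, bm
```

A unit-amplitude tone at the center frequency settles to an envelope of about 1, while a tone one octave away is strongly attenuated.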
Parameters:
() – first sequence to be filtered
reference_bandwidth – bandwidth for the first sequence relative to that of a normal ear
() – second sequence to be filtered
() – bandwidth for the second sequence relative to that of a normal ear
freq_sample – sampling rate in Hz
center_frequency (int) – filter center frequency in Hz
ear_q (float) – ???
min_bandwidth (float) – ???
Returns:
reference_envelope (): filter envelope output (modulated down to baseband)
for the first signal
reference_basilar_membrane (): Basilar Membrane for the first signal
processed_envelope (): filter envelope output (modulated down to baseband)
for the second signal
processed_basilar_membrane (): Basilar Membrane for the second signal
Return type:
tuple
References:
.. [1] Ma N, Green P, Barker J, Coy A (2007) Exploiting correlogram
Updates:
James M. Kates, 8 Jan 2007.
Vectorized version for efficient MATLAB execution, 4 February 2007.
Cosine and sine generation, 29 June 2011.
Output sine and cosine sequences, 19 June 2012.
Cosine/sine loop speed increased, 9 August 2013.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Compensate for the group delay of the gammatone filter bank. The group
delay is computed for each filter at its center frequency. The firing
rate output of the IHC model is then adjusted so that all outputs have
the same group delay.
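Once a group delay per band is known, equalizing it amounts to delaying every band so that all of them share the largest delay. The sketch below assumes the per-band delays (in samples) are computed elsewhere, for example from each filter at its center frequency; the function name is hypothetical.

```python
import numpy as np


def align_group_delays(envelopes, delays_samples):
    """Delay each band so that every band ends up with the largest group delay.

    envelopes: [nchan, nsamp]; delays_samples: per-band group delay in samples
    (assumed to be computed elsewhere).
    """
    delays = np.round(np.asarray(delays_samples)).astype(int)
    correction = delays.max() - delays             # extra delay each band needs
    out = np.zeros_like(envelopes)
    nsamp = envelopes.shape[1]
    for ch, d in enumerate(correction):
        out[ch, d:] = envelopes[ch, :nsamp - d]    # shift right by d samples
    return out
```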
Parameters:
xenv (np.ndarray) – matrix of signal envelopes or BM motion
() – gammatone filter bandwidths adjusted for loss
() – center frequencies of the bands
freq_sample – sampling rate for the input signal in Hz (e.g. 24,000 Hz)
ear_q (float)
min_bandwidth (float)
Returns:
processed (): envelopes or BM motion compensated for the group delay
Updates:
James M. Kates, 28 October 2011.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Provide inner hair cell (IHC) adaptation. The adaptation is based on an
equivalent RC circuit model, and the derivatives are mapped into
1st-order backward differences. Rapid and short-term adaptation are
provided. The input is the signal envelope in dB SL, with IHC attenuation
already applied to the envelope. The outputs are the envelope in dB SL
with adaptation providing overshoot of the long-term output level, and
the BM motion is multiplied by a gain vs. time function that reproduces
the adaptation. IHC attenuation and additive noise for the equivalent
auditory threshold are provided by a subsequent call to eb_BMatten.
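To show what mapping an RC derivative into a 1st-order backward difference looks like, here is a deliberately simplified, hypothetical single-stage adaptation. The model described above has separate rapid and short-term stages; this toy version uses one time constant and is constructed so that a step onset overshoots to about delta times the steady-state level before decaying. Both function names are illustrative.

```python
import numpy as np


def backward_difference_lowpass(x, tau_s, freq_sample):
    """Discretize tau * dy/dt = x - y with a 1st-order backward difference:
    y[n] = (y[n-1] + r * x[n]) / (1 + r), where r = T / tau and T = 1 / freq_sample.
    """
    r = 1.0 / (freq_sample * tau_s)
    y = np.empty(len(x))
    prev = 0.0
    for n, value in enumerate(x):
        prev = (prev + r * value) / (1.0 + r)
        y[n] = prev
    return y


def toy_adaptation(x_db, delta, tau_s, freq_sample):
    """Hypothetical single-stage adaptation (not the two-stage model above).

    At a step onset the output approaches delta times the new level, then
    decays to the steady-state level with time constant tau_s.
    """
    x_db = np.asarray(x_db, dtype=float)
    slow = backward_difference_lowpass(x_db, tau_s, freq_sample)
    return x_db + (delta - 1.0) * (x_db - slow)
```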
Parameters:
reference_db (np.ndarray) – signal envelope in one frequency band in dB SL;
contains OHC compression and IHC attenuation
() – basilar membrane vibration with OHC compression
but no IHC attenuation
delta – overshoot factor = delta x steady-state
freq_sample (int) – sampling rate in Hz
Returns:
output_db (): envelope in dB SL with IHC adaptation
output_basilar_membrane (): Basilar Membrane multiplied by the IHC adaptation
gain function
Return type:
tuple
Updates:
James M. Kates, 1 October 2012.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Approximate temporal alignment of the reference and processed output
signals. Leading and trailing zeros are then pruned.
The function assumes that the two sequences have the same sampling rate:
call eb_Resamp24kHz for each sequence first, then call this function to
align the signals.
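The pruning step can be sketched as follows, assuming the two sequences are already time-aligned. Leading and trailing samples are dropped where the reference magnitude stays below a small fraction of its peak, and the processed signal is cut to the same span. The 1e-3 relative threshold and the function name are assumptions of this sketch.

```python
import numpy as np


def prune_silence(reference, processed, rel_threshold=1e-3):
    """Trim leading and trailing samples where |reference| stays below
    rel_threshold times its peak; the same index range is kept in processed.

    Both inputs are assumed to be time-aligned and at the same sampling rate.
    """
    reference = np.asarray(reference, dtype=float)
    processed = np.asarray(processed, dtype=float)
    peak = np.max(np.abs(reference))
    active = np.flatnonzero(np.abs(reference) >= rel_threshold * peak)
    start, stop = active[0], active[-1] + 1
    return reference[start:stop], processed[start:stop]
```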
Returns:
reference (np.ndarray): pruned and shifted reference
processed (np.ndarray): pruned and shifted hearing-aid output
Updates:
James M. Kates, 12 July 2011.
Match the length of the processed output to the reference for the
purposes of computing the cross-covariance
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Apportion the hearing loss to the outer hair cells (OHC) and the inner
hair cells (IHC), and increase the bandwidth of the cochlear filters
in proportion to the OHC fraction of the total loss.
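A sketch of the apportioning step under explicit assumptions: the six-point audiogram is interpolated on a log-frequency axis to the filter center frequencies, a fixed fraction of the loss (0.8 here) is assigned to the OHC up to a cap `max_ohc_db`, and the remainder is assigned to the IHC. The fraction, the cap, the interpolation rule, and the function name are illustrative assumptions. The bandwidth and compression-ratio adjustments are not shown.

```python
import numpy as np

AUDIOMETRIC_FREQS = np.array([250, 500, 1000, 2000, 4000, 6000], dtype=float)


def apportion_loss(hearing_loss, center_freq, max_ohc_db, ohc_fraction=0.8):
    """Split an interpolated audiogram into OHC and IHC attenuation (dB).

    ohc_fraction and max_ohc_db are assumptions; max_ohc_db may be a scalar
    or a per-band array.
    """
    loss = np.interp(np.log(center_freq), np.log(AUDIOMETRIC_FREQS),
                     np.asarray(hearing_loss, dtype=float))  # flat beyond the ends
    attn_ohc = np.minimum(ohc_fraction * loss, max_ohc_db)
    attn_ihc = loss - attn_ohc          # remainder is assigned to the IHC
    return attn_ohc, attn_ihc
```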
Parameters:
hearing_loss (np.ndarray) – hearing loss at the 6 audiometric frequencies
center_freq (np.ndarray) – array containing the center frequencies of the
gammatone filters arranged from low to high
audiometric_freq (list)
Returns:
attenuated_ohc (): attenuation in dB for the OHC gammatone filters
bandwidth (): OHC filter bandwidth expressed in terms of normal
low_knee (): Lower kneepoint for the low-level linear amplification
compression_ratio (): Ranges from 1.4:1 at 150 Hz to 3.5:1 at 8 kHz for normal
hearing. Reduced in proportion to the OHC loss to 1:1.
attenuated_ihc (): attenuation in dB for the input to the IHC synapse
Return type:
tuple
Updates:
James M. Kates, 25 January 2007.
Version for loss in dB and match of OHC loss to CR, 9 March 2007.
Low-frequency extent changed to 80 Hz, 27 Oct 2011.
Lower kneepoint set to 30 dB, 19 June 2012.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Compute the cross-correlations between the input signal time-frequency
envelope and the distortion time-frequency envelope.
For each time interval, the log spectrum is fitted with a set of
half-cosine basis functions. The spectrum weighted by the basis
functions corresponds to Mel Cepstral Coefficients computed in the
frequency domain. The amplitude-normalized cross-covariance between
the time-varying basis functions for the input and output signals is
then computed.
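A minimal sketch of the cepstral correlation idea: each time slice of the log spectrum is projected onto half-cosine basis functions (basis j spans j half-cycles across the bands), and the time series of each coefficient for the reference and processed signals are correlated after removing their means. Silence removal and additive noise are omitted, and the names `cosine_basis` and `cepstral_correlation` are hypothetical.

```python
import numpy as np


def cosine_basis(nbands, nbasis):
    """Half-cosine basis: column j is cos(j * pi * k / (nbands - 1)), k = 0..nbands-1."""
    k = np.arange(nbands)
    return np.stack([np.cos(j * np.pi * k / (nbands - 1)) for j in range(nbasis)], axis=1)


def cepstral_correlation(ref_db, proc_db, nbasis=6):
    """ref_db, proc_db: [nsamp, nbands] log envelopes (dB SL).

    Returns (average over basis functions 2-6, per-basis correlations).
    """
    basis = cosine_basis(ref_db.shape[1], nbasis)
    ref_cep = ref_db @ basis                     # [nsamp, nbasis]
    proc_cep = proc_db @ basis
    ref_cep -= ref_cep.mean(axis=0)              # remove the mean over time
    proc_cep -= proc_cep.mean(axis=0)
    num = np.sum(ref_cep * proc_cep, axis=0)
    den = np.sqrt(np.sum(ref_cep ** 2, axis=0) * np.sum(proc_cep ** 2, axis=0))
    corr = num / np.maximum(den, 1e-30)          # amplitude-normalized
    return corr[1:6].mean(), corr                # basis functions 2-6 (1-based)
```

Because each coefficient is mean-removed and normalized, a processed spectrum equal to a scaled and offset copy of the reference yields a correlation of 1.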
Parameters:
() – subsampled input signal envelope in dB SL in each critical band
() – subsampled distorted output signal envelope
() – threshold in dB SPL to include segment in calculation
addnoise – additive Gaussian noise to ensure 0 cross-corr at low levels
Returns:
average_cepstral_correlation: average cepstral correlation 2-6, input vs output
individual_cepstral_correlations: individual cepstral correlations,
input vs output
Return type:
tuple
Updates:
James M. Kates, 24 October 2006.
Difference signal removed for cochlear model, 31 January 2007.
Absolute value added 13 May 2011.
Changed to loudness criterion for silence threshold, 28 August 2012.
Translated from MATLAB to Python by Gerardo Roa Dabike, September 2022.
Compute the cross-correlations between the input signal
time-frequency envelope and the distortion time-frequency envelope. For
each time interval, the log spectrum is fitted with a set of half-cosine
basis functions. The spectrum weighted by the basis functions corresponds
to mel cepstral coefficients computed in the frequency domain. The
amplitude-normalized cross-covariance between the time-varying basis
functions for the input and output signals is then computed for each of
the 8 modulation frequencies.
Parameters:
() – subsampled input signal envelope in dB SL in each critical band
() – subsampled distorted output signal envelope
() – threshold in dB SPL to include segment in calculation
() – additive Gaussian noise to ensure 0 cross-corr at low levels
segment_size – segment size in ms used for the envelope LP filter (8 msec)
n_cepstral_coef (int) – number of cepstral coefficients
Returns:
mel_cepstral_average (): average of the modulation correlations across analysis
frequency bands and modulation frequency bands, basis functions 2-6
mel_cepstral_low (): average over the four lower mod freq bands, 0 - 20 Hz
mel_cepstral_high (): average over the four higher mod freq bands, 20 - 125 Hz
mel_cepstral_modulation (): vector of cross-correlations by modulation
frequency, averaged over analysis frequency band
Return type:
tuple
Updates:
James M. Kates, 24 October 2006.
Difference signal removed for cochlear model, 31 January 2007.
Absolute value added 13 May 2011.
Changed to loudness criterion for silence threshold, 28 August 2012.
Version using envelope modulation filters, 15 July 2014.
Modulation frequency vector output added 27 August 2014.
Translated from MATLAB to Python by Gerardo Roa Dabike, September 2022.
Design the middle ear filters and process the input through the
cascade of filters. The middle ear model is a 2-pole HP filter
at 350 Hz in series with a 1-pole LP filter at 5000 Hz. The
result is a rough approximation to the equal-loudness contour
at threshold.
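A sketch of the middle-ear cascade using only NumPy. The 2-pole highpass at 350 Hz is realized as a 2nd-order Butterworth section (RBJ "Audio EQ Cookbook" biquad with Q = 1/sqrt(2)) and the 1-pole lowpass at 5000 Hz as a bilinear-transform first-order section. The bilinear design method and the function names are choices of this sketch, not necessarily how the documented function designs its filters.

```python
import numpy as np


def _biquad(b, a, x):
    """Direct-form I IIR filter with 3 feedforward and 3 feedback taps; a[0] == 1."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y


def middle_ear_sketch(x, fs, hp_hz=350.0, lp_hz=5000.0):
    """2-pole Butterworth highpass at hp_hz in series with a 1-pole lowpass at lp_hz."""
    # 2nd-order Butterworth highpass (RBJ cookbook form)
    w0 = 2.0 * np.pi * hp_hz / fs
    alpha = np.sin(w0) / np.sqrt(2.0)      # sin(w0) / (2 * Q) with Q = 1/sqrt(2)
    cw = np.cos(w0)
    a0 = 1.0 + alpha
    b_hp = np.array([(1 + cw) / 2, -(1 + cw), (1 + cw) / 2]) / a0
    a_hp = np.array([1.0, -2.0 * cw / a0, (1.0 - alpha) / a0])
    # 1st-order lowpass H(s) = 1 / (1 + s / wc), bilinear transform with prewarping
    k = np.tan(np.pi * lp_hz / fs)
    b_lp = np.array([k / (1 + k), k / (1 + k), 0.0])
    a_lp = np.array([1.0, (k - 1) / (k + 1), 0.0])
    return _biquad(b_lp, a_lp, _biquad(b_hp, a_hp, np.asarray(x, dtype=float)))
```

With these settings a 1 kHz tone passes nearly unchanged, while a 50 Hz tone is attenuated by roughly 34 dB.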
Arguments:
reference (np.ndarray): input signal
freq_sample (float): sampling rate in Hz
Returns:
xout (): filtered output
Updates:
James M. Kates, 18 January 2007.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Resample the input signal at 24 kHz. The input sampling rate is
rounded to the nearest kHz to compute the sampling rate conversion
ratio.
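The rounding rule above can be sketched as a small helper that returns reduced integer up/down factors; for example, 44.1 kHz rounds to 44 kHz, giving a 24/44 = 6/11 ratio. The helper name is hypothetical.

```python
from fractions import Fraction


def resample_ratio(input_rate_hz, output_rate_hz=24000):
    """Round the input rate to the nearest kHz and return reduced (up, down) factors."""
    rounded = 1000 * round(input_rate_hz / 1000)
    ratio = Fraction(output_rate_hz, rounded)
    return ratio.numerator, ratio.denominator


up, down = resample_ratio(44100)  # 44.1 kHz is treated as 44 kHz
```

The factors could then be passed to a polyphase resampler such as `scipy.signal.resample_poly(x, up, down)`. Whether the documented function uses that routine is not stated here.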
Arguments:
reference_signal (np.ndarray): input signal
reference_freq (int): sampling rate for the input in Hz
freq_sample_hz (int): output sampling rate in Hz
Returns:
reference_signal_24: signal resampled at freq_sample_hz (default 24 kHz)
freq_sample_hz: output sampling rate in Hz
Updates:
James M. Kates, 20 June 2011.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Compute changes in the long-term spectrum and spectral slope.
The metric is based on the spectral distortion metric of Moore and Tan [1]_
(JAES, Vol. 52, pp. 900-914). The log envelopes in dB SL are converted to
linear to approximate specific loudness. The outputs are the sum of the
absolute differences, the standard deviation of the differences, and the
maximum absolute difference. The same three outputs are provided for the
normalized spectral difference and for the slope. The output is
calibrated so that a processed signal having 0 amplitude produces a
value of 1 for the spectrum difference.
Abs diff: weight all deviations uniformly
Std diff: weight larger deviations more than smaller deviations
Max diff: only weight the largest deviation
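The three summary statistics can be sketched for one pair of linear spectra. As an assumed calibration consistent with the property stated above, both spectra are divided by the sum of the reference, so an all-zero processed spectrum gives a sum of absolute differences of exactly 1. The function name and the calibration step are assumptions.

```python
import numpy as np


def spectrum_difference_stats(ref_linear, proc_linear):
    """[sum abs diff, std dev of diff, max abs diff] between two linear spectra.

    Both spectra are divided by the sum of the reference (an assumed calibration).
    """
    ref = np.asarray(ref_linear, dtype=float)
    proc = np.asarray(proc_linear, dtype=float)
    diff = (ref - proc) / ref.sum()
    return np.array([np.sum(np.abs(diff)),   # all deviations weighted uniformly
                     np.std(diff),           # larger deviations weighted more
                     np.max(np.abs(diff))])  # only the largest deviation
```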
Parameters:
reference_sl (np.ndarray) – reference signal spectrum in dB SL
processed_sl (np.ndarray) – degraded signal spectrum in dB SL
Returns:
dloud (np.array): [sum abs diff, std dev diff, max diff] spectra
dnorm (np.array): [sum abs diff, std dev diff, max diff] norm spectra
dslope (np.array): [sum abs diff, std dev diff, max diff] slope
Return type:
tuple
References:
.. [1] Moore BCJ, Tan, CT (2004) Development and Validation of a Method
Compute the cepstral correlation coefficients between the reference signal
and the distorted signal log envelopes. The silence portions of the
signals are removed prior to the calculation based on the envelope of the
reference signal. For each time sample, the log spectrum in dB SL is
fitted with a set of half-cosine basis functions. The cepstral coefficients
then form the input to the cepstral correlation
calculation.
Parameters:
() – subsampled reference signal envelope in dB SL in each band
() – subsampled distorted output signal envelope
() – threshold in dB SPL to include sample in calculation
thresh_nerve – additive noise RMS for IHC firing (in dB)
nbasis – number of cepstral basis functions to use
Returns:
reference_cep: cepstral coefficient matrix for the reference signal (nsamp,nbasis)
processed_cep: cepstral coefficient matrix for the output signal (nsamp,nbasis)
Each column is a separate basis function, from low to high.
Return type:
tuple
Updates:
James M. Kates, 23 April 2015.
Gammawarp version to fit the basis functions, 11 February 2019.
Additive noise for IHC firing rates, 24 April 2019.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Lowpass filter and subsample the envelope in dB SL produced by the model
of the auditory periphery. The LP filter uses a von Hann raised cosine
window to ensure that there are no negative envelope values produced by
the filtering operation.
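The reason for the von Hann window can be shown directly: a normalized Hann FIR has only non-negative taps, so filtering a non-negative envelope cannot produce negative values. In this sketch, the window length of about fs / cutoff_hz (forced odd so that `mode="same"` introduces no half-sample shift) and the integer decimation step are assumptions, and `hann_lowpass_subsample` is a hypothetical name.

```python
import numpy as np


def hann_lowpass_subsample(env_db, cutoff_hz, sub_rate_hz, fs):
    """env_db: [nsamp, nbands]. Smooth with a normalized Hann FIR, then decimate.

    All taps are >= 0, so a non-negative input stays non-negative.
    """
    n = max(3, int(round(fs / cutoff_hz)))
    n += 1 - n % 2                               # odd length: zero-phase with "same"
    taps = np.hanning(n + 2)[1:-1]               # drop the zero-valued end points
    taps /= taps.sum()                           # unity gain at 0 Hz
    smoothed = np.apply_along_axis(
        lambda col: np.convolve(col, taps, mode="same"), 0, env_db)
    step = int(round(fs / sub_rate_hz))
    return smoothed[::step]                      # keep every step-th sample
```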
Parameters:
reference_db (np.ndarray) – env in dB SL for the ref signal in each auditory band
processed_db (np.ndarray) – env in dB SL for the degraded signal in each auditory
band
() – LP filter cutoff frequency for the filtered envelope, Hz
() – subsampling frequency in Hz for the LP filtered envelopes
freq_samp – sampling rate in Hz for the signals reference_db and processed_db
Returns:
reference_env: LP filtered and subsampled reference signal envelope.
Each frequency band is a separate column.
processed_env: LP filtered and subsampled degraded signal envelope
Return type:
tuple
Updates:
James M. Kates, 12 September 2019.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Apply a FIR modulation filterbank to the reference envelope signals
contained in matrix reference_envelope and the processed signal envelope
signals in matrix processed_envelope. Each column in reference_envelope
and processed_envelope is a separate filter band or cepstral coefficient
basis function. The modulation filters use a lowpass filter for the
lowest modulation rate, and complex demodulation followed by a lowpass
filter for the remaining bands. The onset and offset transients are
removed from the FIR convolutions to temporally align the modulation
filter outputs.
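A single modulation band can be sketched as follows. The lowest band is a plain lowpass; every other band demodulates the envelope to 0 Hz with a complex exponential, lowpass filters it, and shifts it back. An odd-length symmetric kernel applied with `mode="same"` is zero-phase, which keeps band outputs time-aligned in the spirit of the transient trimming described above. The Hann lowpass, the length rule from `bandwidth_hz`, and the function name are assumptions of this sketch.

```python
import numpy as np


def modulation_band(env, fs, center_hz, bandwidth_hz):
    """One modulation band of a 1-D envelope sampled at fs (Hz).

    center_hz == 0 gives a plain lowpass; otherwise complex demodulation,
    Hann-window lowpass, and remodulation to center_hz.
    """
    n = np.arange(len(env))
    length = max(3, int(round(fs / bandwidth_hz)))
    length += 1 - length % 2                       # odd length: zero phase
    taps = np.hanning(length + 2)[1:-1]
    taps /= taps.sum()                             # unity gain at 0 Hz
    if center_hz == 0:
        return np.convolve(env, taps, mode="same")
    carrier = np.exp(-2j * np.pi * center_hz * n / fs)
    base = np.convolve(env * carrier, taps, mode="same")  # lowpass at baseband
    return 2.0 * np.real(base * np.conj(carrier))         # back to center_hz
```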
Parameters:
reference_envelope (np.ndarray) – matrix containing the subsampled reference
envelope values. Each column is a different frequency band or cepstral basis
function arranged from low to high.
processed_envelope (np.ndarray) – matrix containing the subsampled processed
envelope values
freq_sub_sampling – envelope sub-sampling rate in Hz
center_frequencies (np.ndarray) – Center Frequencies
Returns:
reference_modulation (): a cell array containing the reference signal
output of the modulation filterbank. reference_modulation is of size
[nchan,nmodfilt] where nchan is the number of frequency channels or
cepstral basis functions in reference_envelope, and nmodfilt is the
number of modulation filters used in the analysis. Each cell contains a
column vector of length nsamp, where nsamp is the number of samples in
each envelope sequence contained in the columns of reference_envelope.
processed_modulation (): cell array containing the processed signal output
of the modulation filterbank.
center_frequencies (): vector of modulation rate filter center frequencies
Return type:
tuple
Updates:
James M. Kates, 14 February 2019.
Two matrix version of gwarp_ModFiltWindow, 19 February 2019.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Compute the cross-correlations between the input signal time-frequency
envelope and the distortion time-frequency envelope. The cepstral
coefficients or envelopes in each frequency band have been passed
through the modulation filterbank using function ebm_ModFilt.
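A sketch of the averaging described above, assuming the filterbank outputs are stored as NumPy arrays of shape [nchan, nmodfilt, nsamp] instead of cell arrays. A normalized cross-covariance is computed for each (basis function, modulation filter) pair and then averaged over basis functions 2-6, giving one value per modulation filter. The function name and array layout are assumptions.

```python
import numpy as np


def modulation_correlations(ref_mod, proc_mod):
    """ref_mod, proc_mod: [nchan, nmodfilt, nsamp]. Returns a vector of size nmodfilt."""
    x = ref_mod - ref_mod.mean(axis=-1, keepdims=True)
    y = proc_mod - proc_mod.mean(axis=-1, keepdims=True)
    num = np.sum(x * y, axis=-1)
    den = np.sqrt(np.sum(x ** 2, axis=-1) * np.sum(y ** 2, axis=-1))
    corr = np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)
    return corr[1:6].mean(axis=0)    # basis functions 2-6 (1-based)
```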
Parameters:
reference_modulation (np.array) – cell array containing the reference signal
output of the modulation filterbank. reference_modulation is of size [nchan,nmodfilt] where
nchan is the number of frequency channels or cepstral basis functions in
the reference envelope, and nmodfilt is the number of modulation filters used in the analysis.
Each cell contains a column vector of length nsamp, where nsamp is the
number of samples in each envelope sequence contained in the columns of
the reference envelope.
processed_modulation (np.ndarray) – cell array containing the processed signal output
of the modulation filterbank
Returns:
aveCM: modulation correlations averaged over basis functions 2-6,
vector of size nmodfilt
Updates:
James M. Kates, 21 February 2019.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Compute the HASPI intelligibility index using the
auditory model followed by computing the envelope cepstral
correlation and BM vibration high-level covariance. The reference
signal presentation level for NH listeners is assumed to be 65 dB
SPL. The same model is used for both normal and impaired hearing. This
version of HASPI uses a modulation filterbank followed by an ensemble of
neural networks to compute the estimated intelligibility.
NB - The original HASPI model derivation included a bug which meant that
although the ‘shift’ parameter used in band centre frequency calculations was set to
‘0.02’, it was never actually applied. To replicate this behaviour, ear_model is
called with ‘shift’ set to None. For further details, see the discussion in
Issue #105 <https://github.com/claritychallenge/clarity/issues/105>.
Parameters:
reference (np.ndarray) – Clear input reference speech signal with no noise or
distortion. If a hearing loss is specified, no amplification should be
provided.
reference_sample_rate (int) – Sampling rate in Hz for the reference signal.
processed (np.ndarray) – Output signal with noise, distortion, HA gain, and/or
processing.
processed_sample_rate (int) – Sampling rate in Hz for the processed signal.
hearing_loss (np.ndarray) – (1,6) vector of hearing loss at the 6 audiometric
frequencies [250, 500, 1000, 2000, 4000, 6000] Hz.
level1 (int) – Optional input specifying level in dB SPL that corresponds to a
signal RMS = 1. Default is 65 dB SPL if argument not provided.
f_lp (int)
itype (int) – Intelligibility model
Returns:
Intel (float): intelligibility estimated by passing the cepstral coefficients
through a modulation filterbank followed by an ensemble of
neural networks.
raw (np.ndarray): vector of 10 cep corr modulation filterbank outputs, averaged
over basis functions 2-6.
Return type:
tuple(Intel: float, raw: np.ndarray)
Updates:
James M. Kates, 5 August 2013.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Provide the weights derived for the ensemble of ten neural
networks used for the HASPI_v2 intelligibility model. The neural networks
have ten inputs, 4 neurons in the hidden layer, and one output neuron.
The logsig activation function is used.
Arguments: None
Returned values:
neural_net_params (dict): parameters defining the neural network
weights_hidden (): cell array 10 x 1 for the weights linking the input to the
hidden layer. Each cell is an 11 x 4 matrix of weights.
weights_out (): cell array 10 x 1 for the weights linking the hidden to the
output layer. Each cell is a 5 x 1 vector of weights.
normalization_factor (): normalization so that the maximum neural net output is
exactly 1.
Updates:
James M. Kates, 8 October 2019.
Version for new neural network using actual TFS scores, 24 October 2019.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Compute the outputs at each layer of a neural network given
the input to the network and the weights. The activation function is an
offset logistic function that gives either a logsig or hyperbolic
tangent; the outputs from each layer have been reduced by the offset. The
structure of the network is an input layer, one hidden layer, and an
output layer. The first values in vectors hidden and output are set to 1
by the function, and the remaining values correspond to the outputs at
each neuron in the layer.
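A minimal sketch of the forward pass and ensemble averaging described here. Element 0 of each layer vector is the constant bias input 1, so the hidden weights have shape (ninput + 1, nhidden) and the output weights have shape (nhidden + 1, noutput), matching the 11 x 4 and 5 x 1 sizes listed above. An optional `offset` is subtracted from each activation. Note that logsig(x) - 0.5 equals tanh(x/2)/2, which is how an offset logistic can act as a scaled hyperbolic tangent. The function names and the `offset` parameter are assumptions of this sketch.

```python
import numpy as np


def logsig(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))


def feedforward(data, weights_hidden, weights_out, offset=0.0):
    """One-hidden-layer network with bias units; returns (hidden, output).

    weights_hidden: (ninput + 1, nhidden); weights_out: (nhidden + 1, noutput).
    """
    x = np.concatenate(([1.0], data))                        # bias + inputs
    hidden = np.concatenate(([1.0], logsig(x @ weights_hidden) - offset))
    output = np.concatenate(([1.0], logsig(hidden @ weights_out) - offset))
    return hidden, output


def ensemble(data, hidden_list, out_list, offset=0.0):
    """Average the network outputs (bias element dropped) over the ensemble."""
    outs = [feedforward(data, wh, wo, offset)[1][1:]
            for wh, wo in zip(hidden_list, out_list)]
    return np.mean(outs, axis=0)
```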
Args:
data (np.ndarray): feature vector input to the neural network.
neural_net_params (dict): network parameters from get_neural_net().
weights_hidden (list): matrix of weights for the hidden layer.
weights_out (list): matrix of weights for the output layer.
Returns:
hidden (): vector of outputs from the hidden layer.
output (): vector of outputs from the output layer.
Updates:
James M. Kates, 26 October 2010.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.
Function to compute the neural network ensemble response to a set of
inputs. The neural network is defined in NNfeedforwardZ.
Args:
data (np.ndarray): array of features input to the neural network
neural_net_params (dict): vector of neural network parameters
weights_hidden (list): cell array of hidden layer weights for each network
weights_out (list): cell array of output layer weights for each network
Returns:
model: neural network output vector averaged over the ensemble
Updates:
James M. Kates, 20 September 2011.
Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.