clarity.evaluator.haspi package

Submodules

clarity.evaluator.haspi.eb module

Module for HASPI, HASQI, HAAQI EBs

clarity.evaluator.haspi.eb.ave_covary2(signal_cross_covariance: np.ndarray, reference_signal_mean_square: np.ndarray, threshold_db: float, lp_filter_order: ndarray | None = None, freq_cutoff: ndarray | None = None) → tuple[float, ndarray][source]

Compute the average cross-covariance between the reference and processed signals in each auditory band.

The silent time-frequency tiles are removed from consideration. The cross-covariance is computed for each segment in each frequency band. The values are weighted by 1 for inclusion or 0 if the tile is below threshold. The sum of the covariance values across time and frequency are then divided by the total number of tiles above threshold. The calculation is a modification of Tan et al.[1]_ . The cross-covariance is also output with a frequency weighting that reflects the loss of IHC synchronization at high frequencies Johnson[2]_.

Parameters:

signal_cross_covariance (np.array) – [nchan,nseg] of cross-covariance values
reference_signal_mean_square (np.array) – [nchan,nseg] of reference signal MS values
() (threshold_db) – threshold in dB SL to include segment ave over freq in average
lp_filter (list) – LP filter order
freq_cutoff (list) – Cutoff frequencies in Hz

Returns:

cross-covariance in segments averaged over time and

frequency

ihc_sync_covariance (): cross-covariance array, 6 different weightings for loss

of IHC synchronization at high frequencies:

LP Filter Order Cutoff Freq, kHz: 1 1.5 3 2.0 5 2.5, 3.0, 3.5, 4.0

Return type:

average_covariance ()

References:

Updates:: James M. Kates, 28 August 2012. Adjusted for BM vibration in dB SL, 30 October 2012. Threshold for including time-freq tile modified, 30 January 2013. Version for different sync loss, 15 February 2013. Translated from MATLAB to Python by Gerardo Roa Dabike, September 2022.

clarity.evaluator.haspi.eb.bandwidth_adjust(control: ndarray, bandwidth_min: float, bandwidth_max: float, level1: float) → float[source]

Compute the increase in auditory filter bandwidth in response to high signal levels. The RMS of the control signal, a scalar, is used to set the bandwidth for the entire signal.

Parameters:

() (level1) – envelope output in the control filter band
() – auditory filter bandwidth computed for the loss (or NH)
() – auditory filter bandwidth at maximum OHC damage
() – RMS=1 corresponds to Level1 dB SPL

Returns:

filter bandwidth increased for high signal levels

Return type:

bandwidth ()

Updates: James M. Kates, 21 June 2011. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.eb.basilar_membrane_add_noise(reference: ndarray, threshold: int, level1: float) → ndarray[source]

Apply the IHC attenuation to the BM motion and to add a low-level Gaussian noise to give the auditory threshold.

Parameters:

() (level1) – BM motion to be attenuated
() – additive noise level in dB re:auditory threshold
() – an input having RMS=1 corresponds to Level1 dB SPL

Returns:

Attenuated signal with threshold noise added

Updates:: James M. Kates, 19 June 2012. Just additive noise, 2 Oct 2012. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.eb.bm_covary(reference_basilar_membrane: ndarray, processed_basilar_membrane: ndarray, segment_size: int, sample_rate: float) → tuple[ndarray, ndarray, ndarray][source]

Compute the cross-covariance (normalized cross-correlation) between the reference and processed signals in each auditory band. The signals are divided into segments having 50% overlap.

Parameters:

() (segment_size) – Basilar Membrane movement, reference signal
() – Basilar Membrane movement, processed signal
() – signal segment size, msec
freq_sample (int) – sampling rate in Hz

Returns:

[nchan,nseg] of cross-covariance values reference_mean_square (np.array) : [nchan,nseg] of MS input signal energy values processed_mean_square (np.array) : [nchan,nseg] of MS processed signal energy

values

Return type:

signal_cross_covariance (np.array)

Updates:: James M. Kates, 28 August 2012. Output amplitude adjustment added, 30 october 2012. Translated from MATLAB to Python by Gerardo Roa Dabike, September 2022.

clarity.evaluator.haspi.eb.center_frequency(nchan: int, shift: float | None = None, low_freq: int = 80, high_freq: int = 8000, ear_q: float = 9.26449, min_bw: float = 24.7) → ndarray[source]

Compute the Equivalent Rectangular Bandwidth_[1] frequency spacing for the gammatone filter bank. The equation comes from Malcolm Slaney[2].

Parameters:

nchan (int) – number of filters in the filter bank
low_freq (int) – Low Frequency level.
high_freq (int) – High Frequency level.
() (shift) – optional frequency shift of the filter bank specified as a fractional shift in distance along the BM. A positive shift is an increase in frequency (basal shift), and negative is a decrease in frequency (apical shift). The total length of the BM is normalized to 1. The frequency-to-distance map is from D.D. Greenwood[3].
ear_q (float)
min_bw (float)

Returns:

References: .. [1] Moore BCJ, Glasberg BR (1983) Suggested formulae for calculating

auditory-filter bandwidths and excitation patterns. J Acoustical Soc America 74:750-753. Available at <https://doi.org/10.1121/1.389861>

Updates: James M. Kates, 25 January 2007. Frequency shift added 22 August 2008. Lower and upper frequencies fixed at 80 and 8000 Hz, 19 June 2012. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.eb.convert_rms_to_sl(reference: ndarray, control: ndarray, attenuated_ohc: ndarray | float, threshold_low: ndarray | int, compression_ratio: ndarray | int, attenuated_ihc: ndarray | float, level1: float, threshold_high: int = 100, small: float = 1e-30) → ndarray[source]

Covert the Root Mean Square average output of the gammatone filter bank into dB SL. The gain is linear below the lower threshold, compressive with a compression ratio of CR:1 between the lower and upper thresholds, and reverts to linear above the upper threshold. The compressor assumes that auditory threshold is 0 dB SPL.

Parameters:

() (level1) – analytic signal envelope (magnitude) returned by the
bank (gammatone filter)
level (RMS average)
() – control signal envelope
() – OHC attenuation at the input to the compressor
() – kneepoint for the low-level linear amplification
() – compression ratio
() – IHC attenuation at the input to the synapse
() – dB reference level: a signal having an RMS value of 1 is assigned to Level1 dB SPL.
threshold_high (int)
small (float)

Returns:

compressed output in dB above the impaired threshold

Return type:

reference_db ()

Updates:: James M. Kates, 6 August 2007. Version for two-tone suppression, 29 August 2008. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.eb.ear_model(reference: ndarray, reference_freq: float, processed: ndarray, processed_freq: float, hearing_loss: ndarray, itype: int, level1: float, nchan: int = 32, m_delay: int = 1, shift: float | None = None) → tuple[ndarray, ndarray, ndarray, ndarray, ndarray, ndarray, float][source]

Function that implements a cochlear model that includes the middle ear, auditory filter bank, Outer Hair Cell (OHC) dynamic-range compression, and Inner Hair Cell (IHC) attenuation.

The inputs are the reference and processed signals that are to be compared. The reference is at the reference intensity (e.g. 65 dB SPL or with NAL-R amplification) and has no other processing. The processed signal is the hearing-aid output, and is assumed to have the same or greater group delay compared to the reference.

The function outputs the envelopes of the signals after OHC compression and IHC loss attenuation.

Parameters:

reference (np.ndarray) – reference signal: should be adjusted to 65 dB SPL (itype=0 or 1) or to 65 dB SPL plus NAL-R gain (itype=2)
reference_freq (int) – sampling rate for the reference signal, Hz
processed (np.ndarray) – processed signal (e.g. hearing-aid output) includes HA gain
processed_freq (int) – sampling rate for the processed signal, Hz
hearing_loss (np.ndarray) – audiogram giving the hearing loss in dB at 6 audiometric frequencies: [250, 500, 1000, 2000, 4000, 6000] Hz
itype (int) –
purpose for the calculation: 0=intelligibility: reference is normal hearing and must not

include NAL-R EQ

1=quality: reference does not include NAL-R EQ 2=quality: reference already has NAL-R EQ applied
level1 – level calibration: signal RMS=1 corresponds to Level1 dB SPL
nchan (int) – auditory frequency bands
m_delay (int) – Compensate for the gammatone group delay.
shift (float) – Basal shift of the basilar membrane length

Returns:

envelope for the reference in each band reference_basilar_membrane (): BM motion for the reference in each band processed_db (): envelope for the processed signal in each band processed_basilar_membrane (): BM motion for the processed signal in each band reference_sl (): compressed RMS average reference in each band converted

to dB SL

processed_sl (): compressed RMS average output in each band converted to dB SL freq_sample (): sampling rate in Hz for the model outputs

Return type:

reference_db ()

Updates: James M. Kates, 27 October 2011. Basilar Membrane added 30 Dec 2011. Revised 19 June 2012. Remove match of reference RMS level to processed 29 August 2012. IHC adaptation added 1 October 2012. Basilar Membrane envelope converted to dB SL, 2 Oct 2012. Filterbank group delay corrected, 14 Dec 2012. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022. Updated by Gerardo Roa Dabike, September 2022.

clarity.evaluator.haspi.eb.env_compress_basilar_membrane(envsig: ndarray, bm: ndarray, control: ndarray, attn_ohc: float, threshold_low: float, compression_ratio: float, fsamp: float, level1: float, small: float = 1e-30, threshold_high: int = 100) → tuple[ndarray, ndarray][source]

Compute the cochlear compression in one auditory filter band. The gain is linear below the lower threshold, compressive with a compression ratio of CR:1 between the lower and upper thresholds, and reverts to linear above the upper threshold. The compressor assumes that auditory threshold is 0 dB SPL.

Parameters:

() (small) – analytic signal envelope (magnitude) returned by the gammatone filter bank
() – BM motion output by the filter bank
() – analytic control envelope returned by the wide control path filter bank
() – OHC attenuation at the input to the compressor
() – kneepoint for the low-level linear amplification
() – compression ratio
() – sampling rate in Hz
() – dB reference level: a signal having an RMS value of 1 is assigned to Level1 dB SPL.
() –
???
threshold_high – kneepoint for the high-level linear amplification

Returns:

compressed version of the signal envelope compressed_basilar_membrane (): compressed version of the BM motion

Return type:

compressed_signal ()

Updates: James M. Kates, 19 January 2007. LP filter added 15 Feb 2007 (Ref: Zhang et al., 2001) Version to compress the envelope, 20 Feb 2007. Change in the OHC I/O function, 9 March 2007. Two-tone suppression added 22 August 2008. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.eb.env_smooth(envelopes: np.ndarray, segment_size: int, sample_rate: float) → ndarray[source]

Function to smooth the envelope returned by the cochlear model. The envelope is divided into segments having a 50% overlap. Each segment is windowed, summed, and divided by the window sum to produce the average. A raised cosine window is used. The envelope sub-sampling frequency is 2*(1000/segsize).

Parameters:

envelopes (np.ndarray) – matrix of envelopes in each of the auditory bands
segment_size – averaging segment size in msec
freq_sample (int) – input envelope sampling rate in Hz

Returns:

matrix of subsampled windowed averages in each band

Return type:

smooth

Updates:: James M. Kates, 26 January 2007. Final half segment added 27 August 2012. Translated from MATLAB to Python by Gerardo Roa Dabike, September 2022.

clarity.evaluator.haspi.eb.envelope_align(reference: ndarray, output: ndarray, freq_sample: int = 24000, corr_range: int = 100) → ndarray[source]

Align the envelope of the processed signal to that of the reference signal.

Parameters:

() (output) – envelope or BM motion of the reference signal
() – envelope or BM motion of the output signal
freq_sample (int) – Frequency sample rate in Hz
corr_range (int) – range in msec for the correlation

Returns:

shifted output envelope to match the input

Return type:

y ()

Updates: James M. Kates, 28 October 2011. Absolute value of the cross-correlation peak removed, 22 June 2012. Cross-correlation range reduced, 13 August 2013. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.eb.envelope_sl(reference: ndarray, basilar_membrane: ndarray, attenuated_ihc: float, level1: float, small: float = 1e-30) → tuple[ndarray, ndarray][source]

Convert the compressed envelope returned by cochlear_envcomp to dB SL.

Parameters:

() (level1) – linear envelope after compression
() – linear Basilar Membrane vibration after compression
() – IHC attenuation at the input to the synapse
() – level in dB SPL corresponding to 1 RMS
small (float) –
???

Returns:

reference envelope in dB SL _basilar_membrane (): Basilar Membrane vibration with envelope converted to

dB SL

Return type:

_reference ()

Updates: James M. Kates, 20 Feb 07. IHC attenuation added 9 March 2007. Basilar membrane vibration conversion added 2 October 2012. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.eb.gammatone_bandwidth_demodulation(npts: int, tpt: float, center_freq: float, center_freq_cos: ndarray, center_freq_sin: ndarray) → tuple[ndarray, ndarray][source]

Create the carriers for demodulaton, using the 2d Rotation method from: https://ccrma.stanford.edu/~jos/pasp/Digital_Sinusoid_Generators.html

to generate the sin and cos components. More efficient, perhaps, than calculating the sin and cos at each point in time.

Parameters:

() (center_freq_sin) – How many points are needed.
() – Phase change (2pi/T) due to each sample time.
() – The carrier frequency
() – Array to overwrite for the output.
() – Array to overwrite for the output.

Returns:

Samples of the carrier frequency in sin phase. coscf (): Samples of the carrier frequency in cos phase.

Return type:

sincf ()

clarity.evaluator.haspi.eb.gammatone_basilar_membrane(reference: ndarray, reference_bandwidth: float, processed: ndarray, processed_bandwidth: float, freq_sample: float, center_freq: float, ear_q: float = 9.26449, min_bandwidth: float = 24.7) → tuple[ndarray, ndarray, ndarray, ndarray][source]

4th-order gammatone auditory filter. This implementation is based on the c program published on-line by Ning Ma, U. Sheffield, UK[1]_ that gives an implementation of the Martin Cooke filters[2]_: an impulse-invariant transformation of the gammatone filter. The signal is demodulated down to baseband using a complex exponential, and then passed through a cascade of four one-pole low-pass filters.

This version filters two signals that have the same sampling rate and the same gammatone filter center frequencies. The lengths of the two signals should match; if they don’t, the signals are truncated to the shorter of the two lengths.

Parameters:

() (freq_sample) – first sequence to be filtered
reference_bandwidth – bandwidth for x relative to that of a normal ear
() – second sequence to be filtered
() – bandwidth for x relative to that of a normal ear
() – sampling rate in Hz
center_frequency (int) – filter center frequency in Hz
ear_q – (float): ???
min_bandwidth (float) –
???

Returns:

filter envelope output (modulated down to baseband): 1st signal

reference_basilar_membrane (): Basilar Membrane for the first signal processed_envelope (): filter envelope output (modulated down to baseband)

2nd signal

processed_basilar_membrane (): Basilar Membrane for the second signal

Return type:

reference_envelope ()

References: .. [1] Ma N, Green P, Barker J, Coy A (2007) Exploiting correlogram

structure for robust speech recognition with multiple speech sources. Speech Communication, 49 (12): 874-891. Available at <https://doi.org/10.1016/j.specom.2007.05.003> <https://staffwww.dcs.shef.ac.uk/people/N.Ma/resources/gammatone/>

Updates: James M. Kates, 8 Jan 2007. Vectorized version for efficient MATLAB execution, 4 February 2007. Cosine and sine generation, 29 June 2011. Output sine and cosine sequences, 19 June 2012. Cosine/sine loop speed increased, 9 August 2013. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.eb.group_delay_compensate(reference: ndarray, bandwidths: ndarray, center_freq: ndarray, freq_sample: float, ear_q: float = 9.26449, min_bandwidth: float = 24.7) → ndarray[source]

Compensate for the group delay of the gammatone filter bank. The group delay is computed for each filter at its center frequency. The firing rate output of the IHC model is then adjusted so that all outputs have the same group delay.

Parameters:

xenv (np.ndarray) – matrix of signal envelopes or BM motion
() (freq_sample) – gammatone filter bandwidths adjusted for loss
() – center frequencies of the bands
() – sampling rate for the input signal in Hz (e.g. 24,000 Hz)
ear_q (float)
min_bandwidth (float)

Returns:

envelopes or BM motion compensated for the group delay.

Return type:

processed ()

Updates:: James M. Kates, 28 October 2011. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.eb.inner_hair_cell_adaptation(reference_db, reference_basilar_membrane, delta, freq_sample)[source]

Provide inner hair cell (IHC) adaptation. The adaptation is based on an equivalent RC circuit model, and the derivatives are mapped into 1st-order backward differences. Rapid and short-term adaptation are provided. The input is the signal envelope in dB SL, with IHC attenuation already applied to the envelope. The outputs are the envelope in dB SL with adaptation providing overshoot of the long-term output level, and the BM motion is multiplied by a gain vs. time function that reproduces the adaptation. IHC attenuation and additive noise for the equivalent auditory threshold are provided by a subsequent call to eb_BMatten.

Parameters:

reference_db (np.ndarray) – signal envelope in one frequency band in dB SL contains OHC compression and IHC attenuation
() (delta) – basilar membrane vibration with OHC compression but no IHC attenuation
() – overshoot factor = delta x steady-state
freq_sample (int) – sampling rate in Hz

Returns:

envelope in dB SL with IHC adaptation output_basilar_membrane (): Basilar Membrane multiplied by the IHC adaptation

gain function

Return type:

output_db ()

Updates: James M. Kates, 1 October 2012. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.eb.input_align(reference: ndarray, processed: ndarray) → tuple[ndarray, ndarray][source]

Approximate temporal alignment of the reference and processed output signals. Leading and trailing zeros are then pruned.

The function assumes that the two sequences have the same sampling rate: call eb_Resamp24kHz for each sequence first, then call this function to align the signals.

Arguments: reference (np.ndarray): input reference sequence processed (np.ndarray): hearing-aid output sequence

Returns: reference (np.ndarray): pruned and shifted reference processed (np.ndarray): pruned and shifted hearing-aid output

Updates: James M. Kates, 12 July 2011. Match the length of the processed output to the reference for the purposes of computing the cross-covariance Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.eb.loss_parameters(hearing_loss: ndarray, center_freq: ndarray, audiometric_freq: ndarray | None = None) → tuple[ndarray, ndarray, ndarray, ndarray, ndarray][source]

Apportion the hearing loss to the outer hair cells (OHC) and the inner hair cells (IHC) and to increase the bandwidth of the cochlear filters in proportion to the OHC fraction of the total loss.

Parameters:

hearing_loss (np.ndarray) – hearing loss at the 6 audiometric frequencies
center_freq (np.ndarray) – array containing the center frequencies of the gammatone filters arranged from low to high
audiometric_freq (list)

Returns:

attenuation in dB for the OHC gammatone filters bandwidth (): OHC filter bandwidth expressed in terms of normal low_knee (): Lower kneepoint for the low-level linear amplification compression_ratio (): Ranges from 1.4:1 at 150 Hz to 3.5:1 at 8 kHz for normal

hearing. Reduced in proportion to the OHC loss to 1:1.

attenuated_ihc (): attenuation in dB for the input to the IHC synapse

Return type:

attenuated_ohc ()

Updates: James M. Kates, 25 January 2007. Version for loss in dB and match of OHC loss to CR, 9 March 2007. Low-frequency extent changed to 80 Hz, 27 Oct 2011. Lower kneepoint set to 30 dB, 19 June 2012. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.eb.mel_cepstrum_correlation(reference: ndarray, distorted: ndarray, threshold: float, addnoise: float) → tuple[float, ndarray][source]

Compute the cross-correlations between the input signal time-frequency envelope and the distortion time-frequency envelope.

For each time interval, the log spectrum is fitted with a set of half-cosine basis functions. The spectrum weighted by the basis functions corresponds to Mel Cepstral Coefficients computed in the frequency domain. The amplitude-normalized cross-covariance between the time-varying basis functions for the input and output signals is then computed.

Parameters:

() (addnoise) – subsampled input signal envelope in dB SL in each critical band
() – subsampled distorted output signal envelope
() – threshold in dB SPL to include segment in calculation
() – additive Gaussian noise to ensure 0 cross-corr at low levels

Returns:

average cepstral correlation 2-6, input vs output individual_cepstral_correlations : individual cepstral correlations,

input vs output

Return type:

average_cepstral_correlation

Updates:: James M. Kates, 24 October 2006. Difference signal removed for cochlear model, 31 January 2007. Absolute value added 13 May 2011. Changed to loudness criterion for silence threshsold, 28 August 2012. Translated from MATLAB to Python by Gerardo Roa Dabike, September 2022.

clarity.evaluator.haspi.eb.melcor9(reference: ndarray, distorted: ndarray, threshold: float, add_noise: float, segment_size: int, n_cepstral_coef: int = 6) → tuple[float, float, float, ndarray][source]

Compute the cross-correlations between the input signal time-frequency envelope and the distortion time-frequency envelope. For each time interval, the log spectrum is fitted with a set of half-cosine basis functions. The spectrum weighted by the basis functions corresponds to mel cepstral coefficients computed in the frequency domain. The amplitude-normalized cross-covariance between the time-varying basis functions for the input and output signals is then computed for each of the 8 modulation frequencies.

Parameters:

() (segment_size) – subsampled input signal envelope in dB SL in each critical band
() – subsampled distorted output signal envelope
() – threshold in dB SPL to include segment in calculation
() – additive Gaussian noise to ensure 0 cross-corr at low levels
() – segment size in ms used for the envelope LP filter (8 msec)
n_cepstral_coef (int) – Number of cepstral coefficients

Returns:

average of the modulation correlations across analysis: frequency bands and modulation frequency bands, basis functions 2 -6

mel_cepstral_low (): average over the four lower mod freq bands, 0 - 20 Hz mel_cepstral_high (): average over the four higher mod freq bands, 20 - 125 Hz mel_cepstral_modulation (): vector of cross-correlations by modulation

frequency, averaged over analysis frequency band

Return type:

mel_cepstral_average ()

Updates:: James M. Kates, 24 October 2006. Difference signal removed for cochlear model, 31 January 2007. Absolute value added 13 May 2011. Changed to loudness criterion for silence threshold, 28 August 2012. Version using envelope modulation filters, 15 July 2014. Modulation frequency vector output added 27 August 2014. Translated from MATLAB to Python by Gerardo Roa Dabike, September 2022.

clarity.evaluator.haspi.eb.melcor9_crosscovmatrix(b: ndarray, nmod: int, nbasis: int, nsamp: int, nfir: int, reference_cep: ndarray, processed_cep: ndarray) → ndarray[source]

Compute the cross-covariance matrix.

Parameters:

() (ycep) –
???
() –
???
() –
???
() –
???
() –
???
() –
???
() –
???

Return type:

cross_covariance_matrix ()

clarity.evaluator.haspi.eb.middle_ear(reference: ndarray, freq_sample: float) → ndarray[source]

Design the middle ear filters and process the input through the cascade of filters. The middle ear model is a 2-pole HP filter at 350 Hz in series with a 1-pole LP filter at 5000 Hz. The result is a rough approximation to the equal-loudness contour at threshold.

Arguments: reference (np.ndarray): input signal freq_sample (float): sampling rate in Hz

Returns: xout (): filtered output

Updates: James M. Kates, 18 January 2007. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.eb.resample_24khz(reference_signal: ndarray, reference_freq: float, freq_sample_hz: float = 24000.0) → tuple[ndarray, float][source]

Resample the input signal at 24 kHz. The input sampling rate is rounded to the nearest kHz to compute the sampling rate conversion ratio.

Arguments: reference_signal (np.ndarray): input signal reference_freq (int): sampling rate for the input in Hz freq_sample_hz (int): Frequency sample in Hz

Returns: reference_signal_24 signal resampled at kHz (default 24Khz) freq_sample_hz output sampling rate in Hz

Updates James M. Kates, 20 June 2011. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.eb.spectrum_diff(reference_sl: ndarray, processed_sl: ndarray) → tuple[ndarray, ndarray, ndarray][source]

Compute changes in the long-term spectrum and spectral slope.

The metric is based on the spectral distortion metric of Moore and Tan[1]_ (JAES, Vol 52, pp 900-914). The log envelopes in dB SL are converted to linear to approximate specific loudness. The outputs are the sum of the absolute differences, the standard deviation of the differences, and the maximum absolute difference. The same three outputs are provided for the normalized spectral difference and for the slope. The output is calibrated so that a processed signal having 0 amplitude produces a value of 1 for the spectrum difference.

Abs diff: weight all deviations uniformly Std diff: weight larger deviations more than smaller deviations Max diff: only weight the largest deviation

Parameters:

reference_sl (np.ndarray) – reference signal spectrum in dB SL
processed_sl (np.ndarray) – degraded signal spectrum in dB SL

Returns:

[sum abs diff, std dev diff, max diff] spectra dnorm (np.array) : [sum abs diff, std dev diff, max diff] norm spectra dslope (np.array) : [sum abs diff, std dev diff, max diff] slope

Return type:

dloud (np.array)

References: .. [1] Moore BCJ, Tan, CT (2004) Development and Validation of a Method

for Predicting the Perceived Naturalness of Sounds Subjected to Spectral Distortion J Audio Eng Soc 52(9):900-914. Available at. <http://www.aes.org/e-lib/browse.cfm?elib=13018>.

Updates:: James M. Kates, 28 June 2012. Translated from MATLAB to Python by Gerardo Roa Dabike, September 2022.

clarity.evaluator.haspi.ebm module

HASPI EBM module

clarity.evaluator.haspi.ebm.add_noise(reference_db: ndarray, thresh_db: float) → ndarray[source]

Add independent random Gaussian noise to the subsampled signal envelope in each auditory frequency band.

Parameters:

() (thresh_db) – subsampled envelope in dB re:auditory threshold
() – additive noise RMS level (in dB)

Returns:

auditory threshold

Return type:

() envelope with threshold noise added, in dB re

Updates:: James M. Kates, 23 April 2019. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.ebm.cepstral_correlation_coef(reference_db: ndarray, processed_db: ndarray, thresh_cep: float, thresh_nerve: float, nbasis: int) → tuple[ndarray, ndarray][source]

Compute the cepstral correlation coefficients between the reference signal and the distorted signal log envelopes. The silence portions of the signals are removed prior to the calculation based on the envelope of the reference signal. For each time sample, the log spectrum in dB SL is fitted with a set of half-cosine basis functions. The cepstral coefficients then form the input to the cepstral correlation calculation.

Parameters:

() (thresh_nerve) – subsampled reference signal envelope in dB SL in each band
() – subsampled distorted output signal envelope
() – threshold in dB SPL to include sample in calculation
() – additive noise RMS for IHC firing (in dB)
nbasis – number of cepstral basis functions to use

Returns:

refernce_cep cepstral coefficient matrix for the ref signal: (nsamp,nbasis) processed_cep cepstral coefficient matrix for the output signal (nsamp,nbasis) each column is a separate basis function, from low to high

Return type:

tuple

Updates:: James M. Kates, 23 April 2015. Gammawarp version to fit the basis functions, 11 February 2019. Additive noise for IHC firing rates, 24 April 2019.

Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.ebm.env_filter(reference_db: ndarray, processed_db: ndarray, filter_cutoff: float, freq_sub_sample: float, freq_samp: float) → tuple[ndarray, ndarray][source]

Lowpass filter and subsample the envelope in dB SL produced by the model of the auditory periphery. The LP filter uses a von Hann raised cosine window to ensure that there are no negative envelope values produced by the filtering operation.

Parameters:

reference_db (np.ndarray) – env in dB SL for the ref signal in each auditory band
processed_db (np.ndarray) – env in dB SL for the degraded signal in each auditory band
() (freq_samp) – LP filter cutoff frequency for the filtered envelope, Hz
() – subsampling frequency in Hz for the LP filtered envelopes
() – sampling rate in Hz for the signals xdB and ydB

Returns:

reference_env - LP filtered and subsampled reference signal envelope: Each frequency band is a separate column. processed_env - LP filtered and subsampled degraded signal envelope

Return type:

tuple

Updates:: James M. Kates, 12 September 2019. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.ebm.fir_modulation_filter(reference_envelope: ndarray, processed_envelope: ndarray, freq_sub_sampling: float, center_frequencies: ndarray | None = None) → tuple[ndarray, ndarray, ndarray][source]

Apply a FIR modulation filterbank to the reference envelope signals contained in matrix reference_envelope and the processed signal envelope signals in matrix processed_envelope. Each column in reference_envelope and processed_envelope is a separate filter band or cepstral coefficient basis function. The modulation filters use a lowpass filter for the lowest modulation rate, and complex demodulation followed by a lowpass filter for the remaining bands. The onset and offset transients are removed from the FIR convolutions to temporally align the modulation filter outputs.

Parameters:

reference_envelope (np.ndarray) – matrix containing the subsampled reference envelope values. Each column is a different frequency band or cepstral basis function arranged from low to high.
processed_envelope (np.ndarray) – matrix containing the subsampled processed envelope values
() (freq_sub_sampling) – envelope sub-sampling rate in Hz
center_frequencies (np.ndarray) – Center Frequencies

Returns:

reference_modulation (): a cell array containing the reference signal: output of the modulation filterbank. reference_modulation is of size [nchan,nmodfilt] where nchan is the number of frequency channels or cepstral basis functions in reference_envelope, and nmodfilt is the number of modulation filters used in the analysis. Each cell contains a column vector of length nsamp, where nsamp is the number of samples in each envelope sequence contained in the columns of reference_envelope.
processed_modulation (): cell array containing the processed signal output: of the modulation filterbank.

center_frequencies (): vector of modulation rate filter center frequencies

Return type:

tuple

Updates:: James M. Kates, 14 February 2019. Two matrix version of gwarp_ModFiltWindow, 19 February 2019. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.ebm.modulation_cross_correlation(reference_modulation: ndarray, processed_modulation: ndarray) → ndarray[source]

Compute the cross-correlations between the input signal time-frequency envelope and the distortion time-frequency envelope. The cepstral coefficients or envelopes in each frequency band have been passed through the modulation filterbank using function ebm_ModFilt.

Parameters:

reference_modulation (np.array) – cell array containing the reference signal output of the modulation filterbank. Xmod is of size [nchan,nmodfilt] where nchan is the number of frequency channels or cepstral basis functions in Xenv, and nmodfilt is the number of modulation filters used in the analysis. Each cell contains a column vector of length nsamp, where nsamp is the number of samples in each envelope sequence contained in the columns of Xenv.
processed_modulation (np.ndarray) – subsampled distorted output signal envelope

Output:

float: aveCM modulation correlations averaged over basis functions 2-6: vector of size nmodfilt

Updates:

James M. Kates, 21 February 2019. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.haspi module

HASPI intelligibility Index

clarity.evaluator.haspi.haspi.haspi_v2(reference: ndarray, reference_sample_rate: float, processed: ndarray, processed_sample_rate: float, audiogram: Audiogram, level1: float = 65.0, f_lp: float = 320.0, itype: int = 0) → tuple[float, ndarray][source]

Compute the HASPI intelligibility index using the auditory model followed by computing the envelope cepstral correlation and BM vibration high-level covariance. The reference signal presentation level for NH listeners is assumed to be 65 dB SPL. The same model is used for both normal and impaired hearing. This version of HASPI uses a modulation filterbank followed by an ensemble of neural networks to compute the estimated intelligibility.

NB - The original HASPI model derivation included a bug which meant that although the ‘shift’ parameter used in band centre frequency calculations was set to ‘0.02’ it was never actually applied. To replicate this behaviour ear_model is called with ‘shift’ set to None. For discussion please refer to the discussion in Issue #105 <https://github.com/claritychallenge/clarity/issues/105> for further details.

Parameters:

reference (np.ndarray) – Clear input reference speech signal with no noise or distortion. If a hearing loss is specified, no amplification should be provided.
reference_sample_rate (int) – Sampling rate in Hz for signal x
processed (np.ndarray) – Output signal with noise, distortion, HA gain, and/or processing.
processed_sample_rate (int) – Sampling rate in Hz for signal y.
hearing_loss (np.ndarray) – (1,6) vector of hearing loss at the 6 audiometric frequencies [250, 500, 1000, 2000, 4000, 6000] Hz.
level1 (int) – Optional input specifying level in dB SPL that corresponds to a signal RMS = 1. Default is 65 dB SPL if argument not provided.
f_lp (int)
itype (int) – Intelligibility model

Returns:

float, raw: nd-array) Intel: Intelligibility estimated by passing the cepstral coefficients

through a modulation filterbank followed by an ensemble of neural networks.

raw: vector of 10 cep corr modulation filterbank outputs, averaged: over basis functions 2-6.

Return type:

tuple(Intel

Updates:: James M. Kates, 5 August 2013. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.haspi.haspi_v2_be(reference_left: ndarray, reference_right: ndarray, processed_left: ndarray, processed_right: ndarray, sample_rate: float, listener: Listener, level: float = 100.0) → float[source]

Better ear HASPI.

Calculates HASPI for left and right ear and selects the better result.

Parameters:

ref_left (np.ndarray) – left channel of reference signal
ref_right (np.ndarray) – right channel of reference signal
proc_left (np.ndarray) – left channel of processed signal
proc_right (np.ndarray) – right channel of processed signal
sample_rate (int) – sampling rate for both signal
() (audiogram_right) – left ear audiogram
() – right ear audiogram
level – level in dB SPL corresponding to RMS=1

Returns:

beHASPI score

Return type:

float

Updates:: Zuzanna Podwinska, March 2022

clarity.evaluator.haspi.ip module

Functions for HASPI neural network stage.

clarity.evaluator.haspi.ip.get_neural_net() → tuple[dict, list[ndarray], list[ndarray], float][source]

Provide the weights derived for the ensemble of ten neural networks used for the HASPI_v2 intelligibility model. The neural networks have ten inputs, 4 neurons in the hidden layer, and one output neuron. The logsig activation function is used.

Arguments: None

Returned values:

neural_net_params (dict): parameters defining the neural network weights_hidden (): cell array 10 x 1 for the weights linking the input to the hidden layer. Each cell is a 11 x 4 matrix of weights weights_out (): call array 5 x 1 for the weights linking the hidden to the

output layer. Each cell is a 5 x 1 vector of weights.

normalization_factor (): normalization so that the maximum neural net output is: exactly 1.

Updates:

James M. Kates, 8 October 2019. Version for new neural network using actual TFS scores, 24 October 2019. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.ip.nn_feed_forward(data: ndarray, neural_net_params: dict, weights_hidden: ndarray, weights_out: ndarray) → tuple[ndarray, ndarray][source]

Compute the outputs at each layer of a neural network given the input to the network and the weights. The activation function is an offset logistic function that gives either a logsig or hyperbolic tangent; the outputs from each layer have been reduced by the offset. The structure of the network is an input layer, one hidden layer, and an output layer. The first values in vectors hidden and output are set to 1 by the function, and the remaining values correspond to the outputs at each neuron in the layer.

Args: data (np.ndarray): feature vector input to the neural network. neural_net_params (dict): network parameters from get_neural_net(). weights_hidden (list): matrix of weights for the hidden layer. weights_out (list): matrix of weights for the output layer.

Returns: hidden (): vector of outputs from the hidden layer. output (): vector of outputs from the output layer.

Updates: James M. Kates, 26 October 2010. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.ip.nn_feed_forward_ensemble(data: ndarray, neural_net_params: dict, weights_hidden: list[ndarray], weights_out: list[ndarray]) → ndarray[source]

Function to compute the neural network ensemble response to a set of inputs. The neural network is defined in NNfeedforwardZ.

Args: data (np.ndarray): array of features input to the neural network neural_net_params (dict): vector of neural network parameters weights_hidden (list): cell array of hidden layer weights for each network weights_out (list): cell array of output layer weights for each network

Returns: model neural network output vector averaged over the ensemble

Updates: James M. Kates, 20 September 2011. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

Module contents

HASPI intelligibility index.

clarity.evaluator.haspi.haspi_v2(reference: ndarray, reference_sample_rate: float, processed: ndarray, processed_sample_rate: float, audiogram: Audiogram, level1: float = 65.0, f_lp: float = 320.0, itype: int = 0) → tuple[float, ndarray][source]

Compute the HASPI intelligibility index using the auditory model followed by computing the envelope cepstral correlation and BM vibration high-level covariance. The reference signal presentation level for NH listeners is assumed to be 65 dB SPL. The same model is used for both normal and impaired hearing. This version of HASPI uses a modulation filterbank followed by an ensemble of neural networks to compute the estimated intelligibility.

NB - The original HASPI model derivation included a bug which meant that although the ‘shift’ parameter used in band centre frequency calculations was set to ‘0.02’ it was never actually applied. To replicate this behaviour ear_model is called with ‘shift’ set to None. For discussion please refer to the discussion in Issue #105 <https://github.com/claritychallenge/clarity/issues/105> for further details.

Parameters:

reference (np.ndarray) – Clear input reference speech signal with no noise or distortion. If a hearing loss is specified, no amplification should be provided.
reference_sample_rate (int) – Sampling rate in Hz for signal x
processed (np.ndarray) – Output signal with noise, distortion, HA gain, and/or processing.
processed_sample_rate (int) – Sampling rate in Hz for signal y.
hearing_loss (np.ndarray) – (1,6) vector of hearing loss at the 6 audiometric frequencies [250, 500, 1000, 2000, 4000, 6000] Hz.
level1 (int) – Optional input specifying level in dB SPL that corresponds to a signal RMS = 1. Default is 65 dB SPL if argument not provided.
f_lp (int)
itype (int) – Intelligibility model

Returns:

float, raw: nd-array) Intel: Intelligibility estimated by passing the cepstral coefficients

through a modulation filterbank followed by an ensemble of neural networks.

raw: vector of 10 cep corr modulation filterbank outputs, averaged: over basis functions 2-6.

Return type:

tuple(Intel

Updates:: James M. Kates, 5 August 2013. Translated from MATLAB to Python by Zuzanna Podwinska, March 2022.

clarity.evaluator.haspi.haspi_v2_be(reference_left: ndarray, reference_right: ndarray, processed_left: ndarray, processed_right: ndarray, sample_rate: float, listener: Listener, level: float = 100.0) → float[source]

Better ear HASPI.

Calculates HASPI for left and right ear and selects the better result.

Parameters:

ref_left (np.ndarray) – left channel of reference signal
ref_right (np.ndarray) – right channel of reference signal
proc_left (np.ndarray) – left channel of processed signal
proc_right (np.ndarray) – right channel of processed signal
sample_rate (int) – sampling rate for both signal
() (audiogram_right) – left ear audiogram
() – right ear audiogram
level – level in dB SPL corresponding to RMS=1

Returns:

beHASPI score

Return type:

float

Updates:: Zuzanna Podwinska, March 2022