clarity.predictor package

Submodules

clarity.predictor.torch_msbg module

An FIR-based torch implementation of the approximated MSBG hearing loss model.

class clarity.predictor.torch_msbg.MSBGHearingModel(audiogram: np.ndarray, audiometric: np.ndarray, sr: int = 44100, spl_cali: bool = True, src_position: str = 'ff', kernel_size: int = 1025, device: str | None = None)[source]

Bases: Module

calibrate_spl(x: Tensor) → Tensor[source]
f_smear

settings for recruitment

forward(x: Tensor) → Tensor[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

gt_fir_sin

settings for spl calibration

measure_rms(wav: Tensor) → Tensor[source]

Compute RMS level of a signal.

Measures the total power of all 10 ms frames that are above a specified threshold of db_relative_rms.

Parameters:

wav – input signal

Returns:

RMS level in dB
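A conceptual sketch of this measurement follows (not the module's actual implementation; the frame length handling, threshold construction, and the db_relative_rms value are assumptions made for illustration):

    import torch

    def rms_above_threshold(wav: torch.Tensor, sr: int = 44100,
                            db_relative_rms: float = -12.0) -> torch.Tensor:
        """Illustrative only: RMS over 10 ms frames above a relative threshold."""
        frame_len = int(0.01 * sr)                     # 10 ms frames
        n_frames = wav.shape[-1] // frame_len
        frames = wav[: n_frames * frame_len].reshape(n_frames, frame_len)
        first_pass_rms = wav.pow(2).mean().sqrt()      # first-pass level estimate
        threshold = first_pass_rms * 10 ** (db_relative_rms / 20)
        keep = frames.pow(2).mean(dim=1).sqrt() > threshold
        return frames[keep].pow(2).mean().sqrt()       # RMS over the retained frames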

recruitment(x: Tensor) → Tensor[source]
recruitment_fir(x: Tensor) → Tensor[source]
recruitment_out_coef

settings for FIR Gammatone Filters

smear(x: Tensor) → Tensor[source]

Padding issue needs to be worked out

src_to_cochlea_filt(x: Tensor, cochlea_filter: Tensor) → Tensor[source]
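Example usage of MSBGHearingModel (an illustrative sketch; the audiometric frequencies, hearing levels, and the (batch, time) input shape below are assumptions, not values prescribed by the API):

    import numpy as np
    import torch

    from clarity.predictor.torch_msbg import MSBGHearingModel

    # Illustrative listener data: hearing levels (dB) at the audiometric
    # frequencies (Hz). Real use would substitute measured audiogram values.
    audiometric = np.array([250, 500, 1000, 2000, 4000, 6000, 8000])
    audiogram = np.array([20, 20, 30, 40, 50, 55, 60])

    model = MSBGHearingModel(audiogram=audiogram, audiometric=audiometric, sr=44100)

    # One second of 44.1 kHz audio, assumed here to be shaped (batch, time).
    # Call the module instance rather than forward() so registered hooks run.
    signal = torch.randn(1, 44100)
    hl_signal = model(signal)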
class clarity.predictor.torch_msbg.torchloudnorm(sample_rate: int = 44100, norm_lufs: int = -36, kernel_size: int = 1025, block_size: float = 0.4, overlap: float = 0.75, gamma_a: int = -70, device: str | None = None)[source]

Bases: Module

apply_filter(x: Tensor) → Tensor[source]
forward(x: Tensor) → Tensor[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

high_pass

rms measurement

integrated_loudness(x: Tensor) → Tensor[source]
normalize_loudness(x: Tensor, lufs: Tensor) → Tensor[source]
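Example usage of torchloudnorm (an illustrative sketch; the (batch, time) input shape and the assumption that calling the module chains measurement and normalization are not guaranteed by this page):

    import torch

    from clarity.predictor.torch_msbg import torchloudnorm

    loudnorm = torchloudnorm(sample_rate=44100, norm_lufs=-36)

    x = torch.randn(1, 44100)                 # stand-in signal, assumed (batch, time)

    lufs = loudnorm.integrated_loudness(x)    # measured loudness (LUFS)
    y = loudnorm.normalize_loudness(x, lufs)  # gain applied towards norm_lufs

    # Calling the module instance is assumed to combine the same two steps.
    y = loudnorm(x)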

clarity.predictor.torch_stoi module

This implementation is from https://github.com/mpariente/pytorch_stoi; please cite and star the repo. The pip version of torch_stoi does not include EPS on lines 127 and 128, which can lead to sqrt(0).
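For illustration, the guard amounts to adding a small constant before the square root so the backward pass stays finite (the constant below is an arbitrary example, not the repository's value):

    import torch

    EPS = 1e-8  # illustrative constant

    x = torch.zeros(4, requires_grad=True)
    unsafe = x.pow(2).sum().sqrt()          # sqrt(0): backward produces inf/nan
    safe = (x.pow(2).sum() + EPS).sqrt()    # EPS keeps the sqrt away from zero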

class clarity.predictor.torch_stoi.NegSTOILoss(*args: Any, **kwargs: Any)[source]

Bases: Module

Negated Short Term Objective Intelligibility (STOI) metric, to be used as a loss function. Inspired by [1, 2, 3] but not exactly the same: it cannot be used as the STOI metric directly (use pystoi instead). See Notes.

Parameters:
  • sample_rate (int) – sample rate of audio input

  • use_vad (bool) – Whether to use simple VAD (see Notes)

  • extended (bool) – Whether to compute extended version [3].

  • do_resample (bool) – Whether to resample the audio input to FS (the 10 kHz rate used internally by STOI)

Shapes:

(time,) -> (1,)
(batch, time) -> (batch,)
(batch, n_src, time) -> (batch, n_src)

Returns:

torch.Tensor of shape (batch, ...); only the time dimension has been reduced.

Warning

This function cannot be used to compute the “real” STOI metric as we applied some changes to speed up loss computation. See Notes section.

Notes

In the NumPy version, some kind of simple VAD was used to remove the silent frames before chunking the signal into short-term envelope vectors. We don’t do the same here because removing frames in a batch is cumbersome and inefficient. If use_vad is set to True, instead we detect the silent frames and keep a mask tensor. At the end, the normalized correlation of short-term envelope vectors is masked using this mask (unfolded) and the mean is computed taking the mask values into account.

References

[1] C. H. Taal, R. C. Hendriks, R. Heusdens, J. Jensen, ‘A Short-Time Objective Intelligibility Measure for Time-Frequency Weighted Noisy Speech’, ICASSP 2010, Dallas, Texas.

[2] C. H. Taal, R. C. Hendriks, R. Heusdens, J. Jensen, ‘An Algorithm for Intelligibility Prediction of Time-Frequency Weighted Noisy Speech’, IEEE Transactions on Audio, Speech, and Language Processing, 2011.

[3] Jesper Jensen and Cees H. Taal, ‘An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers’, IEEE Transactions on Audio, Speech and Language Processing, 2016.

static detect_silent_frames(x, dyn_range, framelen, hop)[source]

Detects silent frames in the input tensor. A frame is excluded if its energy is lower than max(energy) - dyn_range.

Parameters:
  • x (torch.Tensor) – batch of original speech wav file (batch, time)

  • dyn_range – Energy range to determine which frame is silent

  • framelen – Window size for energy evaluation

  • hop – Hop size for energy evaluation

Returns:

torch.BoolTensor, framewise mask.
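Example (illustrative): the method is static, so it can be called on the class directly. The parameter values below are assumptions (pystoi conventionally uses a 40 dB dynamic range and 256-sample frames with 50% overlap at 10 kHz), and the mask orientation is an assumption:

    import torch

    from clarity.predictor.torch_stoi import NegSTOILoss

    x = torch.randn(2, 10000)  # batch of waveforms, shape (batch, time)
    mask = NegSTOILoss.detect_silent_frames(x, dyn_range=40, framelen=256, hop=128)
    # mask is a framewise BoolTensor, assumed True for frames whose energy is
    # within dyn_range dB of the loudest frame (i.e. frames that are kept).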

forward(est_targets: torch.Tensor, targets: torch.Tensor) → torch.Tensor[source]

Compute negative (E)STOI loss.

Parameters:
  • est_targets (torch.Tensor) – Tensor containing target estimates.

  • targets (torch.Tensor) – Tensor containing clean targets.

Shapes:

(time,) -> (1,)
(batch, time) -> (batch,)
(batch, n_src, time) -> (batch, n_src)

Returns:

torch.Tensor, the batch of negative STOI loss
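Example (a sketch of using the loss in training; the sample rate, signal lengths, and the surrogate “estimate” below are illustrative stand-ins for a real enhancement model's output):

    import torch

    from clarity.predictor.torch_stoi import NegSTOILoss

    loss_func = NegSTOILoss(sample_rate=10000)  # 10 kHz matches STOI's internal rate

    targets = torch.randn(4, 20000)                                    # clean references
    est_targets = (targets + 0.1 * torch.randn_like(targets)).requires_grad_()

    loss = loss_func(est_targets, targets)   # shape (batch,), one value per item
    loss.mean().backward()                   # gradients flow back to the estimates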

static rowcol_norm(x, mask=None)[source]

Mean/variance normalize axes 2 and 1 of the input tensor

static stft(x, win, fft_size, overlap=4)[source]
clarity.predictor.torch_stoi.masked_mean(x, dim=-1, mask=None, keepdim=False)[source]
clarity.predictor.torch_stoi.masked_norm(x, p=2, dim=-1, mask=None, keepdim=False)[source]
clarity.predictor.torch_stoi.meanvar_norm(x, mask=None, dim=-1)[source]
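These helpers carry no docstrings here. As a conceptual illustration of the pattern they follow (not necessarily the library's exact implementation), a masked mean averages only over the entries selected by the mask:

    from typing import Optional

    import torch

    def masked_mean_sketch(x: torch.Tensor, dim: int = -1,
                           mask: Optional[torch.Tensor] = None,
                           keepdim: bool = False) -> torch.Tensor:
        """Conceptual sketch: mean over `dim`, counting only entries where mask is True."""
        if mask is None:
            return x.mean(dim=dim, keepdim=keepdim)
        total = (x * mask).sum(dim=dim, keepdim=keepdim)
        count = mask.sum(dim=dim, keepdim=keepdim)
        return total / count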

Module contents