clarity.predictor package
Submodules
clarity.predictor.torch_msbg module
An FIR-based PyTorch implementation of an approximation of the MSBG hearing loss model.
- class clarity.predictor.torch_msbg.MSBGHearingModel(audiogram: np.ndarray, audiometric: np.ndarray, sr: int = 44100, spl_cali: bool = True, src_position: str = 'ff', kernel_size: int = 1025, device: str | None = None)
Bases: Module
- f_smear
settings for recruitment
- forward(x: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of calling this function directly, since the former takes care of running the registered hooks while the latter silently ignores them.
- gt_fir_sin
settings for spl calibration
- measure_rms(wav: Tensor) → Tensor
Compute RMS level of a signal.
Measures the total power of all 10 ms frames that are above the threshold specified by db_relative_rms.
- Parameters:
wav – input signal
- Returns:
RMS level in dB
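Example
The thresholded RMS measurement described above can be sketched in NumPy as follows. This is a minimal illustration, not the module's torch code: the helper name `rms_db_above_threshold`, the default `db_relative_rms=-12.0`, and the choice to take the threshold relative to the RMS of the whole signal are all assumptions made for the example.

```python
import numpy as np

def rms_db_above_threshold(wav, sample_rate=44100, db_relative_rms=-12.0, frame_ms=10.0):
    """RMS level (dB) over the frames whose RMS exceeds a relative threshold."""
    wav = np.asarray(wav, dtype=np.float64)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    n_frames = len(wav) // frame_len
    # Drop the incomplete trailing frame and cut the signal into 10 ms frames.
    frames = wav[: n_frames * frame_len].reshape(n_frames, frame_len)

    # Threshold is db_relative_rms dB relative to the RMS of the whole signal
    # (an assumption of this sketch).
    overall_rms = np.sqrt(np.mean(wav ** 2))
    threshold = overall_rms * 10.0 ** (db_relative_rms / 20.0)

    # Keep only the frames above the threshold and measure their joint RMS.
    frame_rms = np.sqrt(np.mean(frames ** 2, axis=1))
    keep = frame_rms > threshold
    key_rms = np.sqrt(np.mean(frames[keep] ** 2))
    return 20.0 * np.log10(key_rms)
```

For a signal that is half silence and half constant amplitude, only the non-silent frames contribute, so the result reflects the level of the active part rather than the average over the whole signal. The sketch does not handle the case where no frame passes the threshold.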
- recruitment_out_coef
settings for FIR Gammatone Filters
- class clarity.predictor.torch_msbg.torchloudnorm(sample_rate: int = 44100, norm_lufs: int = -36, kernel_size: int = 1025, block_size: float = 0.4, overlap: float = 0.75, gamma_a: int = -70, device: str | None = None)
Bases: Module
- forward(x: Tensor) → Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of calling this function directly, since the former takes care of running the registered hooks while the latter silently ignores them.
- high_pass
rms measurement
clarity.predictor.torch_stoi module
This implementation is taken from https://github.com/mpariente/pytorch_stoi; please cite and star that repository. The pip release of torch_stoi does not include EPS on lines 127 and 128, which can lead to sqrt(0).
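Example
To illustrate why an EPS term of this kind matters, here is a minimal NumPy sketch. It is not the torch_stoi code: it assumes the guarded operation is a row normalisation by an L2 norm (which involves a square root), and the helper name `row_normalize` and the value `EPS = 1e-8` are chosen only for illustration.

```python
import numpy as np

EPS = 1e-8  # assumed value, for illustration only

def row_normalize(x, eps=EPS):
    """Centre each row and scale it to (almost) unit L2 norm."""
    x = np.asarray(x, dtype=np.float64)
    centred = x - x.mean(axis=-1, keepdims=True)
    # The norm is sqrt(sum of squares); it is exactly 0 for a constant row,
    # so eps keeps the division finite.
    norm = np.sqrt((centred ** 2).sum(axis=-1, keepdims=True))
    return centred / (norm + eps)

rows = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
normalized = row_normalize(rows)
```

The all-zero row stays finite (all zeros) instead of becoming NaN, while the non-constant row ends up with a norm that differs from 1 only by a negligible amount.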
- class clarity.predictor.torch_stoi.NegSTOILoss(*args: Any, **kwargs: Any)
Bases: Module
Negated Short Term Objective Intelligibility (STOI) metric, to be used as a loss function. Inspired by [1, 2, 3] but not exactly the same: it cannot be used as the STOI metric directly (use pystoi instead). See Notes.
- Parameters:
sample_rate (int) – sample rate of audio input
use_vad (bool) – Whether to use simple VAD (see Notes)
extended (bool) – Whether to compute extended version [3].
do_resample (bool) – Whether to resample audio input to FS
- Shapes:
(time,) --> (1,)
(batch, time) --> (batch,)
(batch, n_src, time) --> (batch, n_src)
- Returns:
torch.Tensor of shape (batch, *); only the time dimension has been reduced.
Warning
This function cannot be used to compute the “real” STOI metric, as some changes were applied to speed up the loss computation. See the Notes section.
Notes
In the NumPy version, some kind of simple VAD was used to remove the silent frames before chunking the signal into short-term envelope vectors. We don’t do the same here because removing frames in a batch is cumbersome and inefficient. If use_vad is set to True, instead we detect the silent frames and keep a mask tensor. At the end, the normalized correlation of short-term envelope vectors is masked using this mask (unfolded) and the mean is computed taking the mask values into account.
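Example
The mask-aware mean described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the class's torch implementation: the helper name `masked_mean`, the `eps` guard, and the convention that `True` marks a frame to keep are all hypothetical.

```python
import numpy as np

def masked_mean(values, mask, eps=1e-8):
    """Mean of `values` along the last axis, counting only entries where mask is True."""
    values = np.asarray(values, dtype=np.float64)
    weights = np.asarray(mask, dtype=np.float64)
    # Masked-out entries contribute nothing to the sum or to the count.
    return (values * weights).sum(axis=-1) / (weights.sum(axis=-1) + eps)

correlations = np.array([[1.0, 0.5, 0.0, 0.5]])
keep = np.array([[True, True, False, False]])
result = masked_mean(correlations, keep)
```

Here only the first two entries are averaged, so the result is close to 0.75 rather than the unmasked mean of 0.5. Keeping the tensor shape fixed and weighting by the mask is what avoids removing frames per batch item.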
References
- [1] C. H. Taal, R. C. Hendriks, R. Heusdens, J. Jensen, ‘A Short-Time Objective Intelligibility Measure for Time-Frequency Weighted Noisy Speech’, ICASSP 2010, Dallas, Texas.
- [2] C. H. Taal, R. C. Hendriks, R. Heusdens, J. Jensen, ‘An Algorithm for Intelligibility Prediction of Time-Frequency Weighted Noisy Speech’, IEEE Transactions on Audio, Speech, and Language Processing, 2011.
- [3] J. Jensen and C. H. Taal, ‘An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers’, IEEE Transactions on Audio, Speech and Language Processing, 2016.
- static detect_silent_frames(x, dyn_range, framelen, hop)
Detects silent frames in the input tensor. A frame is excluded if its energy is lower than max(energy) - dyn_range.
- Parameters:
x (torch.Tensor) – Batch of original speech waveforms, shape (batch, time)
dyn_range – Energy range to determine which frame is silent
framelen – Window size for energy evaluation
hop – Hop size for energy evaluation
- Returns:
torch.BoolTensor, framewise mask.
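Example
The energy rule above (a frame is excluded when its energy falls below max(energy) - dyn_range) can be sketched in NumPy. This is a simplified illustration rather than the method's torch code: no analysis window is applied, the helper name `silent_frame_mask` and the default values are assumptions, and `True` is assumed to mean that the frame is kept.

```python
import numpy as np

EPS = 1e-8  # keeps log10 finite for all-zero frames (assumed value)

def silent_frame_mask(x, dyn_range=40.0, framelen=256, hop=128):
    """Boolean mask of shape (batch, n_frames); True marks a non-silent frame."""
    x = np.asarray(x, dtype=np.float64)
    n_frames = 1 + (x.shape[-1] - framelen) // hop
    # Index matrix of shape (n_frames, framelen): one row of sample indices per frame.
    idx = np.arange(framelen)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[:, idx]  # (batch, n_frames, framelen)
    # Frame energy in dB, computed from the L2 norm of each frame.
    energy_db = 20.0 * np.log10(np.linalg.norm(frames, axis=-1) + EPS)
    # Keep frames within dyn_range dB of the loudest frame of each batch item.
    return energy_db > energy_db.max(axis=-1, keepdims=True) - dyn_range

x = np.zeros((1, 1024))
x[0, 512:] = 1.0
mask = silent_frame_mask(x)
```

With these settings the signal is cut into seven overlapping frames; the three frames that contain only zeros are marked as silent and the four that overlap the non-zero part are kept.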
- forward(est_targets: torch.Tensor, targets: torch.Tensor) → torch.Tensor
Compute negative (E)STOI loss.
- Parameters:
est_targets (torch.Tensor) – Tensor containing target estimates.
targets (torch.Tensor) – Tensor containing clean targets.
- Shapes:
(time,) --> (1,)
(batch, time) --> (batch,)
(batch, n_src, time) --> (batch, n_src)
- Returns:
torch.Tensor, the batch of negative STOI loss