clarity.utils package

Submodules

clarity.utils.audiogram module

Dataclass to represent a monaural audiogram

class clarity.utils.audiogram.Audiogram(levels: numpy.ndarray, frequencies: numpy.ndarray = <factory>)[source]

Bases: object

Dataclass to represent an audiogram.

levels

The audiometric levels in dB HL

Type:

ndarray

frequencies

The frequencies at which the levels are measured

Type:

ndarray

frequencies: ndarray

has_frequencies(frequencies: ndarray) bool[source]

Check if the audiogram has the given frequencies.

Parameters:

frequencies (ndarray) – The frequencies to check

Returns:

True if the audiogram has the given frequencies

Return type:

bool

levels: ndarray

resample(new_frequencies: ndarray, linear_frequency: bool = False) Audiogram[source]

Resample the audiogram to a new set of frequencies.

Interpolates linearly on a (by default) log frequency axis. If linear_frequency is set True then interpolation is done on a linear frequency axis.

Parameters:
  • new_frequencies (ndarray) – The new frequencies to resample to

  • linear_frequency (bool) – If True, interpolate on a linear frequency axis rather than a log axis (default: False)

Returns:

New audiogram with resampled frequencies

Return type:

Audiogram

property severity: str

Categorise HL severity level for the audiogram.

Note that this categorisation is different from that of the British Society of Audiology, which recommends descriptors mild, moderate, severe and profound for average hearing threshold levels at 250, 500, 1000, 2000 and 4000 Hz of 21-40 dB HL, 41-70 dB HL, 71-95 dB HL and > 95 dB HL, respectively (BSA Pure-tone air-conduction and bone-conduction threshold audiometry with and without masking 2018).

Returns:

str – severity level, one of SEVERE, MODERATE, MILD, NOTHING
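
Example: a minimal sketch of constructing and querying an Audiogram; the level and frequency values below are illustrative only.

    import numpy as np
    from clarity.utils.audiogram import Audiogram

    audiogram = Audiogram(
        levels=np.array([30, 40, 50, 60]),              # dB HL
        frequencies=np.array([500, 1000, 2000, 4000]),  # Hz
    )

    print(audiogram.severity)  # one of SEVERE, MODERATE, MILD, NOTHING
    print(audiogram.has_frequencies(np.array([500, 1000, 2000, 4000])))  # True

    # Interpolate onto a new frequency grid (log frequency axis by default)
    resampled = audiogram.resample(np.array([250, 500, 1000, 2000, 4000, 8000]))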

class clarity.utils.audiogram.Listener(audiogram_left: Audiogram, audiogram_right: Audiogram, id: str = '')[source]

Bases: object

Dataclass to represent a Listener.

The listener is currently defined by their left and right ear audiograms. In later versions, this may be extended to include further audiometric data.

The class provides methods for reading metadata files which will also include some basic validation.

id

The ID of the listener

Type:

str

audiogram_left

The audiogram for the left ear

Type:

Audiogram

audiogram_right

The audiogram for the right ear

Type:

Audiogram

audiogram_left: Audiogram

audiogram_right: Audiogram

static from_dict(listener_dict: dict) Listener[source]

Create a Listener from a dict.

The dict structure and fields are based on those used in the Clarity metadata files.

Parameters:

listener_dict (dict) – The listener dict

Returns:

The listener

Return type:

Listener

id: str = ''

static load_listener_dict(filename: Path) dict[str, Listener][source]

Read a Clarity Listener dict file.

The standard Clarity metadata file presents listeners as a dictionary of listeners, keyed by listener ID.

Parameters:

filename (Path) – The path to the listener dict file

Returns:

A dict of listeners keyed by id

Return type:

dict[str, Listener]
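
Example: a hedged sketch of loading listeners from a metadata file and building one from a dict. The path, listener ID, and dict field names are assumptions based on the usual Clarity metadata layout and may differ between challenges.

    from pathlib import Path
    from clarity.utils.audiogram import Listener

    # Load all listeners from a metadata file (hypothetical path), keyed by listener ID
    listeners = Listener.load_listener_dict(Path("metadata/listeners.json"))
    listener = listeners["L0001"]  # hypothetical listener ID

    # Or build a single Listener directly from a dict (field names assumed)
    listener = Listener.from_dict(
        {
            "name": "L0001",
            "audiogram_cfs": [250, 500, 1000, 2000, 4000, 8000],
            "audiogram_levels_l": [10, 20, 30, 40, 50, 60],
            "audiogram_levels_r": [15, 25, 35, 45, 55, 65],
        }
    )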

clarity.utils.file_io module

File I/O functions.

clarity.utils.file_io.read_jsonl(filename: str) list[source]

Read a jsonl file into a list of dictionaries.

clarity.utils.file_io.read_signal(filename: str | Path, sample_rate: float = 0.0, offset: int | float = 0, n_samples: int = -1, n_channels: int = 0, offset_is_samples: bool = False, allow_resample: bool = True) ndarray[source]

Read a wav format audio file and return as numpy array of floats.

The returned value will be a numpy array of floats of shape (n_samples, n_channels) for n_channels >= 2, and shape (n_samples,) for n_channels = 1.

If n_samples is set to a value other than -1, the specified number of samples will be read, or fewer if the end of the file is reached first. An ‘offset’ can be set to start reading from a specified sample or time (in seconds).

The expected number of channels can be specified. If the file has a different number of channels, an error will be raised.

The expected sample rate can be specified. If the file has a different sample rate, it will be resampled to the expected rate, unless ‘allow_resample’ is set to False, in which case an error will be raised.

Parameters:
  • filename (str|Path) – Name of file to read

  • sample_rate (float) – The expected sample rate (default: 0.0 = any rate OK)

  • offset (int | float) – Offset in samples or seconds (from start). Default is 0.

  • n_samples (int) – Number of samples to read (default: -1 = read to the end of the file)

  • n_channels (int) – expected number of channels (default: 0 = any number OK)

  • offset_is_samples (bool) – whether offset is measured in samples (True) or seconds (False) (default: False)

  • allow_resample (bool) – allow resampling if the sample rate differs from the expected rate, else raise an error (default: True)

Returns:

audio signal

Return type:

np.ndarray

clarity.utils.file_io.write_jsonl(filename: str, records: list) None[source]

Write a list of dictionaries to a jsonl file.
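
Example: a two-line jsonl round trip; the file name is hypothetical.

    from clarity.utils.file_io import read_jsonl, write_jsonl

    write_jsonl("results.jsonl", [{"scene": "S06001", "score": 0.85}])
    records = read_jsonl("results.jsonl")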

clarity.utils.file_io.write_signal(filename: str | Path, signal: ndarray, sample_rate: float, floating_point: bool = True, strict: bool = False) None[source]

Write a signal as fixed or floating point wav file.

Signals are passed as numpy arrays of floats of shape (n_samples, n_channels) for n_channels >= 1 or (n_samples,) for n_channels = 1.

Signals are floating point in the range [-1.0, 1.0) but can be written as a wav file with either 16-bit integers or floating point. In the former case, the signals are scaled to map to the range -32768 to 32767 and clipped if necessary.

NB: setting ‘strict’ to True will raise an error on overflow. This would be a more natural default, but it would break existing code that does not check for overflow.

Parameters:
  • filename (str|Path) – name of file to write to.

  • signal (ndarray) – signal to write.

  • sample_rate (float) – sampling frequency.

  • floating_point (bool) – write as floating point, else as 16-bit integers (default: True).

  • strict (bool) – raise an error if the signal is out of range for int16 (default: False).
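
Example: a short read/write round trip; the file names are hypothetical.

    from clarity.utils.file_io import read_signal, write_signal

    # Read, resampling to 44.1 kHz if the file is stored at a different rate
    signal = read_signal("input.wav", sample_rate=44100.0)

    # Attenuate by 6 dB and write back as a floating point wav file
    write_signal("output.wav", signal * 0.5, sample_rate=44100.0, floating_point=True)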

clarity.utils.flac_encoder module

Class for encoding and decoding audio signals using FLAC compression.

class clarity.utils.flac_encoder.FileDecoder(input_file: Path, output_file: Path | None = None)[source]

Bases: FileDecoder

process() tuple[ndarray, int][source]

Overridden version of the process method from the pyflac decoder. The original process returns stereo signals in float64 format.

In this version, the data is returned with the original number of channels and in int16 format.

Returns:

A tuple of the decoded numpy audio array and the sample rate of the audio data.

Return type:

tuple[ndarray, int]

Raises:

DecoderProcessException – if any fatal read, write, or memory allocation error occurred (meaning decoding must stop)

class clarity.utils.flac_encoder.FlacEncoder(compression_level: int = 5)[source]

Bases: object

Class for encoding and decoding audio signals using FLAC.

It uses the pyflac library and offers convenient methods for encoding and decoding audio data.

static decode(input_filename: Path | str) tuple[np.ndarray, float][source]

Method to decode a flac file to wav audio data.

It uses the pyflac library to decode the flac file.

Parameters:

input_filename (pathlib.Path | str) – Path to the input FLAC file.

Returns:

The decoded audio data and its sample rate.

Return type:

tuple[np.ndarray, float]

Raises:

FileNotFoundError – If the flac file to decode does not exist.

encode(signal: np.ndarray, sample_rate: int, output_file: str | Path | None = None) bytes[source]

Method to encode the audio data using FLAC compressor.

It creates a WavEncoder object and uses it to encode the audio data.

Parameters:
  • signal (np.ndarray) – The raw audio data to be compressed.

  • sample_rate (int) – The sample rate of the audio data.

  • output_file (str | Path) – Path to where to save the output FLAC file. If not specified, a temporary file will be created.

Returns:

The FLAC encoded audio signal.

Return type:

(bytes)

Raises:

ValueError – If the audio signal is not in np.int16 format.
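
Example: a sketch of a FLAC round trip with FlacEncoder. Note that encode() expects an np.int16 signal; the output file name is hypothetical.

    import numpy as np
    from clarity.utils.flac_encoder import FlacEncoder

    sample_rate = 16000
    # One second of low-level noise, scaled and cast to int16 as encode() requires
    signal = (np.random.uniform(-0.5, 0.5, sample_rate) * 32767).astype(np.int16)

    encoder = FlacEncoder(compression_level=5)
    flac_bytes = encoder.encode(signal, sample_rate, output_file="signal.flac")

    decoded, decoded_sample_rate = FlacEncoder.decode("signal.flac")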

class clarity.utils.flac_encoder.WavEncoder(signal: np.ndarray, sample_rate: int, output_file: str | Path | None = None, compression_level: int = 5, blocksize: int = 0, streamable_subset: bool = True, verify: bool = False)[source]

Bases: _Encoder

An adaptation of the pyflac.encoder.FileEncoder that works directly with WAV signals as input.

process() bytes[source]

Process the audio data from the WAV file.

Returns:

The FLAC encoded bytes.

Return type:

(bytes)

Raises:

EncoderProcessException – if an error occurs when processing the samples

clarity.utils.flac_encoder.read_flac_signal(filename: Path) tuple[ndarray, float][source]

Read a FLAC signal and return it as a numpy array

Parameters:

filename (Path) – The path to the FLAC file to read.

Returns:

signal (np.ndarray): The decoded signal. sample_rate (float): The sample rate of the signal.

Return type:

tuple[ndarray, float]

clarity.utils.flac_encoder.save_flac_signal(signal: np.ndarray, filename: Path, signal_sample_rate: int, output_sample_rate: int | None = None, do_clip_signal: bool = False, do_soft_clip: bool = False, do_scale_signal: bool = False) None[source]

Function to save output signals.

  • The output signal will be resampled to output_sample_rate.

    If output_sample_rate is None, the output signal will have the same sample rate as the input signal.

  • The output signal will be clipped to [-1, 1] if do_clip_signal is True,

    using soft clipping if do_soft_clip is True. Note that if do_clip_signal is False, do_soft_clip will be ignored.

  • The output signal will be scaled to [-1, 1] if do_scale_signal is True.

    If the signal is scaled, the scale factor will be saved in a TXT file. Note that if do_clip_signal is True, do_scale_signal will be ignored.

  • The output signal will be saved as a FLAC file.

Parameters:
  • signal (np.ndarray) – Signal to save

  • filename (Path) – Path to save signal

  • signal_sample_rate (int) – Sample rate of the input signal

  • output_sample_rate (int) – Sample rate of the output signal

  • do_clip_signal (bool) – Whether to clip signal

  • do_soft_clip (bool) – Whether to apply soft clipping

  • do_scale_signal (bool) – Whether to scale signal
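
Example: a sketch of saving a signal with resampling and soft clipping, then reading it back; the file path is hypothetical.

    import numpy as np
    from pathlib import Path
    from clarity.utils.flac_encoder import read_flac_signal, save_flac_signal

    signal = np.random.uniform(-1.2, 1.2, 32000)  # deliberately out of range

    # Resample from 32 kHz to 16 kHz and soft-clip into [-1, 1] before encoding
    save_flac_signal(
        signal,
        Path("out.flac"),
        signal_sample_rate=32000,
        output_sample_rate=16000,
        do_clip_signal=True,
        do_soft_clip=True,
    )

    loaded, loaded_sample_rate = read_flac_signal(Path("out.flac"))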

clarity.utils.results_support module

Dataclass to save challenge results to a CSV file.

class clarity.utils.results_support.ResultsFile(file_name: str | Path, header_columns: list[str], append_results: bool = False)[source]

Bases: object

A utility class for writing results to a CSV file.

file_name

The name of the file to write results to.

Type:

str | Path

header_columns

The columns to write to the CSV file.

Type:

list[str]

append_results

Whether to append results to an existing file. If False, a new file will be created and the header row will be written. Defaults to False.

Type:

bool

add_result(row_values: dict[str, str | float])[source]

Add a result to the CSV file.

Parameters:

row_values (dict[str, str | float]) – The values to write to the CSV file.

append_results: bool = False

file_name: str | Path

header_columns: list[str]
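
Example: a minimal sketch of writing per-scene scores; the column names and values are illustrative.

    from clarity.utils.results_support import ResultsFile

    # Creates the file and writes the header row (append_results defaults to False)
    results = ResultsFile("scores.csv", header_columns=["scene", "listener", "score"])
    results.add_result({"scene": "S06001", "listener": "L0001", "score": 0.85})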

clarity.utils.signal_processing module

Signal processing utilities.

clarity.utils.signal_processing.clip_signal(signal: numpy.ndarray, soft_clip: bool = False) tuple[numpy.ndarray, int][source]

Clip the signal.

Parameters:
  • signal (np.ndarray) – Signal to be clipped.

  • soft_clip (bool) – Whether to use soft clipping.

Returns:

signal (np.ndarray): Clipped signal. n_clipped (int): Number of samples clipped.

Return type:

tuple[ndarray, int]

clarity.utils.signal_processing.compute_rms(signal: numpy.ndarray) float[source]

Compute the RMS of a signal.

Parameters:

signal – Signal to compute RMS of.

Returns:

RMS of the signal.

Return type:

float

clarity.utils.signal_processing.denormalize_signals(sources: numpy.ndarray, ref: numpy.ndarray) numpy.ndarray[source]

Scale signals back to the original scale.

Parameters:
  • sources (ndarray) – Sources to be scaled.

  • ref (ndarray) – Original sources to be used for reverting scaling.

Returns:

Signals rescaled back to their original scale.

Return type:

ndarray

clarity.utils.signal_processing.normalize_signal(signal: numpy.ndarray) tuple[numpy.ndarray, numpy.ndarray][source]

Standardize the signal to have zero mean and unit variance.

Parameters:

signal – The signal to be standardized.

Returns:

The standardized signal and the reference signal.

Return type:

tuple[ndarray, ndarray]
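
Example: a sketch of the normalize/denormalize pair: standardize a signal, process it, then restore the original scale using the returned reference.

    import numpy as np
    from clarity.utils.signal_processing import denormalize_signals, normalize_signal

    signal = 0.1 * np.random.randn(16000) + 0.05
    normalized, ref = normalize_signal(signal)  # zero mean, unit variance

    # ... processing that expects a standardized signal goes here ...

    restored = denormalize_signals(normalized, ref)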

clarity.utils.signal_processing.resample(signal: numpy.ndarray, sample_rate: float, new_sample_rate: float, method: str = 'soxr') numpy.ndarray[source]

Resample a signal to a new sample rate.

This is a simple wrapper around soxr and scipy.signal.resample with the resampling expressed in terms of input and output sampling rates.

It also ensures that for multichannel signals, resampling is in the time domain, i.e. down the columns.

Parameters:
  • signal – The signal to be resampled.

  • sample_rate – The original sample rate.

  • new_sample_rate – The new sample rate.

  • method – The resampling method to use (default: ‘soxr’).

Returns:

The resampled signal.

clarity.utils.signal_processing.to_16bit(signal: numpy.ndarray) numpy.ndarray[source]

Convert the signal to 16 bit.

Parameters:

signal (np.ndarray) – Signal to be converted.

Returns:

Converted signal.

Return type:

signal (np.ndarray)
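
Example: a sketch chaining the utilities above: resample, soft-clip into [-1, 1], then convert to 16 bit (e.g. before writing a fixed point wav file).

    import numpy as np
    from clarity.utils.signal_processing import clip_signal, resample, to_16bit

    signal = np.random.uniform(-1.5, 1.5, 44100)  # deliberately out of range

    signal = resample(signal, sample_rate=44100, new_sample_rate=16000, method="soxr")
    signal, n_clipped = clip_signal(signal, soft_clip=True)
    signal_16bit = to_16bit(signal)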

clarity.utils.source_separation_support module

Module that contains functions for source separation.

clarity.utils.source_separation_support.get_device(device: str) tuple[source]

Get the Torch device.

Parameters:

device (str) – device type, e.g. “cpu”, “gpu0”, “gpu1”, etc.

Returns:

A torch.device appropriate to the available hardware, and the device type selected as a str, e.g. "cpu", "cuda".

Return type:

tuple[torch.device, str]

clarity.utils.source_separation_support.separate_sources(model: torch.nn.Module, mix: torch.Tensor, sample_rate: int, segment: float = 10.0, overlap: float = 0.1, device: torch.device | str | None = None)[source]

Apply a separation model to a given mixture. The mixture is processed segment by segment, with overlapping segments cross-faded and added together.

Parameters:
  • model (torch.nn.Module) – model to use for separation

  • mix (torch.Tensor) – mixture to separate, shape (batch, channels, time)

  • sample_rate (int) – sampling rate of the mixture

  • segment (float) – segment length in seconds

  • overlap (float) – overlap between segments, between 0 and 1

  • device (torch.device, str, or None) – if provided, device on which to execute the computation, otherwise mix.device is assumed. When device is different from mix.device, only local computations will be on device, while the entire tracks will be stored on mix.device.

Returns:

estimated sources

Return type:

torch.Tensor

Based on https://pytorch.org/audio/main/tutorials/hybrid_demucs_tutorial.html
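
Example: a hedged sketch of running separate_sources with a pretrained Hybrid Demucs model from torchaudio. The model choice and input file are assumptions; any torch.nn.Module mapping (batch, channels, time) to (batch, sources, channels, time) should work.

    import torchaudio
    from clarity.utils.source_separation_support import get_device, separate_sources

    device, device_type = get_device("cpu")

    # Pretrained Hybrid Demucs bundle (an assumption; substitute your own model)
    model = torchaudio.pipelines.HDEMUCS_HIGH_MUSDB_PLUS.get_model().to(device)

    mix, sample_rate = torchaudio.load("mixture.wav")  # hypothetical input file
    mix = mix.unsqueeze(0).to(device)  # add batch dimension: (1, channels, time)

    sources = separate_sources(
        model, mix, sample_rate, segment=10.0, overlap=0.1, device=device
    )
    print(sources.shape)  # (batch, n_sources, channels, time)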

Module contents