Submodules¶ module¶
Tools to support higher order ambisonic processing.
- class, resolution)[source]¶
Provides methods for rotating ambisonics.
- rotate(signal: numpy.ndarray, rotation_vector: numpy.ndarray) numpy.ndarray [source]¶
Apply rotation to HOA signals using precomputed rotation matrices.
- Parameters:
signal (array-like) – ambisonic signals
rotation_vector (array-like) – rotation vector (in radians)
- Returns:
transformed ambisonic signals
- Return type:
- int, a: int, b: int, order: int, rotation_matrices: numba.typed.List) numba.typed.List ¶
P function for rotation matrix calculation.
- Parameters:
i (int) – index
a (int) – ‘a’ value
b (int) – ‘b’ value
order (int) – order
rotation_matrices (list(matrix)) – rotation matrices
- Returns:
P value
- Return type:
- int, n: int, order: int, rotation_matrices: numba.typed.List) numba.typed.List ¶
U coefficient initialiser for rotation matrix calculation.
- Parameters:
rotation_degree (int) – upper index of the spherical harmonic component Y (n is the lower index).
n (int) – index
order (int) – order
rotation_matrices (list(matrix)) – rotation matrices
- Returns:
U value
- Return type:
- int, n: int, order: int, rotation_matrices: numba.typed.List) numba.typed.List ¶
V coefficient initialiser for rotation matrix calculation.
- Parameters:
degree (int) – valid inputs are int(|m|) <= order.
n (int) – index
order (int) – order
rotation_matrices (list(matrix)) – rotation matrices
- Returns:
V value
- Return type:
- int, n: int, order: int, rotation_matrices: numba.typed.List) numba.typed.List ¶
W coefficient initialiser for rotation matrix calculation.
- Parameters:
degree (int) – degree
n (int) – index
order (int) – order
rotation_matrices (list(matrix)) – rotation matrices
- Returns:
W value
- Return type:
- numpy.ndarray, hoa_impulse_responses: numpy.ndarray, order: int) numpy.ndarray [source]¶
Convolve HOA Impulse Responses with signals.
- Parameters:
signal (ndarray[samples]) – the signal to convolve
hoa_impulse_responses (ndarray[samples, channels]) – the HOA impulse responses
order (int, optional) – ambisonic order.
- Returns:
the convolved signal
- Return type:
np.ndarray[samples, channels]
- numpy.ndarray, hrir: dict[str, Any], hrir_metadata: dict[str, Any]) numpy.ndarray [source]¶
Perform binaural mixdown of ambisonic signals.
- Parameters:
ambisonic_signals (array-like) – inputs
hrir_filename (string) – name of HRIR file
hrir_metadata (dict) – data for channel selection and ambisonic decoding
- Returns:
stereo audio
- Return type:
- numpy.ndarray, row: int, col: int)¶
Get value from centered element indexing.
- Parameters:
reference (matrix) – reference input matrix
row (int) – row index
col (int) – column index
- Returns:
matrix element
- Return type:
-, n, order)¶
Compute U, V and W coefficients for rotation matrix calculation.
- Parameters:
m (index) – degree
n (index) – index
order (index) – order
- Returns:
u, v, w
- Return type:
- int, rotation_matrices: numba.typed.List, output)¶
Compute submatrix for rotation matrix.
- Parameters:
order (int) – order of submatrix
rotation_matrices (list(matrix)) – previous and current submatrices
output (matrix) – output destination
- Returns:
rotation submatrix
- Return type:
- numpy.ndarray, axis: int = 0) numpy.ndarray [source]¶
Compute RMS values along a given axis.
- Parameters:
input_signal (np.ndarray) – Input signal
axis (int) – Axis along which to compute the root mean square. 0 (default) or 1.
- Returns:
Root Mean Square for the given axis.
- Return type:
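The computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the library's implementation; the name `rms_sketch` is hypothetical.

```python
import numpy as np


def rms_sketch(input_signal: np.ndarray, axis: int = 0) -> np.ndarray:
    """Root mean square along `axis` (hypothetical helper, for illustration)."""
    # Square each sample, average along the chosen axis, then take the root.
    return np.sqrt(np.mean(np.square(input_signal), axis=axis))


# Three samples of a two-channel signal, stored as [samples, channels].
example = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
per_channel_rms = rms_sketch(example, axis=0)
```

With `axis=0` the result has one value per channel; with `axis=1` it has one value per sample.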
- int, foa_rotmat: numpy.ndarray) numpy.ndarray ¶
Generate a rotation matrix to rotate HOA soundfield.
Based on [1] and [2]. Operates on an HOA soundfield of a given order and rotates it by azimuth theta and elevation phi.
- Parameters:
order (int) – order of ambisonic soundfield
foa_rotmat (arraylike) – rotation matrix to expand
- Returns:
HOA rotation matrix
- Return type:
References: [1] Ivanic J, Ruedenberg K (1996) Rotation Matrices for Real Spherical Harmonics.
Direct Determination by Recursion. J. Phys. Chem. 1996, 100(15):6342–6347.
- float, end_angle: float, signal_length: int, start_idx: int, end_idx: int) numpy.ndarray [source]¶
Compute the rotation vector.
- Parameters:
start_angle (float)
end_angle (float)
signal_length (int)
start_idx (int)
end_idx (int)
- Returns:
- Return type:
- numpy.ndarray, front_right_down: numpy.ndarray, back_left_down: numpy.ndarray, back_right_up: numpy.ndarray)[source]¶
Converts 1st order A format audio into 1st order B format. For more information on ambisonic formats see Gerzon, Michael A. “Ambisonics. Part two: Studio techniques.” (1975).
- Parameters:
front_left_up (np.ndarray) – Front-left-up audio
front_right_down (np.ndarray) – Front-right-down audio
back_left_down (np.ndarray) – Back-left-down audio
back_right_up (np.ndarray) – Back-right-up audio
- Raises:
TypeError – input must be numpy array
ValueError – all inputs must have same dimensions
- Returns:
4xN array containing B-format audio, indexed w, x, y, z.
- Return type:
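The conversion can be pictured as sums and differences of the four capsule signals. The sketch below assumes the textbook tetrahedral sum/difference equations with a 0.5 scaling factor; the function name and the scaling are assumptions and may differ from what the library does.

```python
import numpy as np


def a_to_b_sketch(front_left_up, front_right_down, back_left_down, back_right_up):
    """Convert first-order A-format capsule signals to B-format (illustrative)."""
    inputs = (front_left_up, front_right_down, back_left_down, back_right_up)
    if not all(isinstance(x, np.ndarray) for x in inputs):
        raise TypeError("input must be numpy array")
    if len({x.shape for x in inputs}) != 1:
        raise ValueError("all inputs must have same dimensions")

    flu, frd, bld, bru = inputs
    # Assumed 0.5 scaling; gain conventions differ between implementations.
    w = 0.5 * (flu + frd + bld + bru)  # omnidirectional component
    x = 0.5 * (flu + frd - bld - bru)  # front minus back
    y = 0.5 * (flu - frd + bld - bru)  # left minus right
    z = 0.5 * (flu - frd - bld + bru)  # up minus down
    return np.stack([w, x, y, z])
```

Feeding the same signal to all four capsules leaves only the W channel non-zero, which is a quick sanity check for the sign pattern.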
- int | float) float [source]¶
Convert dB to gain.
- Parameters:
x (float)
- Returns
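A minimal sketch of the conversion, assuming the usual amplitude convention (20 dB per factor of 10). The name `db_to_gain_sketch` is hypothetical, and the convention is an assumption rather than something stated above.

```python
def db_to_gain_sketch(x: float) -> float:
    """Convert a level in dB to a linear amplitude gain (illustrative)."""
    # Amplitude convention assumed: gain = 10 ** (dB / 20).
    return 10.0 ** (x / 20.0)
```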
-, B)¶
Wraps for numba #@njit.
- Parameters:
A (Array)
B (Array)
- Returns:
- Return type:
- list[numpy.ndarray]) list[numpy.ndarray] [source]¶
Equalise RMS levels.
- Parameters:
inputs (array) – signals
- Returns:
normalised signals
- Return type:
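One way to equalise levels is to scale every signal to a common RMS. The sketch below assumes the loudest input sets the target level; that choice, and the name `equalise_rms_sketch`, are assumptions made for illustration only.

```python
import numpy as np


def equalise_rms_sketch(inputs: list) -> list:
    """Scale each signal so all share the same RMS level (illustrative)."""
    levels = [np.sqrt(np.mean(np.square(x))) for x in inputs]
    target = max(levels)  # assumed target: the loudest input
    # Silent signals (RMS of zero) are returned unchanged to avoid dividing by zero.
    return [x * (target / r) if r > 0 else x for x, r in zip(inputs, levels)]
```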
- int, start_index: int, end_index: int, smoothness: int = 1) numpy.ndarray [source]¶
Generate mapped rotation control vector for values of theta.
- Parameters:
array_length (int) – Length of array
start_index (int) – Start position
end_index (int)
smoothness (int, optional)
- Returns:
mapped rotation control vector
- Return type:
- numpy.ndarray, x_min: float = 0.0, x_max: float = 1.0, N: int = 1) numpy.ndarray [source]¶
Apply the smoothstep function.
- Parameters:
x (np.ndarray) – input
x_min (float, optional) – clamp minimum. Defaults to 0.
x_max (float, optional) – clamp maximum. Defaults to 1.
N (int, optional) – smoothing factor. Defaults to 1.
- Returns:
smoothstep values
- Return type:
np.ndarray module¶
Functions for downloading demo data.
- str, target_dir: str) None [source]¶
Download demo data.
- Parameters:
metadata_url (str) – URL to download data from (should be a link on Google Drive)
target_dir (str) – Directory to save to (default ‘clarity_data/demo’), will be created if it doesn’t exist.
- Returns:
- str = 'clarity_data/demo') None [source]¶
Download HRIRs.
- str = 'clarity_data/demo') None [source]¶
Download interferers.
- str = 'clarity_data/demo') None [source]¶
Download metadata.
- str = 'clarity_data/demo') None [source]¶
Download rooms. module¶
Code for building the scenes.json files.
- class[source]¶
Enum for interferer types.
- MUSIC = 'music'¶
- NOISE = 'noise'¶
- SPEECH = 'speech'¶
- class[source]¶
Functions for handling rooms.
- build_from_rpf(rpf_location, n_interferers=3, n_rooms=10000, start_room=1)[source]¶
Build a list of rooms by extracting info from RAVEN rpf files.
- Parameters:
rpf_location (str) – path to where rpf files are stored
n_interferers (int, optional) – number of interferer definitions to expect. Defaults to N_INTERFERERS.
n_rooms (int, optional) – number of scenes to expect. Defaults to N_SCENES.
start_room (int, optional) – index of the first room to expect
- get_room(name: str)[source]¶
Get a room by name.
- Parameters:
name (str) – Name of room to extract.
- class, /)[source]¶
Round a float to 4 decimal places.
- class, scene_datasets, target, interferer, snr_range, listener, shuffle_rooms=None)[source]¶
Class with methods for building a list of scenes.
- add_SNR_to_scene(snr_range: list)[source]¶
Add the Signal Noise Ratio (SNR) info to the scenes.
- Parameters:
snr_range (list) – Range of values from which SNR will be sampled.
- add_interferer_to_scene(speech_interferers: str, noise_interferers: str, music_interferers: str, number: list, start_time_range: list, end_early_time_range: list)[source]¶
Add interferer to the scene description file.
- Parameters:
speech_interferers (str) – Path to speech interferer to load.
noise_interferers (str) – Path to noise interferer to load.
music_interferers (str) – Path to music interferer to load
number (list) – Number of interferers to be added.
start_time_range (list) – Range for randomly selecting start point.
end_early_time_range (list) – Range for randomly selecting end point.
- Returns:
- add_listener_details_to_scene(heads, channels, relative_start_time_range: list, duration_mean: float, duration_sd: float, angle_initial_mean: float, angle_initial_sd: float, angle_final_range: tuple)[source]¶
Add the listener info to the scenes.
- Parameters:
channels
relative_start_time_range (list) – Range from which start time is selected at random.
duration_mean (float) – mean of the time offset for start of turn
duration_sd (float) – standard deviation of the time offset for start of turn
angle_initial_mean (float)
angle_initial_sd (float)
angle_final_range (tuple)
- Returns:
- add_target_to_scene(dataset: str, target_speakers: str, target_selection: str, pre_samples_range: list, post_samples_range: list)[source]¶
Add target info to the scenes.
Uses target speaker file set via config.
- Parameters:
dataset (str) – dataset to be added.
target_speakers (str)
target_selection (str) – Type of target to be added, valid values are ‘SEQUENTIAL’ and ‘RANDOM’.
pre_samples_range (list) – Parameters for number of samples prior to target onset.
post_samples_range (list) – Parameters for number of samples to continue playing after the target offset.
Raises: TypeError if target_selection is not SEQUENTIAL or RANDOM
- initialise_scenes(dataset, n_scenes: int, room_selection: str, scene_start_index: int)[source]¶
Initialise the scenes for a given dataset.
- Parameters:
dataset – train, dev, or eval set
n_scenes (int) – number of scenes to generate
room_selection (str) – SEQUENTIAL or RANDOM
scene_start_index (int) – index to start for scene IDs
Raises: TypeError if room_selection is not SEQUENTIAL or RANDOM
- dict, interferers: dict, number: list, start_time_range: list[int], end_early_time_range: list[int])[source]¶
Randomly select interferers and add them to the given scene. A random number of interferers is chosen, then each is given a random type selected from the possible speech, nonspeech and music types. Interferers are then chosen from the available lists according to type, taking care to match the scene’s ‘dataset’ field, i.e. train, dev or test.
The interferer data is supplied as a dictionary of lists of lists. The key is “speech”, “nonspeech” or “music”, and the list of lists is a partitioned list of interferers for that type. Using a list of lists allows interferers to be split by subcondition so that the randomization draws equally from each subcondition, e.g. for nonspeech there is “washing machine”, “microwave” etc. This ensures that each subcondition is equally represented even if the number of exemplars of each subcondition differs.
Note that there is no return value; the scene is modified in place.
- Parameters:
scene (dict) – the scene description
interferers (dict) – the interferer metadata
number (list) – number of interferers
start_time_range (list) – range of starting points as integers, a random number is selected between these.
end_early_time_range (list) – range of end points as integers, a random number is selected between these.
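The two-stage draw over a list of lists can be sketched as follows. The dictionary keys `dataset` and `nsamples`, the function name, and the choice to raise when the chosen subcondition has no match are all assumptions made for illustration.

```python
import random


def select_interferer_sketch(interferers, dataset, required_samples, rng=None):
    """Pick a subcondition uniformly, then a matching item within it."""
    rng = rng or random.Random()
    # Stage 1: every subcondition is equally likely, whatever its size.
    subcondition = rng.choice(interferers)
    # Stage 2: keep only items from the right dataset that are long enough.
    candidates = [
        item
        for item in subcondition
        if item["dataset"] == dataset and item["nsamples"] >= required_samples
    ]
    if not candidates:
        raise ValueError("no suitable interferer found")
    return rng.choice(candidates)
```

Choosing the sublist first is what keeps a subcondition with few exemplars as likely to be heard as one with many.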
- dict, scene: dict, pre_samples_range: list, post_samples_range: list)[source]¶
Add the target details to the scene dict.
Adds given target to given scene. Target details will be taken from the target dict but the start time will be according to the CEC2 target start time specification.
- Parameters:
target (dict) – target dict read from target metadata file.
scene (dict) – complete scene dictionary.
pre_samples_range (list) – parameters for number of samples prior to target onset
post_samples_range (list) – parameters for number of samples to continue playing after the target offset.
- str, interferer_files: list[str]) dict [source]¶
Build room json file from contents of related rpf files. Note, there is an rpf file for each source in the scene. All of these files are read and a single scene json file is constructed.
- Parameters:
target_file (str) – rpf file containing the target position
interferer_files (List[str]) – list of files containing the interferer positions
- Returns:
dictionary representation of the scene following CEC2 scene.json format
- Return type:
- dict, relative_start_time_range: list, duration_mean: float, duration_sd: float, angle_initial_mean: float, angle_initial_sd: float, angle_final_range: tuple) list[dict] [source]¶
Generate a suitable head rotation for the given scene. Based on behavioural studies by Hadley et al. TODO: find ref
- Parameters:
scene (dict) – the scene description
relative_start_time_range (list) – Range from which start time is uniformly selected at random.
duration_mean (float) – mean of the time offset for start of turn
duration_sd (float) – standard deviation of the time offset for start of turn
angle_initial_mean (float)
angle_initial_sd (float)
angle_final_range (tuple)
- Returns:
list of dicts with keys “sample” and “view_vector” specifying the head motion.
- Return type:
- list[int]) float [source]¶
Generate a random Signal Noise Ratio (SNR).
- Parameters:
snr_range (list) – Range from which to uniformly sample SNR.
- Returns:
random number from uniform distribution in given range.
- Return type:
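Uniform sampling from a two-element range can be sketched with the standard library. The name `generate_snr_sketch` and the optional `rng` argument are hypothetical additions for reproducibility.

```python
import random


def generate_snr_sketch(snr_range, rng=None):
    """Draw an SNR uniformly from [snr_range[0], snr_range[1]] (illustrative)."""
    rng = rng or random.Random()
    low, high = snr_range
    return rng.uniform(low, high)
```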
- list) int [source]¶
Number of samples to continue playing after the target offset.
- Parameters:
post_samples_range (list) – parameters for number of samples to continue.
- list) int [source]¶
Number of samples prior to target onset.
- Parameters:
pre_samples_range (list) – parameters for number of samples prior to target onset
-, channels)[source]¶
Get a random HRIR set.
- dict, required_samples: int) int [source]¶
Generate a random offset sample for interferer. The offset sample is the point within the masker signal at which the interferer segment will be extracted. It is selected at random, taking care that it does not start too late, i.e. so that the required samples do not overrun the end of the masker signal.
- Parameters:
interferer (dict) – the interferer metadata
required_samples (int) – number of samples that is going to be required
- Returns:
a valid randomly selected offset
- Return type:
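The constraint on the offset amounts to drawing from the range of start points that still leave room for the required samples. In this sketch the `nsamples` key and the function name are assumptions; the real metadata layout may differ.

```python
import random


def random_offset_sketch(interferer, required_samples, rng=None):
    """Pick a start sample so the segment fits inside the masker (illustrative)."""
    rng = rng or random.Random()
    # Latest start point that still leaves `required_samples` available.
    latest_start = interferer["nsamples"] - required_samples
    if latest_start < 0:
        raise ValueError("interferer is shorter than the required samples")
    return rng.randint(0, latest_start)  # inclusive at both ends
```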
- str) list [source]¶
Find the room dimensions in the rpf file.
- Parameters:
text (str) – String to be searched for room dimensions (string to be searched for is of the form ‘ProjectName = CuboidRoom_5.9x3.4186x2.9’).
- Returns:
List of the three dimensions of the room.
- Return type:
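A regular expression is one straightforward way to pull the three dimensions out of a string of the form quoted above. This is a sketch under that assumption; `get_room_dims_sketch` is a hypothetical name.

```python
import re

_NUMBER = r"(\d+(?:\.\d+)?)"
_DIMS_PATTERN = re.compile(
    rf"ProjectName\s*=\s*CuboidRoom_{_NUMBER}x{_NUMBER}x{_NUMBER}"
)


def get_room_dims_sketch(text: str) -> list:
    """Return the three room dimensions found in `text` as floats."""
    match = _DIMS_PATTERN.search(text)
    if match is None:
        raise ValueError("room dimensions not found")
    return [float(value) for value in match.groups()]
```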
- str) str [source]¶
Find the room name in the rpf file.
- Parameters:
text (str) – String to be searched for room name (‘R’ followed by 5 digits).
- Returns:
The room name.
- Return type:
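The 'R' plus five digits pattern maps directly onto a short regular expression. The function name below is hypothetical, and the sample string used to exercise it is made up.

```python
import re


def get_room_name_sketch(text: str) -> str:
    """Return the first 'R' followed by five digits found in `text`."""
    match = re.search(r"R\d{5}", text)
    if match is None:
        raise ValueError("room name not found")
    return match.group(0)
```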
- str, vector_name: str) list[float] [source]¶
Get a vector quantity from the rpf file. Will read rpf vector quantities, eg. “sourceViewVectors = -0.095,-0.995, 0.000”
- Parameters:
text (str) – string contents of the rpf file
vector_name (str) – name of vector to extract (e.g. “sourceViewVectors”)
- Returns:
vector as list of floats
- Return type:
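Reading a named vector from text like the example above can be sketched as a search for `name = a, b, c` followed by a split on commas. The function name is hypothetical, and the sketch assumes the vector sits on a single line.

```python
import re


def get_vector_sketch(text: str, vector_name: str) -> list:
    """Extract a comma-separated vector assigned to `vector_name`."""
    # Capture everything after "name =" up to the end of that line.
    pattern = rf"{re.escape(vector_name)}\s*=\s*([^\n]+)"
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"{vector_name} not found")
    # float() tolerates the stray spaces around each component.
    return [float(value) for value in match.group(1).split(",")]
```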
- str, scene_str: str, n_interferers: int) dict [source]¶
Construct dictionary storing all rpf files that will be processed.
- Parameters:
rpf_location (str) – Location
scene (str) – Scene (as string)
n_interferers (int) – Number of interferers.
- Returns:
Dictionary of rpf files to be processed.
- Return type:
- str) dict [source]¶
Process an rpf file and return key contents as a dictionary.
- Parameters:
rpf_filename (str) – Path to an rpf file to be read.
- Returns:
- dictionary of rpf file contents
{“position”: sourcePositions, “view_vector”: sourceViewVectors}
- Return type:
- list) list[InterfererType] [source]¶
Select the interferer types to use.
The number of interferers is drawn randomly from the list of allowed values. The type of each is chosen randomly, but no more than one music source is allowed.
- Parameters:
allowed_n_interferers (list) – list of allowed number of interferers
- Returns:
list of interferer types to use
- Return type:
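The at-most-one-music rule can be sketched by removing music from the candidate types once it has been drawn. The enum mirrors the `InterfererType` values listed earlier on this page; the function name and this particular sampling scheme are assumptions, and the library may enforce the rule differently.

```python
import random
from enum import Enum


class InterfererType(Enum):
    SPEECH = "speech"
    NOISE = "noise"
    MUSIC = "music"


def select_interferer_types_sketch(allowed_n_interferers, rng=None):
    """Draw a count, then draw types with at most one MUSIC entry."""
    rng = rng or random.Random()
    n_interferers = rng.choice(allowed_n_interferers)
    selected = []
    for _ in range(n_interferers):
        candidates = list(InterfererType)
        if InterfererType.MUSIC in selected:
            # Music already chosen once, so exclude it from further draws.
            candidates.remove(InterfererType.MUSIC)
        selected.append(rng.choice(candidates))
    return selected
```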
- list[list], dataset: str, required_samples: int) dict [source]¶
Randomly select an interferer. Interferers are stored as a list of lists. First randomly select a sublist, then randomly select an item from that sublist matching the constraints.
- Parameters:
interferers (list(list)) – interferers as list of lists
dataset (str) – desired data [train, dev, eval]
required_samples (int) – required number of samples
- Raises:
ValueError – if no suitable interferer is found
- Returns:
the interferer dict
- Return type:
dict module¶
Scene rendering for CEC1 challenge.
- class, output_path, num_channels=1, sample_rate=44100, ramp_duration=0.5, tail_duration=0.2, pre_duration=2.0, post_duration=1.0, test_nbits=16)[source]¶
SceneGenerator of CEC1 training and development sets. The render() function generates all simulated signals for each scene given the parameters specified in the metadata/scenes.train.json or metadata/ file.
- apply_brir(signal, brir)[source]¶
Convolve a signal with a binaural room impulse response (BRIR).
- Parameters:
signal (ndarray) – The mono or stereo signal stored as array of floats
brir (ndarray) – The BRIR stored as a 2xN array of floats
n_tail (int) – Truncate output to input signal length + n_tail
- Returns:
The convolved signals
- Return type:
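For a mono input, applying a BRIR amounts to one convolution per ear followed by truncation. The sketch below handles only the mono case and uses `np.convolve`; the function name, the explicit `n_tail` argument and the 2xM output layout are assumptions.

```python
import numpy as np


def apply_brir_sketch(signal: np.ndarray, brir: np.ndarray, n_tail: int) -> np.ndarray:
    """Convolve a mono signal with a 2xN BRIR and truncate (illustrative)."""
    out_len = len(signal) + n_tail
    # One full convolution per ear, each cut to input length + n_tail.
    left = np.convolve(signal, brir[0])[:out_len]
    right = np.convolve(signal, brir[1])[:out_len]
    return np.stack([left, right])
```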
- apply_ramp(signal, ramp_duration)[source]¶
Apply half cosine ramp into and out of signal.
- Parameters:
signal (np.ndarray) – signal to be ramped.
ramp_duration (int) – ramp duration in seconds.
- Returns:
Signal ramped into and out of by cosine function.
- Return type:
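A half-cosine ramp rises smoothly from 0 to 1 over the ramp duration and is mirrored at the end of the signal. This sketch assumes a 1-D signal at least two ramps long; the function name and the explicit `sample_rate` argument are assumptions (44100 matches the class default shown above).

```python
import numpy as np


def apply_ramp_sketch(signal: np.ndarray, ramp_duration: float, sample_rate: int = 44100) -> np.ndarray:
    """Fade a 1-D signal in and out with half-cosine ramps (illustrative)."""
    ramp_len = int(round(sample_rate * ramp_duration))
    out = signal.astype(float)  # astype returns a copy, so the input is untouched
    if ramp_len == 0:
        return out
    # Half a cosine period, rescaled to rise from 0 towards 1.
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(ramp_len) / ramp_len))
    out[:ramp_len] *= ramp
    out[-ramp_len:] *= ramp[::-1]
    return out
```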
- compute_snr(target: numpy.ndarray, noise: numpy.ndarray, pre_samples=0, post_samples=-1)[source]¶
Return the Signal Noise Ratio (SNR).
Take the overlapping segment of the noise and get the speech-weighted better ear Signal Noise Ratio. (Note, SNR is a ratio – not in dB.)
- Parameters:
target (np.ndarray) – Target signal.
noise (np.ndarray) – Noise (should be same length as target)
- Returns:
signal_noise_ratio for better ear.
- Return type:
- dict, output_path: str, num_channels: int) bool [source]¶
Checks correct dataset directory for full set of pre-existing files.
- Parameters:
scene (dict) – dictionary defining the scene to be generated.
output_path (str) – Path files should be saved to.
num_channels (int) – Number of channels
- Returns:
boolean value indicating whether the scene signals already exist.
- Return type:
status module¶
Clarity ambisonic scene rendering.
- class, metadata, ambisonic_order, equalise_loudness, reference_channel, channel_norms)[source]¶
Ambisonic scene rendering class.
Contains methods for generating signals from pseudorandom datasets for CEC2
- generate_binaural_signals(scene: dict, hoa_target: numpy.ndarray, hoa_interferer: numpy.ndarray, hoa_target_anechoic: numpy.ndarray, out_path: str) None [source]¶
Generate and write binaural signals.
- Parameters:
scene (dict) – scene definitions
hoa_target (ndarray) – target signal in HOA domain
hoa_interferer (ndarray) – interferer signal in HOA domain
hoa_target_anechoic (ndarray) – anechoic target signal in HOA domain
out_path (string) – output path
- generate_hoa_signals(scene: dict) tuple [source]¶
Generates HOA signals.
- Parameters:
scene (dict) – scene definitions
- load_interferer_hoairs(scene)[source]¶
Loads and returns the interferer hoa irs for given scene.
- Parameters:
scene
- Returns:
List of interferer HOA IRs for the given scene.
- Return type:
- load_interferer_signals(scene)[source]¶
Loads and returns interferer signals for given scene.
- Parameters:
scene
- Returns:
List of signals.
- Return type:
- make_hoa_target_anechoic(target, room)[source]¶
Make the HOA anechoic target.
Applies an anechoic HOA IR that models a source straight in front of the listener. The signal is delayed to match the propagation delay of the room.
- Parameters:
target
room (dict)
- make_interferer_filename(interferer: dict, dataset) str [source]¶
Construct filename for an interferer.
- Parameters:
interferer (dict)
- Returns:
Filename for an interferer.
- Return type:
- numpy.ndarray, delay: int, duration: int) numpy.ndarray [source]¶
Pad signal at start and end.
- Parameters:
signal (array-like) – ambisonic signals
delay (int) – number of zeros to pad at start
duration (int) – desired duration after start and end padding
- Returns:
padded signals
- Return type:
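Start and end padding can be expressed with a single `np.pad` call along the sample axis. The sketch assumes signals are stored as [samples, channels] and that `duration` is the total output length in samples; `pad_signal_sketch` is a hypothetical name.

```python
import numpy as np


def pad_signal_sketch(signal: np.ndarray, delay: int, duration: int) -> np.ndarray:
    """Zero-pad `delay` samples at the start, then up to `duration` in total."""
    n_end = duration - delay - signal.shape[0]
    if n_end < 0:
        raise ValueError("duration is too short for the delayed signal")
    # Pad only the first (sample) axis; leave any channel axes untouched.
    pad_width = [(delay, n_end)] + [(0, 0)] * (signal.ndim - 1)
    return np.pad(signal, pad_width, mode="constant")
```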
- dict, origin: numpy.ndarray, duration: int) numpy.ndarray [source]¶
Perform rotation defined by two control points.
- Parameters:
rotation (dict) – rotation object from scene definition
origin (ndarray) – origin view vector
duration (int) – total number of samples to generate for
- Returns:
sequence of theta values per sample
- Return type:
np.ndarray module¶
Utilities for data generation.
- numpy.ndarray, noise: numpy.ndarray) float [source]¶
Calculate effective better ear SNR.
- Parameters:
target (np.ndarray)
noise (np.ndarray)
- Returns:
Maximum Signal Noise Ratio between left and right channel.
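The better-ear idea is simply to compute an SNR per channel and keep the larger one. The sketch below assumes stereo signals stored as [samples, 2] and uses a plain RMS ratio as the SNR; both the SNR definition and the function name are assumptions, and no speech weighting is applied here.

```python
import numpy as np


def better_ear_snr_sketch(target: np.ndarray, noise: np.ndarray) -> float:
    """Return the larger of the left and right channel SNRs (illustrative)."""
    # Per-channel RMS of target and noise, computed along the sample axis.
    target_rms = np.sqrt(np.mean(np.square(target), axis=0))
    noise_rms = np.sqrt(np.mean(np.square(noise), axis=0))
    # Assumed SNR definition: ratio of RMS levels, not in dB.
    return float(np.max(target_rms / noise_rms))
```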
- numpy.ndarray, length: int) numpy.ndarray [source]¶
Zero pad signal to required length.
Assumes required length is not less than input length.
- Parameters:
signal (np.array)
length (int)
- Return type:
- numpy.ndarray, noise: numpy.ndarray) float [source]¶
Apply speech weighting filter to signals and get SNR.
- Parameters:
target (np.ndarray)
noise (np.ndarray)
- Return type:
Signal Noise Ratio
- list) np.ndarray | Literal[0] [source]¶
Return sum of a list of signals.
Signals are stored as a list of ndarrays whose size can vary in the first dimension, i.e., so can sum mono or stereo signals etc. Shorter signals are zero padded to the length of the longest.
- Parameters:
signals (list) – List of signals stored as ndarrays
- Returns:
The sum of the signals
- Return type:
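Summing signals of different lengths can be sketched by accumulating each one into a zero buffer as long as the longest input. Returning `0` for an empty list is an assumption suggested by the `Literal[0]` in the signature; the function name is hypothetical.

```python
import numpy as np


def sum_signals_sketch(signals: list):
    """Sum signals that may differ in length along the first axis."""
    if not signals:
        return 0  # assumed behaviour for an empty list
    max_len = max(s.shape[0] for s in signals)
    # Buffer matches the longest signal; trailing axes follow the first input.
    total = np.zeros((max_len,) + signals[0].shape[1:])
    for s in signals:
        # Adding into a prefix of the buffer is equivalent to zero padding.
        total[: s.shape[0]] += s
    return total
```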