recipes.cad2.task1.ConvTasNet.local package
Submodules
recipes.cad2.task1.ConvTasNet.local.musdb18_dataset module
- class recipes.cad2.task1.ConvTasNet.local.musdb18_dataset.Compose(transforms)[source]
Bases:
object
Composes several augmentation transforms.
- Parameters:
transforms – List of augmentations to compose.
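A minimal sketch of how such a composition could work, assuming each transform is a callable that takes an audio object and returns one of the same kind. `ComposeSketch`, `gain` and `invert` are hypothetical names used for illustration only; this is not the recipe's implementation.

```python
class ComposeSketch:
    """Hypothetical stand-in: apply a list of transforms in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, audio):
        for transform in self.transforms:
            audio = transform(audio)
        return audio


def gain(audio):
    # Illustrative augmentation: halve every sample.
    return [0.5 * x for x in audio]


def invert(audio):
    # Illustrative augmentation: flip the polarity of every sample.
    return [-x for x in audio]


augment = ComposeSketch([gain, invert])
result = augment([1.0, -2.0, 4.0])
```

The transforms run in list order, so `gain` is applied before `invert` here.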
- class recipes.cad2.task1.ConvTasNet.local.musdb18_dataset.MUSDB18Dataset(root: str, sources=None, targets=None, mix_background=False, suffix='.wav', split='train', subset=None, exclude_tracks=None, segment=None, samples_per_track=1, random_segments=False, random_track_mix=False, source_augmentations=<function MUSDB18Dataset.<lambda>>, sample_rate=44100)[source]
Bases:
Dataset
MUSDB18 music separation dataset
The dataset consists of 150 full-length music tracks (~10h duration) of different genres along with their isolated stems:
drums, bass, vocals and other.
Out of the box, Asteroid supports only MUSDB18-HQ, which comes as uncompressed WAV files. To use MUSDB18, please convert it to WAV first:
MUSDB18 HQ: https://zenodo.org/record/3338373
Note
The datasets are hosted on Zenodo and require that users request access, since the tracks can only be used for academic purposes. We manually check these requests.
This dataset assumes music tracks in (sub)folders where each folder has a fixed number of sources (defaults to 4). For each track, a list of sources and a common suffix can be specified. A linear mix is performed on the fly by summing up the sources.
Because all tracks comprise the exact same set of sources, random track mixing can be used, where sources from different tracks are mixed together.
- Folder Structure:
>>> #train/1/vocals.wav ---------|
>>> #train/1/drums.wav ----------+--> input (mix), output[target]
>>> #train/1/bass.wav -----------|
>>> #train/1/other.wav ---------/
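The on-the-fly linear mix and the random track mixing described above can be sketched in plain Python. This is an illustration under simplifying assumptions (mono sources as lists of floats, a hypothetical in-memory `tracks` dict instead of WAV files on disk), not the dataset's actual loading code.

```python
import random

SOURCES = ["vocals", "drums", "bass", "other"]

# Hypothetical in-memory tracks: track id -> {source name -> samples}.
tracks = {
    "1": {"vocals": [0.1, 0.2], "drums": [0.3, 0.0], "bass": [0.0, 0.1], "other": [0.1, 0.1]},
    "2": {"vocals": [0.5, 0.5], "drums": [0.0, 0.2], "bass": [0.2, 0.0], "other": [0.0, 0.3]},
}


def linear_mix(stems):
    """Sum the sources sample by sample to obtain the mixture."""
    return [sum(frame) for frame in zip(*stems.values())]


def random_track_mix(tracks, rng):
    """Pick each source from a randomly chosen track, then mix."""
    stems = {name: tracks[rng.choice(sorted(tracks))][name] for name in SOURCES}
    return stems, linear_mix(stems)


mix_1 = linear_mix(tracks["1"])
stems, mix_random = random_track_mix(tracks, random.Random(0))
```

Random track mixing only works because every track exposes the same source names, so any source can be swapped for its counterpart from another track.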
- Parameters:
root (str) – Root path of dataset
sources (list of str, optional) – List of source names that compose the mixture. Defaults to MUSDB18 4 stem scenario: vocals, drums, bass, other.
targets (list or None, optional) – List of source names to be used as targets. If None, a dict with the 4 stems is returned. If e.g. [vocals, drums], a tensor with stacked vocals and drums is returned instead of a dict. Defaults to None.
suffix (str, optional) – Filename suffix, defaults to .wav.
split (str, optional) – Dataset subfolder, defaults to train.
subset (list of str, optional) – Selects a specific list of tracks to be loaded, defaults to None (loads all tracks).
segment (float, optional) – Duration of segments in seconds, defaults to None which loads the full-length audio tracks.
samples_per_track (int, optional) – Number of samples yielded from each track, can be used to increase dataset size, defaults to 1.
random_segments (boolean, optional) – Enables random offset for track segments.
random_track_mix (boolean, optional) – Enables mixing of random sources from different tracks to assemble the mix.
source_augmentations (list of callable) – List of augmentation function names, defaults to no-op augmentations (input = output).
sample_rate (int, optional) – Sample rate of files in dataset.
- root
Root path of dataset
- Type:
str
- sources
List of source names. Defaults to MUSDB18 4 stem scenario: vocals, drums, bass, other.
- Type:
list of str, optional
- suffix
Filename suffix, defaults to .wav.
- Type:
str, optional
- split
Dataset subfolder, defaults to train.
- Type:
str, optional
- subset
Selects a specific list of tracks to be loaded, defaults to None (loads all tracks).
- Type:
list of str, optional
- segment
Duration of segments in seconds, defaults to None which loads the full-length audio tracks.
- Type:
float, optional
- samples_per_track
Number of samples yielded from each track, can be used to increase dataset size, defaults to 1.
- Type:
int, optional
- random_segments
Enables random offset for track segments.
- Type:
boolean, optional
- random_track_mix
Enables mixing of random sources from different tracks to assemble the mix.
- Type:
boolean
- source_augmentations
List of augmentation function names, defaults to no-op augmentations (input = output).
- Type:
list of callable
- sample_rate
Sample rate of files in dataset.
- Type:
int, optional
- tracks
List of track metadata
- Type:
list of Dict
- References
“The 2018 Signal Separation Evaluation Campaign” Stoter et al. 2018.
Notes
This is a modified version of the MUSDB18 dataset from Asteroid. It extends the Asteroid version to allow for the targets `vocals` and `background`. The background is the sum of all sources except vocals. This is useful for training models that separate vocals from background.
- dataset_name = 'MUSDB18'
- get_infos()[source]
Get dataset infos (for publishing models).
- Returns:
dict, dataset infos with keys dataset, task and licences.
- root: Path
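The `background` target from the Notes above can be illustrated as follows. This is a sketch assuming mono stems stored as lists of floats; `make_background` is a hypothetical helper and not part of the recipe.

```python
def make_background(stems):
    """Sum every source except vocals, sample by sample."""
    accompaniment = [samples for name, samples in stems.items() if name != "vocals"]
    return [sum(frame) for frame in zip(*accompaniment)]


stems = {
    "vocals": [1.0, 0.0, -1.0],
    "drums": [0.5, 0.5, 0.5],
    "bass": [0.25, 0.0, 0.25],
    "other": [0.0, 0.25, 0.0],
}
background = make_background(stems)

# With a linear mix, vocals plus background reproduces the full mixture.
mixture = [v + b for v, b in zip(stems["vocals"], background)]
```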
recipes.cad2.task1.ConvTasNet.local.system module
- class recipes.cad2.task1.ConvTasNet.local.system.System(model, optimizer, loss_func, train_loader, val_loader=None, scheduler=None, config=None)[source]
Bases:
LightningModule
Base class for deep learning systems. Contains a model, an optimizer, a loss function, training and validation dataloaders and learning rate scheduler.
Note that by default, any PyTorch-Lightning hooks are not passed to the model. If you want to use Lightning hooks, add the hooks to a subclass:
class MySystem(System):
    def on_train_batch_start(self, batch, batch_idx, dataloader_idx):
        return self.model.on_train_batch_start(batch, batch_idx, dataloader_idx)
- Parameters:
model (torch.nn.Module) – Instance of model.
optimizer (torch.optim.Optimizer) – Instance or list of optimizers.
loss_func (callable) – Loss function with signature (est_targets, targets).
train_loader (torch.utils.data.DataLoader) – Training dataloader.
val_loader (torch.utils.data.DataLoader) – Validation dataloader.
scheduler (torch.optim.lr_scheduler._LRScheduler) – Instance, or list of learning rate schedulers. Also supports dict or list of dict as {"interval": "step", "scheduler": sched} where interval=="step" for step-wise schedulers and interval=="epoch" for classical ones.
config – Anything to be saved with the checkpoints during training. The config dictionary to re-instantiate the run for example.
Note
By default, training_step (used by pytorch-lightning in the training loop) and validation_step (used for the validation loop) share common_step. If you want different behavior for the training loop and the validation loop, overwrite both training_step and validation_step instead.
For more info on its methods, properties and hooks, have a look at lightning’s docs: https://pytorch-lightning.readthedocs.io/en/stable/lightning_module.html#lightningmodule-api
- allow_zero_length_dataloader_with_multiple_devices: bool
- common_step(batch, batch_nb, train=True)[source]
Common forward step between training and validation.
The function of this method is to unpack the data given by the loader, forward the batch through the model and compute the loss. Pytorch-lightning handles all the rest.
- Parameters:
batch – the object returned by the loader (a list of torch.Tensor in most cases) but can be something else.
batch_nb (int) – The number of the batch in the epoch.
train (bool) – Whether in training mode. Needed only if the training and validation steps are fundamentally different, otherwise, pytorch-lightning handles the usual differences.
- Returns:
The loss value on this batch.
- Return type:
torch.Tensor
Note
This is typically the method to overwrite when subclassing System. If the training and validation steps are somehow different (except for loss.backward() and optimizer.step()), the argument train can be used to switch behavior. Otherwise, training_step and validation_step can be overwritten.
- static config_to_hparams(dic)[source]
Sanitizes the config dict to be handled correctly by torch SummaryWriter. It flattens the config dict, converts None to "None" and any list and tuple into torch.Tensors.
- Parameters:
dic (dict) – Dictionary to be transformed.
- Returns:
Transformed dictionary.
- Return type:
dict
- default_monitor: str = 'val_loss'
- lr_scheduler_step(scheduler, metric)[source]
Override this method to adjust the default way the Trainer calls each scheduler. By default, Lightning calls step() as shown in the example for each scheduler based on its interval.
- Parameters:
scheduler – Learning rate scheduler.
metric – Value of the monitor used for schedulers like ReduceLROnPlateau.
Examples:
# DEFAULT
def lr_scheduler_step(self, scheduler, metric):
    if metric is None:
        scheduler.step()
    else:
        scheduler.step(metric)

# Alternative way to update schedulers if it requires an epoch value
def lr_scheduler_step(self, scheduler, metric):
    scheduler.step(epoch=self.current_epoch)
- on_save_checkpoint(checkpoint)[source]
Overwrite if you want to save more things in the checkpoint.
- prepare_data_per_node: bool
- training: bool
- training_step(batch, batch_nb)[source]
Pass data through the model and compute the loss.
Backprop is not performed (meaning PL will do it for you).
- Parameters:
batch – the object returned by the loader (a list of torch.Tensor in most cases) but can be something else.
batch_nb (int) – The number of the batch in the epoch.
- Returns:
torch.Tensor, the value of the loss.
- recipes.cad2.task1.ConvTasNet.local.system.flatten_dict(d, parent_key='', sep='_')[source]
Flattens a dictionary into a single-level dictionary while preserving parent keys. Taken from Stack Overflow.
- Parameters:
d (MutableMapping) – Dictionary to be flattened.
parent_key (str) – String to use as a prefix to all subsequent keys.
sep (str) – String to use as a separator between two key levels.
- Returns:
Single-level dictionary, flattened.
- Return type:
dict
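A minimal sketch of the flattening behavior described above, written from the documented signature and description. `flatten_dict_sketch` and the `config` contents are hypothetical; the recipe's own function may differ in details.

```python
from collections.abc import MutableMapping


def flatten_dict_sketch(d, parent_key="", sep="_"):
    """Flatten nested mappings, joining key levels with `sep`."""
    items = []
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, MutableMapping):
            # Recurse, carrying the accumulated key as the new prefix.
            items.extend(flatten_dict_sketch(value, new_key, sep=sep).items())
        else:
            items.append((new_key, value))
    return dict(items)


config = {"optim": {"lr": 0.001, "sched": {"patience": 5}}, "epochs": 10}
flat = flatten_dict_sketch(config)
# flat == {"optim_lr": 0.001, "optim_sched_patience": 5, "epochs": 10}
```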
recipes.cad2.task1.ConvTasNet.local.tasnet module
- class recipes.cad2.task1.ConvTasNet.local.tasnet.ChannelwiseLayerNorm(channel_size)[source]
Bases:
Module
Channel-wise Layer Normalization (cLN)
- class recipes.cad2.task1.ConvTasNet.local.tasnet.Chomp1d(chomp_size)[source]
Bases:
Module
To ensure the output length is the same as the input.
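Why trimming is needed can be seen from the standard output-length formula for a 1-D convolution. The sketch below rests on an assumption about typical usage (symmetric padding of `(kernel_size - 1) * dilation` in a causal block, with the surplus removed from the end); the recipe's exact padding is not stated here, and `chomp` is a hypothetical helper.

```python
def conv1d_out_len(length, kernel_size, dilation, padding, stride=1):
    """Output length of a 1-D convolution (the formula documented for torch.nn.Conv1d)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def chomp(samples, chomp_size):
    """Drop the last `chomp_size` samples."""
    return samples[:-chomp_size] if chomp_size > 0 else samples


length, kernel_size, dilation = 100, 3, 4
padding = (kernel_size - 1) * dilation  # assumed: 8 samples on each side
out_len = conv1d_out_len(length, kernel_size, dilation, padding)
surplus = out_len - length  # extra samples produced by the padding
trimmed = chomp(list(range(out_len)), surplus)
```

Under these assumptions the surplus equals the padding, so chomping by the padding size restores the input length.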
- class recipes.cad2.task1.ConvTasNet.local.tasnet.ConvTasNetStereo(*args: Any, **kwargs: Any)[source]
Bases:
Module, PyTorchModelHubMixin
- forward(mixture)[source]
- Parameters:
mixture – [M, T], M is batch size, T is #samples
- Returns:
[M, C, T]
- Return type:
est_source
- class recipes.cad2.task1.ConvTasNet.local.tasnet.Decoder(N, L, audio_channels)[source]
Bases:
Module
- class recipes.cad2.task1.ConvTasNet.local.tasnet.DepthwiseSeparableConv(in_channels, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)[source]
Bases:
Module
- class recipes.cad2.task1.ConvTasNet.local.tasnet.Encoder(L, N, audio_channels)[source]
Bases:
Module
Estimation of the nonnegative mixture weight by a 1-D conv layer.
- class recipes.cad2.task1.ConvTasNet.local.tasnet.GlobalLayerNorm(channel_size)[source]
Bases:
Module
Global Layer Normalization (gLN)
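The difference between the two normalizations can be sketched in plain Python, following the usual definitions from the Conv-TasNet literature: gLN uses one mean and variance over all channels and time steps, while cLN computes them per time step over the channels only. The learnable gain and bias are omitted, and both function names are hypothetical stand-ins rather than the recipe's modules.

```python
import math


def global_layer_norm(features, eps=1e-8):
    """Normalize a [channels][time] matrix with statistics over all entries."""
    values = [x for channel in features for x in channel]
    mean = sum(values) / len(values)
    var = sum((x - mean) ** 2 for x in values) / len(values)
    scale = math.sqrt(var + eps)
    return [[(x - mean) / scale for x in channel] for channel in features]


def channelwise_layer_norm(features, eps=1e-8):
    """Normalize each time step separately, using statistics over channels."""
    normed_columns = []
    for column in zip(*features):  # one tuple of channel values per time step
        mean = sum(column) / len(column)
        var = sum((x - mean) ** 2 for x in column) / len(column)
        scale = math.sqrt(var + eps)
        normed_columns.append([(x - mean) / scale for x in column])
    # Transpose back to [channels][time].
    return [list(channel) for channel in zip(*normed_columns)]


features = [[1.0, 2.0, 3.0], [3.0, 6.0, 9.0]]  # 2 channels, 3 time steps
gln = global_layer_norm(features)
cln = channelwise_layer_norm(features)
```

Because cLN never looks at other time steps, it can be used in causal configurations, whereas gLN depends on the whole sequence.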
- class recipes.cad2.task1.ConvTasNet.local.tasnet.TemporalBlock(in_channels, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)[source]
Bases:
Module
- class recipes.cad2.task1.ConvTasNet.local.tasnet.TemporalConvNet(N, B, H, P, X, R, C, norm_type='gLN', causal=False, mask_nonlinear='relu')[source]
Bases:
Module
Module contents
- class recipes.cad2.task1.ConvTasNet.local.Compose(transforms)[source]
Bases:
object
Composes several augmentation transforms.
- Parameters:
transforms – List of augmentations to compose.
- class recipes.cad2.task1.ConvTasNet.local.ConvTasNetStereo(*args: Any, **kwargs: Any)[source]
Bases:
Module, PyTorchModelHubMixin
- forward(mixture)[source]
- Parameters:
mixture – [M, T], M is batch size, T is #samples
- Returns:
[M, C, T]
- Return type:
est_source
- class recipes.cad2.task1.ConvTasNet.local.MUSDB18Dataset(root: str, sources=None, targets=None, mix_background=False, suffix='.wav', split='train', subset=None, exclude_tracks=None, segment=None, samples_per_track=1, random_segments=False, random_track_mix=False, source_augmentations=<function MUSDB18Dataset.<lambda>>, sample_rate=44100)[source]
Bases:
Dataset
MUSDB18 music separation dataset
The dataset consists of 150 full-length music tracks (~10h duration) of different genres along with their isolated stems:
drums, bass, vocals and other.
Out of the box, Asteroid supports only MUSDB18-HQ, which comes as uncompressed WAV files. To use MUSDB18, please convert it to WAV first:
MUSDB18 HQ: https://zenodo.org/record/3338373
Note
The datasets are hosted on Zenodo and require that users request access, since the tracks can only be used for academic purposes. We manually check these requests.
This dataset assumes music tracks in (sub)folders where each folder has a fixed number of sources (defaults to 4). For each track, a list of sources and a common suffix can be specified. A linear mix is performed on the fly by summing up the sources.
Because all tracks comprise the exact same set of sources, random track mixing can be used, where sources from different tracks are mixed together.
- Folder Structure:
>>> #train/1/vocals.wav ---------|
>>> #train/1/drums.wav ----------+--> input (mix), output[target]
>>> #train/1/bass.wav -----------|
>>> #train/1/other.wav ---------/
- Parameters:
root (str) – Root path of dataset
sources (list of str, optional) – List of source names that compose the mixture. Defaults to MUSDB18 4 stem scenario: vocals, drums, bass, other.
targets (list or None, optional) – List of source names to be used as targets. If None, a dict with the 4 stems is returned. If e.g. [vocals, drums], a tensor with stacked vocals and drums is returned instead of a dict. Defaults to None.
suffix (str, optional) – Filename suffix, defaults to .wav.
split (str, optional) – Dataset subfolder, defaults to train.
subset (list of str, optional) – Selects a specific list of tracks to be loaded, defaults to None (loads all tracks).
segment (float, optional) – Duration of segments in seconds, defaults to None which loads the full-length audio tracks.
samples_per_track (int, optional) – Number of samples yielded from each track, can be used to increase dataset size, defaults to 1.
random_segments (boolean, optional) – Enables random offset for track segments.
random_track_mix (boolean, optional) – Enables mixing of random sources from different tracks to assemble the mix.
source_augmentations (list of callable) – List of augmentation function names, defaults to no-op augmentations (input = output).
sample_rate (int, optional) – Sample rate of files in dataset.
- root
Root path of dataset
- Type:
str
- sources
List of source names. Defaults to MUSDB18 4 stem scenario: vocals, drums, bass, other.
- Type:
list of str, optional
- suffix
Filename suffix, defaults to .wav.
- Type:
str, optional
- split
Dataset subfolder, defaults to train.
- Type:
str, optional
- subset
Selects a specific list of tracks to be loaded, defaults to None (loads all tracks).
- Type:
list of str, optional
- segment
Duration of segments in seconds, defaults to None which loads the full-length audio tracks.
- Type:
float, optional
- samples_per_track
Number of samples yielded from each track, can be used to increase dataset size, defaults to 1.
- Type:
int, optional
- random_segments
Enables random offset for track segments.
- Type:
boolean, optional
- random_track_mix
Enables mixing of random sources from different tracks to assemble the mix.
- Type:
boolean
- source_augmentations
List of augmentation function names, defaults to no-op augmentations (input = output).
- Type:
list of callable
- sample_rate
Sample rate of files in dataset.
- Type:
int, optional
- tracks
List of track metadata
- Type:
list of Dict
- References
“The 2018 Signal Separation Evaluation Campaign” Stoter et al. 2018.
Notes
This is a modified version of the MUSDB18 dataset from Asteroid. It extends the Asteroid version to allow for the targets `vocals` and `background`. The background is the sum of all sources except vocals. This is useful for training models that separate vocals from background.
- dataset_name = 'MUSDB18'
- get_infos()[source]
Get dataset infos (for publishing models).
- Returns:
dict, dataset infos with keys dataset, task and licences.
- root: Path
- recipes.cad2.task1.ConvTasNet.local.augment_channelswap(audio)[source]
Randomly swap channels of stereo sources
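A minimal sketch of such an augmentation, assuming the audio is a two-element sequence `[left, right]` and a swap probability of 0.5. `channelswap_sketch`, the `rng` argument and the `p` parameter are assumptions for illustration; the recipe's function operates on its own audio type and may use a different probability.

```python
import random


def channelswap_sketch(audio, rng=random, p=0.5):
    """Swap left and right with probability p; leave non-stereo input untouched."""
    if len(audio) == 2 and rng.random() < p:
        return [audio[1], audio[0]]
    return audio


stereo = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
always_swapped = channelswap_sketch(stereo, p=1.0)
never_swapped = channelswap_sketch(stereo, p=0.0)
```

Swapping channels keeps the content of each source intact while changing its spatial layout, which makes it a cheap augmentation for stereo training data.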