# Usage

Before getting started you should ensure you have installed pyClarity within a virtual environment; if you've not already done so, please refer to the [installation instructions](installation.md). To check, run `pip show pyclarity` and you should get some information about the installed version.

``` bash
Version: 0.1.0
Summary: Tools for the Clarity Challenge
Home-page: https://github.com/claritychallenge/clarity
Author: The PyClarity team
Author-email: clarity-group@sheffield.ac.uk
License: MIT
Location: /home/usera/work/projects/claritychallenge/clarity
Requires: audioread, hydra-core, hydra-submitit-launcher, importlib-metadata, librosa, matplotlib, numpy, omegaconf, pandas, pyloudnorm, scikit-learn, scipy, SoundFile, tqdm
Required-by:
```

If you don't see something similar to the above, check that you have activated the virtual environment you made the install under. If the package still isn't found, you should go through the installation process again within your virtual environment.

## Jupyter Notebooks

These tutorials are also available as Jupyter Notebooks ([pyClarity Tutorials](https://claritychallenge.org/tutorials)) that run in Google Colab. The examples and code below take you through using the pyClarity tools with smaller demo datasets, which are provided under `clarity.data.demo_data` and have specific functions for loading.

## 01 Installing pyClarity and Using Metadata

This demonstration uses only the `metadata` datasets, which are downloaded to the `clarity_data/demo/metadata/` directory.

``` python
from clarity.data import demo_data

demo_data.get_metadata_demo()
```

This will have created a directory called `clarity_data` containing the metadata files that have been downloaded.

### The structure of the metadata files

There are four metadata files:

- `rooms` - geometry of the rooms used for the simulations
- `scenes` - information about the sound scene that is playing in the room
- `listeners` - audiometric data for the hearing-impaired listeners who will listen to the scenes
- `scenes_listeners` - a mapping assigning specific listeners to specific scenes (in the evaluation, each scene will be listened to by three separate listeners)

Information about *individual* rooms, scenes, listeners etc. is stored as a [dictionary](https://www.tutorialspoint.com/python/python_dictionary.htm). The complete collections are then stored as either a [list](https://www.tutorialspoint.com/python/python_lists.htm) or a dict, depending on how the collection is most conveniently indexed. The data structure of the four datasets is summarized below.

| Dataset            | Structure     | Index         |
|--------------------|---------------|---------------|
| `rooms`            | list of dicts | `int`         |
| `scenes`           | list of dicts | `int`         |
| `listeners`        | dict of dicts | `LISTENER_ID` |
| `scenes_listeners` | dict of lists | `SCENE_ID`    |

Data is stored in [JavaScript Object Notation (JSON) format](https://en.wikipedia.org/wiki/JSON) and the components `scenes`, `rooms`, `listeners` and `scenes_listeners` can be loaded with the following.

``` python
import json

with open("clarity_data/demo/metadata/scenes.demo.json") as f:
    scenes = json.load(f)

with open("clarity_data/demo/metadata/rooms.demo.json") as f:
    rooms = json.load(f)

with open("clarity_data/demo/metadata/listeners.json") as f:
    listeners = json.load(f)

with open("clarity_data/demo/metadata/scenes_listeners.dev.json") as f:
    scenes_listeners = json.load(f)
```
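Before working with the collections, it can be helpful to confirm that they have the structures described in the table above. The short check below is illustrative only and is not part of the original tutorial:

``` python
# Confirm the container types and sizes of each collection.
print(type(scenes), len(scenes))                      # list of scene dicts
print(type(rooms), len(rooms))                        # list of room dicts
print(type(listeners), len(listeners))                # dict keyed by LISTENER_ID
print(type(scenes_listeners), len(scenes_listeners))  # dict keyed by SCENE_ID

# Peek at a few of the dictionary keys.
print(list(listeners.keys())[:3])
print(list(scenes_listeners.keys())[:3])
```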
Elements of a list are accessed using a numerical index (starting at `0`), whilst elements of a dictionary are accessed using their keys. We extract the first (`0`th) scene and inspect its `SNR`.

``` python
scene_0 = scenes[0]

print(f"Keys for scene_0 : {scene_0.keys()}")
print(f'Value of SNR for scene_0 : {scene_0["SNR"]}')

# Directly...
print(f'Value of SNR for scene_0 : {scenes[0]["SNR"]}')
```

### Processing Collections of Scenes

Processes can be run over the complete list of scenes using standard Python iteration tools such as `for` loops and, in particular, [list](https://realpython.com/list-comprehension-python/) and [dictionary comprehensions](https://realpython.com/iterate-through-dictionary-python/#using-comprehensions).

``` python
import numpy as np
from matplotlib import pyplot as plt

fig, ax = plt.subplots(1, 2, figsize=(16, 9))

# Get list of SNRs of scenes
snr_values = np.array([s["SNR"] for s in scenes], dtype="float32")

# Plot histogram
ax[0].hist(snr_values)
ax[0].set_title("Histogram of SNR values")
ax[0].set_xlabel("SNR (dB)")

# Get list of number of interferers in scenes
n_interferers = np.array([len(s["interferers"]) for s in scenes], dtype="int32")

# Prepare data for boxplot
snr_comparison_data = [
    [s for s, n in zip(snr_values, n_interferers) if n == 2],
    [s for s, n in zip(snr_values, n_interferers) if n == 3],
]

# Plot boxplot
ax[1].boxplot(np.array(snr_comparison_data, dtype="object"))
ax[1].set_xlabel("Number of interferers")
ax[1].set_ylabel("SNR (dB)")

plt.subplots_adjust(left=0.1, bottom=0.1, right=0.9, top=0.9, wspace=0.4, hspace=0.4)
fig.show()
```

### Associations between metadata types

There are various associations between the metadata types which sometimes require cross-referencing from one collection to another. For example, room dimensions are stored in the room dict rather than directly in the scene dict, so to get the room dimensions for a given scene you need to first look at the room ID field in the scene to find the correct room. One approach to doing this is shown below.

``` python
room_id = scene_0["room"]

# Iterate through rooms to find the one named `room_id`
room = next((item for item in rooms if item["name"] == room_id), None)

print(room["dimensions"])
```

This approach uses a linear search and is therefore not very efficient. If you are going to be doing this often, you might want to convert the list of rooms into a dictionary indexed by room ID, e.g.

``` python
room_dict = {room["name"]: room for room in rooms}
```

You can now look up the dimensions of a scene's room more efficiently,

``` python
room_id = scene_0["room"]
room = room_dict[room_id]

print(room["dimensions"])
```
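The same trick can be applied to any of the list-based collections. As a small illustrative sketch (not part of the original tutorial), we can build an equivalent index for the scenes themselves, keyed by scene ID, and then chain the two lookups:

``` python
# Illustrative sketch: index scenes by their ID so that a scene's room can be
# found with two dictionary lookups instead of a linear search.
scene_dict = {scene["scene"]: scene for scene in scenes}

some_scene_id = scenes[0]["scene"]
print(room_dict[scene_dict[some_scene_id]["room"]]["dimensions"])
```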
### Example: Locating information about the scene's listener

We will now use these ideas to plot the audiograms of one of the listeners associated with a specific scene. The code also prints out some information about the target and listener locations that are stored in the scene's associated room dict.

``` python
scene_no = 32  # this is just an arbitrary index, try any from 0-49
scene = scenes[scene_no]
room = room_dict[scene["room"]]
current_listeners = scenes_listeners[scene["scene"]]

print(
    f'\nScene number {scene_no} (ID {scene["scene"]}) has room dimensions of {room["dimensions"]}'
)
print(
    f'\nSimulated listeners for scene {scene_no} have spatial attributes: \n{room["listener"]}'
)

print(f'\nAudiograms for listeners in Scene ID {scene["scene"]}')
fig, ax = plt.subplots(1, len(current_listeners))
ax[0].set_ylabel("Hearing level (dB)")
for i, l in enumerate(current_listeners):
    listener_data = listeners[l]
    (left_ag,) = ax[i].plot(
        listener_data["audiogram_cfs"],
        -np.array(listener_data["audiogram_levels_l"]),
        label="left audiogram",
    )
    (right_ag,) = ax[i].plot(
        listener_data["audiogram_cfs"],
        -np.array(listener_data["audiogram_levels_r"]),
        label="right audiogram",
    )
    ax[i].set_title(f"Listener {l}")
    ax[i].set_xlabel("Hz")
    ax[i].set_ylim([-100, 10])

plt.legend(handles=[left_ag, right_ag])
plt.subplots_adjust(left=0.1, bottom=0.1, right=0.9, top=0.9, wspace=0.4, hspace=0.4)
plt.show()
```

## 02 Running the CEC2 Baseline from the command line

Python comes with a *Read, Eval, Print, Loop* (REPL) interactive shell that can be started from within your virtual environment by typing `python`. Many users prefer the improved [IPython](https://ipython.org/index.html) shell, which can be installed with `pip install ipython` and invoked with `ipython`. Either shell works for the following.

### Install Demo Data

In a shell, navigate to the location where you have cloned the pyClarity repository and start an IPython shell.

``` bash
$ cd ~/path/to/where/pyclarity/is/cloned
$ BASE_DIR=$(pwd)
$ ipython
Python 3.10.5 (main, Jun  6 2022, 18:49:26) [GCC 12.1.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:
```

You can now get the demo data with...

``` python
from clarity.data import demo_data

demo_data.get_metadata_demo()
demo_data.get_scenes_demo()

exit
```

Now that you have exited to your shell, you can change working directory to the location of the shell scripts you wish to run and see what files have been downloaded.

``` bash
cd clarity/recipes/cec2/baseline
pwd
ls -lha
```

### Inspecting Existing Configuration

All of the included shell scripts take configurable variables from the YAML files in the same directory as the shell script. Typically these are named `config.yaml`, although other names may be used if more than one shell script is in a directory. We can inspect the contents of the config file using `cat`:

``` bash
cat config.yaml
```

The general organisation of the config files is hierarchical, with property labels depending on the script in question. The config file for the enhance and evaluate recipes contains configurable parameters for both scripts. These include:

- Paths for the locations of audio files, metadata and the export location for generated files
- Parameters for the NAL-R fitting
- Parameters for the automatic gain control (AGC) compressor used in the baseline enhancer
- Parameters for the challenge evaluator
- Parameters necessary for Hydra to run

The `path.root` parameter defaults to a null value (`???`) and must be overridden with a dataset root path when the Python script is called on the command line, e.g.

``` bash
user:~$ python mypythonscript.py path.root='/path/to/project'
```

Note the lack of a trailing slash at the end of the `path.root` argument string.
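The reason is that the other path entries in the config supply their own leading slash when `${path.root}` is interpolated. The snippet below is a standalone sketch (with hypothetical values, separate from the recipe scripts) of how OmegaConf resolves this kind of interpolation:

``` python
from omegaconf import OmegaConf

# Hypothetical values, purely to illustrate ${...} interpolation.
cfg = OmegaConf.create(
    {
        "path": {
            "root": "/path/to/project",
            "metadata_dir": "${path.root}/clarity_data/metadata",
        }
    }
)
print(cfg.path.metadata_dir)  # -> /path/to/project/clarity_data/metadata
```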
If you inspect a variable such as `path.metadata_dir` you will see that this slash is already included in the interpolated value.

``` yaml
path:
  root: ???
  metadata_dir: ${path.root}/clarity_data/metadata
```

The general form for overriding a parameter on the CLI is dot-indexed. For the following entry in a `config.yaml` file:

``` yaml
A:
  B:
    parameter_0: some_value
    parameter_1: some_other_value
```

the CLI syntax to override those values would be:

``` bash
python myscript.py A.B.parameter_0="new_value" A.B.parameter_1="another_new_value"
```

We are now ready to run the prepared Python script `recipes/cec2/baseline/enhance.py` to enhance the audio. However, the standard configuration is designed to work with the full Clarity dataset. We can redirect the script to the correct folders to use the demo data we have downloaded by overriding the appropriate configuration parameters. (The overrides are single-quoted so that the shell passes the `${...}` interpolations through to Hydra untouched.)

``` bash
python enhance.py \
  path.root=${BASE_DIR} \
  path.metadata_dir='${path.root}/clarity_data/demo/metadata' \
  path.scenes_listeners_file='${path.metadata_dir}/scenes_listeners.demo.json' \
  path.listeners_file='${path.metadata_dir}/listeners.json' \
  path.scenes_folder='${path.root}/clarity_data/demo/scenes'
```

Once the enhancement script has completed, you can listen to the generated signals from a Python shell.

``` python
from pathlib import Path

import IPython.display as ipd

audio_path = Path("exp/exp/enhanced_signals")
audio_files = list(audio_path.glob("**/*.wav"))

# Listen to a single file
print(audio_files[0])
ipd.Audio(audio_files[0])
```

You can now use the `recipes/cec2/baseline/evaluate.py` script to generate HASPI scores for the signals. The evaluation is run in the same manner as the enhancement script.

``` bash
python evaluate.py \
  path.root=${BASE_DIR} \
  path.metadata_dir='${path.root}/clarity_data/demo/metadata' \
  path.scenes_listeners_file='${path.metadata_dir}/scenes_listeners.demo.json' \
  path.listeners_file='${path.metadata_dir}/listeners.json' \
  path.scenes_folder='${path.root}/clarity_data/demo/scenes'
```

Now that the HASPI scores have been generated, it is possible to plot the results to assess the improvement imparted by the signal processing. Start a Python shell (`python` or `ipython`) and paste the following code.

``` python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

processed_si = pd.read_csv("exp/exp/si.csv")
unprocessed_si = pd.read_csv("exp/exp/si_unproc.csv")

data = np.array([unprocessed_si.loc[:, "haspi"], processed_si.loc[:, "haspi"]])

plt.boxplot(np.transpose(data))
plt.title("HASPI Scores")
plt.xticks([1, 2], ["Unprocessed", "Processed"])
plt.show()
```

## 03 Running the CEC2 Baseline from Python

We will be using scene audio and associated metadata. This can be downloaded using the Clarity package's `demo_data` module.

``` python
from clarity.data import demo_data

demo_data.get_metadata_demo()
demo_data.get_scenes_demo()
```

By default, the demo data will have been downloaded into a directory called `clarity_data`.

## Running the baseline

### Importing the baseline NALR and Compressor components

The baseline enhancer is based on [NAL-R prescription fitting](https://pubmed.ncbi.nlm.nih.gov/3743918/). Since output signals are required to be in 16-bit integer format, a slow-acting automatic gain control (AGC) is implemented to reduce clipping of the signal introduced by the NAL-R fitting for audiograms which represent more severe hearing loss. The AGC is followed by a soft-clip function.

The NAL-R and AGC (compressor) classes can be accessed by importing them from the `clarity.enhancer` module.
``` python
from clarity.enhancer.compressor import Compressor
from clarity.enhancer.nalr import NALR
```

### Configuring the NALR and Compressor components

To allow for scalable and flexible running on both local and HPC platforms, many Clarity challenge CEC2 scripts and tools depend on [hydra](https://hydra.cc/) and [submitit](https://github.com/facebookincubator/submitit) for the configuration of Python code, for the setting of environment variables such as dataset directories, and for enabling parallelisation of Python on both HPC and local machines. (A full description of how [hydra](https://hydra.cc/docs/intro/) and [submitit](https://github.com/facebookincubator/submitit) are used in the Clarity challenges is out of the scope of this tutorial.)

In this tutorial, we will be loading the baseline configuration file directly using `omegaconf`. This module can read a configuration file in YAML format and return a `DictConfig` object storing the configuration data. The configuration file is included as `recipes/cec2/baseline/config.yaml` as part of your `pyclarity` installation.

### IMPORTANT

The location of the `recipes` directory needs a little figuring out and will depend on how you have installed `pyclarity`.

- If you installed from PyPI using `pip` under a Miniconda virtual environment called `clarity`, configured to store virtual environments in the default location of `~/miniconda3/`, then the `recipes` directory will be under `~/miniconda3/envs/clarity/lib/python<version>/site-packages/recipes/`.
- If you used a more traditional virtual environment, configured to save environments under `~/.virtualenv`, and called your environment `pyclarity`, then the location will be `~/.virtualenv/pyclarity/lib/python<version>/site-packages/recipes/`.
- If you have installed pyclarity from GitHub, then the recipes will be under `clarity/recipes/` in the cloned directory.

If you are unsure where to find the files and are on a UNIX-like operating system, you can search for them with the following...

``` bash
find ~/ -type f -iname "config.yaml" | grep recipes
```

``` python
from omegaconf import DictConfig, OmegaConf

# IMPORTANT - Modify the following line to reflect where the recipes are located, see notes above
cfg = OmegaConf.load(
    "/home/<username>/.virtualenv/pyclarity/lib/python3.10/site-packages/recipes/cec2/baseline/config.yaml"
)
assert isinstance(cfg, DictConfig)
```

We will need to override some of the standard paths provided in the baseline `config.yaml` to enable us to run the baseline on the demo data in this tutorial environment. We need to supply:

- The root directory of the project data and metadata
- The directory of the metadata
- The directory of the audio data

The default configuration can be overridden by changing the values in the `cfg` object.

``` python
cfg.path["root"] = "clarity_data/demo"
cfg.path["metadata_dir"] = "${path.root}/metadata"
cfg.path["scenes_folder"] = "${path.root}/scenes"
```

(Side note: the Clarity tools come with higher-level `recipe` scripts that are designed to be used from the command line. When working with these, default configurations can be overridden by passing command-line arguments.)

With the configuration modified, we can now instantiate our `NALR` and `Compressor` objects.

``` python
enhancer = NALR(**cfg.nalr)
compressor = Compressor(**cfg.compressor)
```
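If you want to check exactly which values were passed to the two constructors, the relevant configuration groups can be dumped back to YAML. This is an optional, illustrative step; the group names come from the baseline `config.yaml` used above.

``` python
# Show the NAL-R and compressor parameter groups from the loaded configuration.
print(OmegaConf.to_yaml(cfg.nalr))
print(OmegaConf.to_yaml(cfg.compressor))
```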
### Selecting a scene and a listener

NAL-R fitting involves creating a complementary filterbank that is tuned to the audiogram of a specific listener. For each scene in the Clarity dataset, there are three associated listeners that have been randomly selected, i.e., you are told which listeners to process each scene for. Using the right listeners is particularly important when processing the development (and evaluation) data, i.e., to ensure that your results are comparable with those of others.

The listener audiogram data and the scene-listener associations are defined in the Clarity metadata. We will first load the scene, listener and scene-listener data from the JSON files in which they are stored:

``` python
import json

with open("clarity_data/demo/metadata/scenes.demo.json") as f:
    scene_metadata = json.load(f)

with open("clarity_data/demo/metadata/listeners.json") as f:
    listeners_metadata = json.load(f)

with open("clarity_data/demo/metadata/scenes_listeners.dev.json") as f:
    scene_listeners_metadata = json.load(f)
```

Next, we will select an individual scene from `scene_metadata`, find its associated listeners and then find a listener's audiogram data. So we first choose a scene from the `scene_metadata` list using a `scene_index`, i.e.,

``` python
scene_index = 2
scene = scene_metadata[scene_index]
print(scene)
```

We find the scene's listeners by looking them up in the `scene_listeners_metadata` dict using the scene's `scene_id` as the key.

``` python
scene_id = scene["scene"]
scene_listeners = scene_listeners_metadata[scene_id]
print(scene_listeners)
```

This provides us with the list of `listener_id`s for this scene. We will select one `listener_id` from this list and use it as the key to select the required listener metadata.

``` python
listener_choice = 1
listener_id = scene_listeners[listener_choice]
listener = listeners_metadata[listener_id]
```

Each listener metadata entry is a dict containing:

- Listener ID
- Audiogram centre frequencies
- Left ear audiogram hearing levels (dBHL)
- Right ear audiogram hearing levels (dBHL)

``` python
print(listener)
```

### Loading the signals to process

Next we will load the scene audio for the scene that we want to process. The path to the scene audio data is stored in the `cfg.path.scenes_folder` variable and the audio files are named with the scene ID as a prefix, using the format

``` text
<SCENE_ID>_<TYPE>_<CHANNEL>.wav
```

where `TYPE` can be `mix`, `target`, `interferer` or `interferer_anechoic`, and `CHANNEL` can be `CH1`, `CH2`, `CH3` or `CH0`. The baseline system just uses `CH1` (the front microphone of the hearing aid).

Finally, signals are stored as 16-bit integer audio and must be converted to floating point (between -1.0 and 1.0) before use, i.e. by dividing by 2**15. So, using the `wavfile` module from `scipy.io` to read the file, we have,

``` python
from pathlib import Path

from scipy.io import wavfile

fs, signal = wavfile.read(Path(cfg.path.scenes_folder) / f"{scene_id}_mix_CH1.wav")
signal = signal / 32768.0
```

We can plot the signal to check it looks OK,

``` python
import matplotlib.pylab as plt

plt.plot(signal)
plt.show()
```
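Optionally, a few basic properties of the loaded audio can be printed before processing (an illustrative check, not part of the baseline recipe):

``` python
# Basic properties of the scene mix: sample rate, shape and duration.
print(f"Sample rate: {fs} Hz")
print(f"Signal shape: {signal.shape} (samples, channels)")
print(f"Duration: {signal.shape[0] / fs:.2f} s")
```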
### Applying the NALR and Compressor components

We will now build the NALR filterbank according to the audiograms of the listener we have selected and apply the filter to the scene signal. This is done separately for the left and right ear (i.e., for each channel of the stereo scene signal).

``` python
import numpy as np

nalr_fir, _ = enhancer.build(listener["audiogram_levels_l"], listener["audiogram_cfs"])
out_l = enhancer.apply(nalr_fir, signal[:, 0])

nalr_fir, _ = enhancer.build(listener["audiogram_levels_r"], listener["audiogram_cfs"])
out_r = enhancer.apply(nalr_fir, signal[:, 1])

plt.plot(out_l)
plt.show()
```

Following this, the slow-acting AGC (compressor) is applied to each channel and the two channels are combined into a stereo signal.

``` python
out_l, _, _ = compressor.process(out_l)
out_r, _, _ = compressor.process(out_r)

enhanced_audio = np.stack([out_l, out_r], axis=1)

plt.plot(enhanced_audio)
plt.show()
```

Finally, a clip-detection pass counts any samples still outside the valid range, and the signals are passed through a tanh function which provides soft clipping to handle any transient segments that have not been dealt with by the AGC, avoiding the high-frequency distortion components that hard clipping would introduce.

``` python
n_clipped = np.sum(np.abs(enhanced_audio) > 1.0)
if n_clipped > 0:
    print(f"{n_clipped} samples clipped")

enhanced_audio = np.tanh(enhanced_audio)
np.clip(enhanced_audio, -1.0, 1.0, out=enhanced_audio)

plt.plot(enhanced_audio)
plt.show()
```

Note, processed signals will be submitted in 16-bit WAV format, i.e. by first converting back to 16-bit integer format and then saving to file.

``` python
signal_16 = (32768.0 * enhanced_audio).astype(np.int16)
```

The standard filename for the processed audio is constructed as

``` python
filename = f"{scene['scene']}_{listener['name']}_HA-output.wav"
```

## Evaluating outputs using HASPI

Enhanced signals can now be evaluated using the HASPI speech intelligibility prediction metric and compared with the unenhanced audio. HASPI scores are calculated using a 'better ear' approach, where scores for the left and right signals are calculated and the higher of the two is used as the output. The 'better ear' HASPI function (`haspi_v2_be`) is imported from `clarity.evaluator.haspi`.

HASPI is an intrusive metric and requires an uncorrupted reference signal. These are provided in the scene audio data as files with the naming convention `SXXXX_target_CHX.wav`. `CH1` is used as the reference transducer for this challenge. We load the file and convert it to floating point as before.

``` python
from clarity.evaluator.haspi import haspi_v2_be

fs, reference = wavfile.read(Path(cfg.path.scenes_folder) / f"{scene_id}_target_CH1.wav")
reference = reference / 32768.0
```
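Because HASPI is intrusive, the reference and the signals being scored are expected to be time-aligned and of matching length. A quick illustrative check (not part of the baseline recipe) before scoring:

``` python
# The shapes and sample rate of the signals that will be passed to HASPI.
print(f"Reference shape: {reference.shape}")
print(f"Unprocessed mix shape: {signal.shape}")
print(f"Enhanced signal shape: {enhanced_audio.shape}")
print(f"Sample rate: {fs} Hz")
```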
We provide the function `haspi_v2_be` with the left and right references, the left and right signals, the sample rate and the audiogram information for the given listener. Below, we first compute the HASPI score for the unprocessed signal and then for the enhanced signal. We can compute the benefit of the processing by calculating the difference.

``` python
sii_unprocessed = haspi_v2_be(
    xl=reference[:, 0],
    xr=reference[:, 1],
    yl=signal[:, 0],
    yr=signal[:, 1],
    fs_signal=fs,
    audiogram_l=listener["audiogram_levels_l"],
    audiogram_r=listener["audiogram_levels_r"],
    audiogram_cfs=listener["audiogram_cfs"],
)

sii_enhanced = haspi_v2_be(
    xl=reference[:, 0],
    xr=reference[:, 1],
    yl=enhanced_audio[:, 0],
    yr=enhanced_audio[:, 1],
    fs_signal=fs,
    audiogram_l=listener["audiogram_levels_l"],
    audiogram_r=listener["audiogram_levels_r"],
    audiogram_cfs=listener["audiogram_cfs"],
)

print(f"Original audio HASPI score is {sii_unprocessed}")
print(f"Enhanced audio HASPI score is {sii_enhanced}")
print(f"Improvement from processing is {sii_enhanced - sii_unprocessed}")
```

For the scene and listener we have selected, the original HASPI score should be about `0.081` and the score after enhancement should be about `0.231`. Note, HASPI uses internal masking noise and, because we have not set the random seed, scores may vary a little from run to run - the variation should not be more than `+-0.0005` and is often much less.

Note also that the 'enhanced' score is still very low. This is not surprising given that the processing is only applying amplification and compression; there is no noise cancellation, no multichannel processing, etc. The purpose of the enhancement challenge is to add these components in order to try to improve on this baseline.

Good luck!