It proved difficult to collect enough Chimpanzee vocalizations in the jungle to train our models, but we had an abundance of vocalizations from the sanctuary. To increase and diversify our training set, we created synthetic samples by embedding the sanctuary vocalizations into the recorded background noise of the jungle.
This folder contains a number of scripts that form a pipeline to produce our synthetic samples. Given a set of vocalizations, the pipeline consists of three steps:

1. collect background noise fragments from the annotated jungle recordings;
2. create overviews (file paths and durations) of the background fragments and of the vocalizations;
3. mix the vocalizations into the background fragments.
The pipeline is implemented as three numbered Python scripts; the folder also contains a shell script to run the entire pipeline and a sub folder test_data that stores all data needed to demonstrate the pipeline. This sub folder consists of:

- recordings, in which we can find a small number of original jungle recordings;
- raven_annotations.txt, which contains timestamps of jungle noise fragments found in the WAV files in the recordings folder;
- vocalizations, in which a small number of Chimpanzee vocalizations are stored.

The first script, 1_collect_background.py,
reads any number of Raven annotation files (with a .txt
extension) and parses their contents, collecting the paths of the recordings and the relevant timestamps of fragments that were annotated as ‘Background’. With these timestamps it then extracts the fragments and stores them in a folder of choice. To execute the script on our test data, please run:
$ python 1_collect_background.py \
--input_dir './test_data/recordings/' \
--annotation_dir './test_data' \
--output_dir './test_data/results/background' \
--one_file 'False'
where --input_dir denotes the folder that contains the jungle recordings, --annotation_dir denotes the folder with the Raven annotation files and --output_dir denotes the folder in which the script collects the background noise fragments. --one_file determines whether the output is saved as one file or as multiple files (the default).
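Conceptually, the extraction performed by this script boils down to something like the sketch below. The Raven selection table column names used here ('Annotation', 'Begin File', 'Begin Time (s)', 'End Time (s)') are assumptions for illustration, not taken from the script itself:

# Minimal sketch of the background extraction step, assuming pandas and
# soundfile are available; the Raven column names below are illustrative.
import os
import pandas as pd
import soundfile as sf

annotations = pd.read_csv('./test_data/raven_annotations.txt', sep='\t')
background = annotations[annotations['Annotation'] == 'Background']

for i, row in background.iterrows():
    wav_path = os.path.join('./test_data/recordings', row['Begin File'])
    samplerate = sf.info(wav_path).samplerate
    start = int(row['Begin Time (s)'] * samplerate)
    stop = int(row['End Time (s)'] * samplerate)
    fragment, sr = sf.read(wav_path, start=start, stop=stop)
    sf.write(f'./test_data/results/background/fragment_{i}.wav', fragment, sr)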
The second script, 2_create_overview.py, takes an input set of WAV files and registers their absolute file paths and durations. It produces a JSON file containing these registrations. In the next step we need such overviews for both the background noise fragments and the vocalizations so we can mix and match properly.
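The idea is roughly the following sketch; the exact JSON structure (a list of entries with 'path' and 'duration' keys) is an assumption for illustration:

# Sketch of building an overview of a folder of WAV files, assuming the
# soundfile library is available; the JSON layout is illustrative.
import glob
import json
import os
import soundfile as sf

def create_overview(input_dir, output_path):
    entries = []
    for wav in sorted(glob.glob(os.path.join(input_dir, '*.wav'))):
        info = sf.info(wav)
        entries.append({'path': os.path.abspath(wav), 'duration': info.duration})
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, 'w') as f:
        json.dump(entries, f, indent=2)

create_overview('./test_data/results/background',
                './test_data/results/overviews/background.json')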
Provided you have executed the previous step on the test data, you can run the second script like this:
# create an overview for the background fragments
$ python 2_create_overview.py \
--input_dir './test_data/results/background' \
--output './test_data/results/overviews/background.json'
# create an overview for the vocalizations
$ python 2_create_overview.py \
--input_dir './test_data/vocalizations' \
--output './test_data/results/overviews/vocalizations.json'
In the third and final step, performed by the 3_create_synth_sample.py
script, we mix the vocalizations into the background fragments. For each vocalization a suitable candidate from the background collection is randomly selected. If the algorithm cannot find a suitable candidate (because the vocalization is longer than any available background fragment), the vocalization is chopped up into smaller fragments and the selection is repeated for these smaller fragments.
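A minimal sketch of this selection logic, assuming overview entries carry a 'duration' field as in the sketch above; the halving strategy for chopping is also an assumption:

# Pick a random background fragment that is at least as long as the
# vocalization; if none exists, chop the vocalization in half and repeat
# (the exact chopping strategy of the script is an assumption).
import random

def select_background(duration, background_entries):
    candidates = [b for b in background_entries if b['duration'] >= duration]
    return random.choice(candidates) if candidates else None

def match_vocalization(voc_duration, background_entries):
    if not background_entries:
        raise ValueError('no background fragments available')
    background = select_background(voc_duration, background_entries)
    if background is not None:
        return [(voc_duration, background)]
    halves = [voc_duration / 2, voc_duration / 2]
    return [pair for half in halves
            for pair in match_vocalization(half, background_entries)]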
For each vocalization, four new versions are created in which the loudness of the vocalization is increasingly dampened. Every version gets its own numerical suffix, denoting the amount of dampening in dB multiplied by 10.
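A sketch of the mixing and naming scheme, assuming mono WAV files with equal sample rates; the dampening levels (0, 3, 6 and 9 dB) are made up for illustration, only the suffix convention (dampening in dB times 10) follows the description above:

# Mix a vocalization into a background fragment at several dampening levels.
import numpy as np
import soundfile as sf

def mix_versions(voc_path, background_path, out_stem, levels_db=(0, 3, 6, 9)):
    voc, sr = sf.read(voc_path)
    background, sr_bg = sf.read(background_path)
    assert sr == sr_bg, 'sample rates must match'
    offset = np.random.randint(0, len(background) - len(voc) + 1)
    for db in levels_db:
        gain = 10 ** (-db / 20)        # dampen the vocalization by `db` dB
        sample = background.copy()
        sample[offset:offset + len(voc)] += gain * voc
        suffix = int(db * 10)          # e.g. 3 dB of dampening -> suffix 30
        sf.write(f'{out_stem}_{suffix}.wav', sample, sr)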
If we continue our test example, we run the third script like this:
$ python 3_create_synth_sample.py \
--primate_json './test_data/results/overviews/vocalizations.json' \
--background_json './test_data/results/overviews/background.json' \
--output './test_data/results/synth_data/'
A convenient shell script that runs all the demonstration steps consecutively can be found in this folder as well, and can be executed with:
$ ./synth_pipeline.sh
Note that dampening the vocalizations only works up to a point: when their volume is reduced too much, the resulting samples can become unusable.