It proved difficult to collect enough Chimpanzee vocalizations in the jungle to train our models, but we had an abundance of vocalizations from the sanctuary. To increase and diversify our training set, we created synthetic samples by embedding the sanctuary vocalizations into the recorded background noise of the jungle.
This folder contains a number of scripts that form a pipeline to produce our synthetic samples. Given a set of vocalizations, the pipeline consists of three steps:

1. collect background noise fragments from the annotated jungle recordings;
2. create overviews (file paths and durations) of the background fragments and of the vocalizations;
3. mix the vocalizations into the background fragments.
The pipeline is implemented as three numbered Python scripts; the folder also contains a shell script to run the entire pipeline and a sub folder test_data that stores all data needed to demonstrate the pipeline. This sub folder consists of:

- recordings, in which we can find a small number of original jungle recordings;
- raven_annotations.txt, which contains timestamps of jungle noise fragments found in the WAV files in the recordings folder;
- vocalizations, in which a small number of Chimpanzee vocalizations are stored.

The first script, 1_collect_background.py,
reads any number of Raven annotation files (with a .txt
extension) and parses their contents, collecting the paths of the recordings and the relevant timestamps of fragments that were annotated as ‘Background’. With these timestamps it then extracts the fragments and stores them in a folder of choice. To execute the script on our test data, please run:
$ python 1_collect_background.py \
--input_dir './test_data/recordings/' \
--annotation_dir './test_data' \
--output_dir './test_data/results/background' \
--one_file 'False'
where --input_dir denotes the folder that contains the jungle recordings, --annotation_dir denotes the folder with the Raven annotation files and --output_dir denotes the folder in which the script collects the background noise fragments. --one_file determines whether the output is saved as one file or as multiple files (the default).
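Conceptually, the extraction performed by this script boils down to something like the sketch below. The Raven selection table column names used here ('Annotation', 'Begin File', 'Begin Time (s)', 'End Time (s)') are assumptions for illustration, not taken from the script itself:

# Minimal sketch of the background extraction step, assuming pandas and
# soundfile are available; the Raven column names below are illustrative.
import os
import pandas as pd
import soundfile as sf

annotations = pd.read_csv('./test_data/raven_annotations.txt', sep='\t')
background = annotations[annotations['Annotation'] == 'Background']

for i, row in background.iterrows():
    wav_path = os.path.join('./test_data/recordings', row['Begin File'])
    samplerate = sf.info(wav_path).samplerate
    start = int(row['Begin Time (s)'] * samplerate)
    stop = int(row['End Time (s)'] * samplerate)
    fragment, sr = sf.read(wav_path, start=start, stop=stop)
    sf.write(f'./test_data/results/background/fragment_{i}.wav', fragment, sr)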
The second script, 2_create_overview.py, takes an input set of WAV files and registers their absolute file paths and durations. It produces a JSON file containing these registrations. In the next step we need such overviews for both the background noise fragments and the vocalizations so we can mix and match properly.
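The idea is roughly the following sketch; the exact JSON structure (a list of entries with 'path' and 'duration' keys) is an assumption for illustration:

# Sketch of building an overview of a folder of WAV files, assuming the
# soundfile library is available; the JSON layout is illustrative.
import glob
import json
import os
import soundfile as sf

def create_overview(input_dir, output_path):
    entries = []
    for wav in sorted(glob.glob(os.path.join(input_dir, '*.wav'))):
        info = sf.info(wav)
        entries.append({'path': os.path.abspath(wav), 'duration': info.duration})
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, 'w') as f:
        json.dump(entries, f, indent=2)

create_overview('./test_data/results/background',
                './test_data/results/overviews/background.json')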
Provided you have executed the previous step on the test data, you can run the second script like this:
# create an overview for the background fragments
$ python 2_create_overview.py \
--input_dir './test_data/results/background' \
--output './test_data/results/overviews/background.json'
# create an overview for the vocalizations
$ python 2_create_overview.py \
--input_dir './test_data/vocalizations' \
--output './test_data/results/overviews/vocalizations.json'
In the third and final step, performed by the 3_create_synth_sample.py
script, we mix the vocalizations into the background fragments. For each vocalization a suitable candidate from the background collection is randomly selected. If the algorithm cannot find a suitable candidate (because the vocalization is longer than any available background fragment), the vocalization is chopped up into smaller fragments and the selection is repeated for these smaller fragments.
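A minimal sketch of this selection logic, assuming overview entries carry a 'duration' field as in the sketch above; the halving strategy for chopping is also an assumption:

# Pick a random background fragment that is at least as long as the
# vocalization; if none exists, chop the vocalization in half and repeat
# (the exact chopping strategy of the script is an assumption).
import random

def select_background(duration, background_entries):
    candidates = [b for b in background_entries if b['duration'] >= duration]
    return random.choice(candidates) if candidates else None

def match_vocalization(voc_duration, background_entries):
    if not background_entries:
        raise ValueError('no background fragments available')
    background = select_background(voc_duration, background_entries)
    if background is not None:
        return [(voc_duration, background)]
    halves = [voc_duration / 2, voc_duration / 2]
    return [pair for half in halves
            for pair in match_vocalization(half, background_entries)]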
For each vocalization, four new versions are created in which the loudness of the vocalization is increasingly dampened. Every version gets its own numerical suffix, denoting the amount of dampening in dB multiplied by 10.
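A sketch of the mixing and naming scheme, assuming mono WAV files with equal sample rates; the dampening levels (0, 3, 6 and 9 dB) are made up for illustration, only the suffix convention (dampening in dB times 10) follows the description above:

# Mix a vocalization into a background fragment at several dampening levels.
import numpy as np
import soundfile as sf

def mix_versions(voc_path, background_path, out_stem, levels_db=(0, 3, 6, 9)):
    voc, sr = sf.read(voc_path)
    background, sr_bg = sf.read(background_path)
    assert sr == sr_bg, 'sample rates must match'
    offset = np.random.randint(0, len(background) - len(voc) + 1)
    for db in levels_db:
        gain = 10 ** (-db / 20)        # dampen the vocalization by `db` dB
        sample = background.copy()
        sample[offset:offset + len(voc)] += gain * voc
        suffix = int(db * 10)          # e.g. 3 dB of dampening -> suffix 30
        sf.write(f'{out_stem}_{suffix}.wav', sample, sr)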
If we continue our test example, we run the third script like this:
$ python 3_create_synth_sample.py \
--primate_json './test_data/results/overviews/vocalizations.json' \
--background_json './test_data/results/overviews/background.json' \
--output './test_data/results/synth_data/'
A convenient shell script that runs all the demonstration steps consecutively can be found in this folder as well, and can be executed with:
$ ./synth_pipeline.sh
Note that dampening the vocalizations only works up to a point: when their volume is reduced too much, the resulting samples can become unusable.