The aim of this software is to classify chimpanzee vocalizations in audio recordings from the tropical rainforests of Africa. The software can be used for processing raw audio data, extracting features, and applying and comparing Support Vector Machines and deep learning methods for classification. The pipeline is reusable for other settings, species or vocalization types, as long as a sufficient amount of labeled data has been collected. The best performing models will be available here for general usage.
Date: June 2022
Researchers:
Research Software Engineers:
The initial dataset for this project contains recordings in .wav format, each 1 minute long, at a sample rate of 48000 samples/second. The recordings were taken at three locations in (or close to) the tropical rainforest of Cameroon and Congo:
The chimpanzee sanctuary recordings are labeled into 2 classes (chimpanzee & background) using Raven Pro annotation software, and the labeled sections are extracted from the original recordings. Find scripts here.
To speed up the labeling process we developed an energy-change based algorithm to filter out irrelevant parts of the recordings, see Condensation. This was done after a first labeling effort; a second labeling effort then took place on the condensed files.
To increase and diversify our training set we have created synthetic samples by embedding the sanctuary vocalizations into the recorded jungle audio that is labeled as ‘background’, see Synthetic data.
The labeled sections of audio signal from the steps above are then split into frames of 0.5 seconds length with 0.25 seconds overlap. This results in the following input dataset for training the classifiers:
Dataset | # Chimpanzee samples | # Background samples |
---|---|---|
Sanctuary | 17,921 | 74,163 |
Synthetic | 68,757 | 97,149 |
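The framing step described above can be sketched as follows. This is a minimal illustration using only NumPy, not the repository's own implementation; the function name `split_into_frames` and the choice to drop a trailing partial frame are assumptions.

```python
import numpy as np


def split_into_frames(signal, sample_rate=48000, frame_s=0.5, overlap_s=0.25):
    """Split a 1-D signal into fixed-length frames that overlap their neighbours.

    Hypothetical helper: a trailing partial frame is dropped (an assumption).
    """
    frame_len = int(round(frame_s * sample_rate))
    hop_len = int(round((frame_s - overlap_s) * sample_rate))
    if len(signal) < frame_len:
        return np.empty((0, frame_len), dtype=signal.dtype)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    starts = np.arange(n_frames) * hop_len
    return np.stack([signal[s:s + frame_len] for s in starts])


# Two seconds of silence as a stand-in for a labeled audio section.
audio = np.zeros(2 * 48000, dtype=np.float32)
frames = split_into_frames(audio)
```

At 48000 samples/second each frame holds 24000 samples and consecutive frames start 12000 samples apart.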
The recordings from the semi-natural chimpanzee enclosures are used as an independent evaluation of the classifiers that are described below.
We trained the models on frames of 0.5 seconds.
Before calculating features we apply a Butterworth bandpass filter with low cutoff at 100 Hz and a high cutoff at 2000 Hz.
For classification using SVM we extract statistical features from different representations of the audio signal.
For classification using Deep learning we use a mel spectrogram representation as input.
Figure: Chimpanzee vocalization in mel spectrogram representation
SVM
From the 1140 statistical features from the previous step we select a normalized feature set of 50 features. The selection is based on feature importances computed with an Extra Trees Classifier. We train and optimize the SVM model on those 50 features using ‘macro average recall’ as evaluation criterion.
On the independent test set the SVM model achieves a ‘macro average recall’ of 0.87.
Figure: SVM prediction results for A6 recorder
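The feature selection and SVM training described above might look roughly like the following scikit-learn sketch. It runs on random stand-in data rather than the real features, and the number of trees, the RBF kernel, the grid of `C` values and the 3-fold cross-validation are all assumptions; only the 1140 to 50 feature reduction, the Extra Trees importances and the macro average recall criterion come from the text.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Random stand-in data: 400 frames x 1140 features, two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 1140))
y = rng.integers(0, 2, size=400)
X[y == 1, :10] += 2.0  # make the first ten features informative

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Keep exactly the 50 features with the highest Extra Trees importance.
selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=100, random_state=0),
    max_features=50,
    threshold=-np.inf,
)
pipeline = make_pipeline(StandardScaler(), selector, SVC(kernel="rbf"))

# Tune C using macro average recall as the selection criterion.
search = GridSearchCV(
    pipeline, {"svc__C": [0.1, 1.0, 10.0]}, scoring="recall_macro", cv=3
)
search.fit(X_train, y_train)

n_selected = int(
    search.best_estimator_.named_steps["selectfrommodel"].get_support().sum()
)
macro_recall = recall_score(y_test, search.predict(X_test), average="macro")
```

Macro average recall is the unweighted mean of the per-class recalls, so the smaller chimpanzee class counts as much as the larger background class.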
Deep learning
We trained several Convolutional Neural Network (CNN) architectures and a Residual Network (ResNet) model. CNN10 is the best performing model.
Trained on | SVM | CNN | CNN10 |
---|---|---|---|
Sanctuary | 0.86 | 0.81 | 0.83 |
Synthetic | 0.65 | 0.82 | 0.85 |
Sanctuary + Synthetic | 0.87 | 0.83 | 0.87 |
The code developed in this project is released under Apache 2.0. Some of the feature extraction scripts that we use in this project are available under the CeCILL 1.1 license. Where this is the case, the scripts contain license information in their header lines. The original versions of these scripts were created by Marielle Malfante and are available via GitHub.
To obtain all methods in this repository:
git clone https://github.com/UtrechtUniversity/animal-sounds.git
Install all required python libraries:
cd animal-sounds
python -m pip install -r requirements.txt
There are two situations in which you can directly apply the scripts in this repository and we tailored the documentation towards these situations:
This project uses the following directory structure. After cloning the repository on your local PC, organize your data in the repository using the structure below to make sure the scripts ‘know’ where the data is located.
.
├── .gitignore
├── CITATION.md
├── LICENSE.md
├── README.md
├── requirements.txt
├── bioacoustics              <- main folder for all source code
│   ├── 1_wav_processing
│   ├── 2_feature_extraction
│   └── 3_classifier
├── data                      <- All project data, ignored by git
│   ├── original_wav_files
│   ├── processed_wav_files
│   └── txt_annotations
└── output
    ├── features              <- Figures for the manuscript or reports, ignored by git
    ├── models                <- Models and relevant training outputs
    ├── notebooks             <- Notebooks for analysing results
    └── results               <- Graphs and tables
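The data and output folders in the tree above are not part of the source code, so they may need to be created by hand. A small helper such as the one below could do that; it is a convenience sketch and not a script shipped with the repository, and the name `create_layout` is hypothetical.

```python
import tempfile
from pathlib import Path

# Folder names taken from the directory tree above.
PROJECT_DIRS = [
    "data/original_wav_files",
    "data/processed_wav_files",
    "data/txt_annotations",
    "output/features",
    "output/models",
    "output/notebooks",
    "output/results",
]


def create_layout(repo_root):
    """Create the data and output folders below repo_root (hypothetical helper)."""
    root = Path(repo_root)
    created = []
    for relative in PROJECT_DIRS:
        path = root / relative
        path.mkdir(parents=True, exist_ok=True)  # safe to run repeatedly
        created.append(path)
    return created


# Demonstration in a throwaway directory; in practice pass the repository root.
with tempfile.TemporaryDirectory() as tmp:
    created = create_layout(tmp)
    layout_ok = all(path.is_dir() for path in created)
```

Calling `create_layout("animal-sounds")` after cloning would create the same folders inside the repository.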
Contributions are what make the open source community an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
To contribute:
git checkout -b feature/AmazingFeature
git commit -m 'Add some AmazingFeature'
git push origin feature/AmazingFeature
Joeri Zwerts - j.a.zwerts@uu.nl
Research Engineering team - research.engineering@uu.nl
Project Link: https://github.com/UtrechtUniversity/animal-sounds