The aim of this software is to classify chimpanzee vocalizations in audio recordings from the tropical rainforests of Africa. The software can be used for processing raw audio data, extracting features, and applying and comparing Support Vector Machines (SVM) and deep learning methods for classification. The pipeline can be reused for other settings, species, or vocalization types, as long as a sufficient amount of labeled data has been collected. The best performing models will be made available here for general use.

About the Project

Date: June 2022


Research Software Engineers:

Dataset description

The initial dataset for this project contains recordings in .wav format, each 1 minute long, at a sample rate of 48000 samples/second. The recordings were made at three locations in (or close to) the tropical rainforest of Cameroon and Congo:


  1. The chimpanzee sanctuary recordings are labeled into 2 classes (chimpanzee & background) using Raven Pro annotation software, and the labeled sections are extracted from the original recordings. Find scripts here.

  2. To speed up the labeling process, we developed an energy-change-based algorithm that filters out irrelevant parts of the recordings, see Condensation. This algorithm was applied after a first labeling effort; a second labeling effort then took place on the condensed files.

  3. To increase and diversify our training set, we created synthetic samples by embedding the sanctuary vocalizations into recorded jungle audio that is labeled as ‘background’, see Synthetic data.
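
The embedding idea in step 3 can be illustrated with a minimal sketch, assuming a simple additive mix in which a vocalization is scaled and added to a background clip at a chosen position. The function name `embed_vocalization` and the `offset` and `gain` parameters are hypothetical and are not taken from the project's scripts; see Synthetic data for the procedure that was actually used.

```python
import numpy as np


def embed_vocalization(background, vocalization, offset, gain=1.0):
    """Return a copy of `background` with `vocalization` added at `offset`.

    Hypothetical helper for illustration only: `offset` is the start
    position in samples and `gain` scales the vocalization before mixing.
    """
    background = np.asarray(background, dtype=float)
    vocalization = np.asarray(vocalization, dtype=float)

    end = offset + len(vocalization)
    if offset < 0 or end > len(background):
        raise ValueError("vocalization does not fit inside the background clip")

    mixed = background.copy()  # leave the input background untouched
    mixed[offset:end] += gain * vocalization
    return mixed
```

Varying `offset` and `gain` over many background clips is one way such a mix could diversify a training set, but the choice of values here is an assumption.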

The labeled sections of audio signal from the steps above are then split into frames of 0.5 seconds length with 0.25 seconds overlap. This results in the following input dataset for training the classifiers:

| Dataset | # Chimpanzee samples | # Background samples |
|---|---|---|
| Sanctuary | 17,921 | 74,163 |
| Synthetic | 68,757 | 97,149 |
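
The framing described above (0.5 second frames with 0.25 seconds overlap) can be sketched as follows. This is an illustration under stated assumptions, not the project's own implementation: the function name `split_into_frames` is hypothetical, and dropping a trailing partial frame is an assumption.

```python
import numpy as np


def split_into_frames(signal, sample_rate=48000, frame_s=0.5, overlap_s=0.25):
    """Split a 1-D signal into overlapping frames (one frame per row).

    Hypothetical helper: a trailing segment shorter than one frame is
    dropped, which is an assumption and may differ from the project code.
    """
    signal = np.asarray(signal)
    frame_len = int(round(frame_s * sample_rate))          # 24000 samples
    hop = int(round((frame_s - overlap_s) * sample_rate))  # 12000 samples

    if len(signal) < frame_len:
        return np.empty((0, frame_len), dtype=signal.dtype)

    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
```

With these defaults, a labeled section of 2 seconds yields 7 frames, each starting 0.25 seconds after the previous one.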

The recordings from the semi-natural chimpanzee enclosures are used as an independent evaluation set for the classifiers described below.

Feature extraction

We trained the models on frames of 0.5 seconds.
Before calculating features, we apply a Butterworth bandpass filter with a low cutoff at 100 Hz and a high cutoff at 2000 Hz.
For classification using SVM, we extract statistical features from different representations of the audio signal.
For classification using deep learning, we use a mel spectrogram representation as input.
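
The bandpass step can be sketched with SciPy as shown below. Only the cutoffs (100 Hz and 2000 Hz) and the sample rate come from this README; the filter order, the zero-phase filtering via `sosfiltfilt`, and the function name `bandpass` are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(signal, sample_rate=48000, low_hz=100.0, high_hz=2000.0, order=4):
    """Apply a Butterworth bandpass filter to a 1-D signal.

    Assumptions (not stated in the README): `order=4` and forward-backward
    (zero-phase) filtering. Second-order sections are used for numerical
    stability.
    """
    sos = butter(
        order, [low_hz, high_hz], btype="bandpass", fs=sample_rate, output="sos"
    )
    return sosfiltfilt(sos, np.asarray(signal, dtype=float))
```

A 1000 Hz tone passes through almost unchanged, while a 10000 Hz tone is strongly attenuated, which matches the intent of restricting the signal to the 100 to 2000 Hz band.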

Chimpanzee vocalization in mel spectrogram representation


From the 1140 statistical features extracted in the previous step, we select a normalized feature set of 50 features. The selection is based on feature importances computed with an Extra Trees Classifier. We train and optimize the SVM model on those 50 features using ‘macro average recall’ as the evaluation criterion. On the independent test set, the SVM model achieves a ‘macro average recall’ of 0.87.

SVM prediction results for A6 recorder
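
Macro average recall is the unweighted mean of the per-class recalls, so the smaller chimpanzee class counts as much as the larger background class. The dependency-free sketch below shows the calculation; the function name `macro_average_recall` and the toy labels are illustrative and not taken from the project code.

```python
def macro_average_recall(y_true, y_pred):
    """Return the unweighted mean of per-class recall.

    Classes are taken from `y_true`, so every class has at least one
    true sample and the per-class division is always defined.
    """
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Toy example: 2 chimpanzee frames (1 found), 8 background frames (all found).
y_true = ["chimpanzee"] * 2 + ["background"] * 8
y_pred = ["chimpanzee", "background"] + ["background"] * 8
print(macro_average_recall(y_true, y_pred))  # 0.75, although accuracy is 0.9
```

In the toy example, accuracy hides the missed vocalization, whereas macro average recall penalizes it, which is one reason to prefer this criterion on imbalanced data.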

Deep learning
We trained several Convolutional Neural Network (CNN) architectures and a Residual Network (ResNet) model. CNN10 is the best performing model.

| Trained on | SVM | CNN | CNN10 |
|---|---|---|---|
| Sanctuary | 0.86 | 0.81 | 0.83 |
| Synthetic | 0.65 | 0.82 | 0.85 |
| Sanctuary + Synthetic | 0.87 | 0.83 | 0.87 |

Built with


The code developed in this project is released under the Apache 2.0 license. Some of the feature extraction scripts used in this project are available under the CeCILL 1.1 license; these scripts carry the license information in their header lines. The original versions of these scripts were created by Marielle Malfante and are available via GitHub.

Relevant publications

Getting Started

To obtain all methods in this repository:

git clone

Install all required Python libraries:

cd animal-sounds
python -m pip install -r requirements.txt

There are two situations in which you can apply the scripts in this repository directly, and we tailored the documentation towards them:

  1. You have audio data and a set of manual annotations (e.g. in txt or csv format) and want to use the whole pipeline, including training your own model. Find getting started instructions for each step in the respective folders: 1_wav_processing, 2_feature_extraction and 3_classifier.
  2. You have a highly similar dataset and want to use one of our models to help find chimpanzee vocalizations.

Project structure

This project uses the following directory structure. After cloning the repository to your local machine, organize your data inside the repository using the structure below so that the scripts can find it.

├── .gitignore
├── requirements.txt
├── bioacoustics              <- main folder for all source code
│   ├── 1_wav_processing 
│   ├── 2_feature_extraction
│   └── 3_classifier        
├── data               <- All project data, ignored by git
│   ├── original_wav_files
│   ├── processed_wav_files            
│   └── txt_annotations           
└── output
    ├── features        <- Figures for the manuscript or reports, ignored by git
    ├── models          <- Models and relevant training outputs
    ├── notebooks       <- Notebooks for analysing results
    └── results         <- Graphs and tables
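
If the `data` and `output` folders do not exist yet after cloning, they could be created with a small helper like the sketch below. The directory names are copied from the tree above; the function name `create_project_dirs` is hypothetical and no such script is known to be part of the repository.

```python
from pathlib import Path

# Data and output folders, as listed in the directory tree above.
PROJECT_DIRS = [
    "data/original_wav_files",
    "data/processed_wav_files",
    "data/txt_annotations",
    "output/features",
    "output/models",
    "output/notebooks",
    "output/results",
]


def create_project_dirs(repo_root):
    """Create the expected folders under `repo_root` and return their paths.

    Existing folders are left untouched (`exist_ok=True`).
    """
    created = []
    for rel in PROJECT_DIRS:
        path = Path(repo_root) / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_project_dirs(".")` from the root of the cloned repository would set up the layout in place.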


Contributions are what make the open source community an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

To contribute:

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request


Joeri Zwerts -

Research Engineering team -

Project Link: