animal-sounds

The aim of this software is to classify Chimpanzee vocalizations in audio recordings from the tropical rainforests of Africa. The software can be used for processing raw audio data, extracting features, and applying and comparing Support Vector Machines and Deep learning methods for classification. The pipeline is reusable for other settings, species, or vocalization types as long as a sufficient amount of labeled data has been collected. The best performing models will be available here for general usage.

About the Project

Date: June 2022

Researchers:

Joeri Zwerts (j.a.zwerts@uu.nl)
Heysem Kaya (h.kaya@uu.nl)

Research Software Engineers:

Parisa Zahedi (p.zahedi@uu.nl)
Casper Kaandorp (c.s.kaandorp@uu.nl)
Jelle Treep (h.j.treep@uu.nl)

Dataset description

The initial dataset for this project contains 1-minute recordings in .wav format at a sample rate of 48000 samples/second. The recordings were taken at three locations in (or close to) the tropical rainforest of Cameroon and Congo:

Chimpanzee sanctuary - Congo
Natural forest - Congo
Semi-natural Chimpanzee enclosures - Cameroon

Preprocessing

The Chimpanzee sanctuary recordings are labeled into 2 classes (Chimpanzee & background) using Raven Pro annotation software, and the labeled sections are extracted from the original recordings. Find scripts here.
To speed up the labeling process we developed an energy-change based algorithm to filter out irrelevant parts of the recordings, see Condensation. This was done after a first labeling effort; a second labeling effort then took place on the condensed files.
To increase and diversify our training set we created synthetic samples by embedding the sanctuary vocalizations into recorded jungle audio that is labeled as 'background', see Synthetic data.
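
The idea behind the synthetic samples can be illustrated with a minimal sketch. It is not the repository's implementation (see Synthetic data for that); the function name `embed_vocalization`, the `gain` parameter, and the stand-in signals are assumptions made for illustration only.

```python
import numpy as np

def embed_vocalization(background, vocalization, offset, gain=1.0):
    """Return a copy of `background` with `vocalization` added at sample `offset`."""
    if offset < 0 or offset + len(vocalization) > len(background):
        raise ValueError("vocalization does not fit inside the background clip")
    mixed = background.astype(np.float64, copy=True)
    mixed[offset:offset + len(vocalization)] += gain * vocalization
    # Keep the result in the usual floating-point audio range [-1, 1].
    return np.clip(mixed, -1.0, 1.0)

sr = 48000                                           # sample rate of the recordings
rng = np.random.default_rng(0)
background = 0.01 * rng.standard_normal(2 * sr)      # 2 s of stand-in "jungle" noise
t = np.arange(sr // 2) / sr
vocalization = 0.5 * np.sin(2 * np.pi * 800 * t)     # 0.5 s stand-in call (assumed 800 Hz tone)
synthetic = embed_vocalization(background, vocalization, offset=sr, gain=0.8)
```

Varying the offset and gain per sample is one simple way to diversify such a set; whether the project varies these particular parameters is not stated here.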

The labeled sections of audio signal from the steps above are then split into frames of 0.5 seconds length with 0.25 seconds overlap. This results in the following input dataset for training the classifiers:

Dataset	# Chimpanzee samples	# Background samples
Sanctuary	17,921	74,163
Synthetic	68,757	97,149

The recordings from the Semi-natural Chimpanzee enclosures are used as an independent evaluation of the classifiers that are described below.
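
The framing of labeled sections into 0.5-second frames with 0.25 seconds overlap can be sketched as follows. This is an illustration under assumptions, not the project's script: the helper name `frame_bounds` is hypothetical, and dropping a trailing partial frame is a choice made here for simplicity.

```python
def frame_bounds(n_samples, sample_rate=48000, frame_s=0.5, overlap_s=0.25):
    """Return (start, end) sample indices of all full-length frames."""
    frame_len = int(round(frame_s * sample_rate))
    # Consecutive frames start one hop apart; hop = frame length minus overlap.
    hop_len = int(round((frame_s - overlap_s) * sample_rate))
    return [(start, start + frame_len)
            for start in range(0, n_samples - frame_len + 1, hop_len)]

# A 2-second labeled section at 48000 samples/second.
bounds = frame_bounds(n_samples=2 * 48000)
print(len(bounds), bounds[0], bounds[-1])
```

Each (start, end) pair can then be used to slice the audio array of the labeled section.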

Feature extraction

We trained the models on frames of 0.5 seconds.
Before calculating features we apply a Butterworth bandpass filter with low cutoff at 100 Hz and a high cutoff at 2000 Hz.
For classification using SVM we extract statistical features from different representations of the audio signal.
For classification using Deep learning we use a mel spectrogram representation as input.
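
A Butterworth bandpass filter with the cutoffs named above could look like the sketch below, using SciPy. The filter order (4) and the zero-phase filtering with `sosfiltfilt` are assumptions; the text above only specifies the filter type and the 100 Hz and 2000 Hz cutoffs.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(signal, sample_rate=48000, low_hz=100.0, high_hz=2000.0, order=4):
    """Apply a Butterworth bandpass filter (order is an assumed value)."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass",
                 fs=sample_rate, output="sos")
    # Forward-backward filtering avoids phase distortion (an assumed choice).
    return sosfiltfilt(sos, signal)

sr = 48000
t = np.arange(sr) / sr
in_band = np.sin(2 * np.pi * 1000 * t)       # inside the 100-2000 Hz passband
out_of_band = np.sin(2 * np.pi * 8000 * t)   # well above the high cutoff
filtered = bandpass(in_band + out_of_band, sr)
```

In this toy signal the 8000 Hz component is strongly attenuated while the 1000 Hz component passes almost unchanged.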


Chimpanzee vocalization in mel spectrogram representation
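
For readers unfamiliar with the representation, the sketch below computes a log-mel spectrogram of one 0.5-second frame with NumPy and SciPy. All parameter values (`n_fft`, `hop`, `n_mels`, the mel frequency range) and all function names are assumptions for illustration; the project's own settings and library calls may differ.

```python
import numpy as np
from scipy.signal import spectrogram

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate, f_min, f_max):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel_spectrogram(frame, sample_rate=48000, n_fft=2048, hop=512,
                        n_mels=32, f_min=100.0, f_max=2000.0):
    """Return a (n_mels, n_time_steps) log-mel spectrogram in dB."""
    _, _, power = spectrogram(frame, fs=sample_rate, window="hann",
                              nperseg=n_fft, noverlap=n_fft - hop, mode="psd")
    mel_power = mel_filterbank(n_mels, n_fft, sample_rate, f_min, f_max) @ power
    return 10.0 * np.log10(mel_power + 1e-10)   # small offset avoids log(0)

sr = 48000
t = np.arange(sr // 2) / sr                     # one 0.5-second frame
mel = log_mel_spectrogram(np.sin(2 * np.pi * 800 * t), sr)
print(mel.shape)
```

The resulting 2-D array (mel bands by time steps) is the kind of image-like input a CNN can consume.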

Classification

SVM
From the 1140 statistical features from the previous step we select a normalized feature set of 50 features. The selection is based on feature importances computed with an Extra Trees Classifier. We train and optimize the SVM model on those 50 features using macro average recall as the evaluation criterion. On the independent test set the SVM model achieves a macro average recall of 0.87.
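
The selection and training steps described above can be sketched with scikit-learn as follows. This runs on randomly generated stand-in data, so the printed score says nothing about the real models. The hyperparameters, the use of `StandardScaler` for normalization, and the `class_weight="balanced"` setting are assumptions, not the project's configuration. Macro average recall is the unweighted mean of the per-class recalls, so the minority class counts as much as the majority class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 1140 features per frame, two imbalanced classes.
X, y = make_classification(n_samples=600, n_features=1140, n_informative=30,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Rank features by Extra Trees importance and keep the 50 most important.
forest = ExtraTreesClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
top50 = np.argsort(forest.feature_importances_)[::-1][:50]

# Normalize the selected features and fit the SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
model.fit(X_train[:, top50], y_train)

macro_recall = recall_score(y_test, model.predict(X_test[:, top50]), average="macro")
print(f"macro average recall: {macro_recall:.2f}")
```

In a tuning loop, the same metric can be passed to scikit-learn's model selection tools as `scoring="recall_macro"`.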


SVM prediction results for A6 recorder

Deep learning
We trained several Convolutional Neural Network (CNN) architectures and a Residual Network (ResNet) model. CNN10 is the best performing model.

Trained on	SVM	CNN	CNN10
Sanctuary	0.86	0.81	0.83
Synthetic	0.65	0.82	0.85
Sanctuary + Synthetic	0.87	0.83	0.87

Built with

License

The code developed in this project is released under Apache 2.0. Some of the feature extraction scripts that we use in this project are available under the CeCILL 1.1 license. Where this is the case, the license information is in the header lines of the script. The original versions of these scripts were created by Marielle Malfante and are available via GitHub.

Relevant publications

Introducing a Central African primate vocalisation dataset for automated species classification. Zwerts, J. A., Treep, J., Kaandorp, C. S., Meewis, F., Koot, A. C., & Kaya, H. (2021). arXiv preprint
The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 cough, COVID-19 speech, escalation & primates. Schuller, B. W., Batliner, A., Bergler, C., Mascolo, C., Han, J., Lefter, I., ... & Kaandorp, C. (2021). arXiv preprint
Automatic Analysis Architecture, M. MALFANTE, J. MARS, M. DALLA MURA DOI: 10.5281/zenodo.126028

Getting Started

To obtain all methods in this repository:

git clone https://github.com/UtrechtUniversity/animal-sounds.git

Install all required python libraries:

cd animal-sounds
python -m pip install -r requirements.txt

There are two situations in which you can directly apply the scripts in this repository and we tailored the documentation towards these situations:

You have audio data and a set of manual annotations (in e.g. txt or csv format) and want to use the whole pipeline including training your own model. Find getting started instructions for each step in the respective folders: 1_wav_processing, 2_feature_extraction and 3_classifier
You have a highly similar dataset and want to use one of our models to help find Chimpanzee vocalizations.

Project structure

This project uses the following directory structure. After cloning the repository on your local PC, organize your data in the repository using the structure below to make sure the scripts 'know' where the data is located.

.
├── .gitignore
├── CITATION.md
├── LICENSE.md
├── README.md
├── requirements.txt
├── bioacoustics              <- main folder for all source code
│   ├── 1_wav_processing 
│   ├── 2_feature_extraction
│   └── 3_classifier        
├── data               <- All project data, ignored by git
│   ├── original_wav_files
│   ├── processed_wav_files            
│   └── txt_annotations           
└── output
    ├── features        <- Figures for the manuscript or reports, ignored by git
    ├── models          <- Models and relevant training outputs
    ├── notebooks       <- Notebooks for analysing results
    └── results         <- Graphs and tables

Contributing

Contributions are what make the open source community an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

To contribute:

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

Contact

Joeri Zwerts - j.a.zwerts@uu.nl

Research Engineering team - research.engineering@uu.nl

Project Link: https://github.com/UtrechtUniversity/animal-sounds
