"Because water is denser than air, sound travels very efficiently underwater. Sounds from some species of marine life and human activity can be heard many miles away and, in some cases, across oceans.
Passive acoustic instruments record these sounds in the ocean. There are some hydrophones that generate up to 24 terabytes a year! "e.g. Big Data"
This data provides valuable information that helps government agencies and industries understand and reduce the impacts of noise on ocean life.
By listening to sensitive underwater environments with passive acoustic monitoring tools, we can learn more about migration patterns, animal behavior and communication." quoted from noaa
The goal of this project is to explore the classification of marine animals from underwater audio. I will implement two machine learning models: a neural network and a convolutional neural network (CNN). The marine animals I'll be classifying are:
- Killer Whale
- False Killer Whale
- Bowhead Whale
- White-Sided Dolphin
- Risso's Dolphin
- Northern Right Whale
- Humpback Whale
- Sperm Whale
- Short-Finned Pilot Whale
This project uses labeled raw audio. The data was gathered and prepared as follows:
- The audio files and their labels were collected by reading and web scraping.
- All audio files were sliced into 30-second clips; recordings longer than 30 seconds were split into multiple 30-second clips, which generated more training data (see the sketch after this list).
- Next, I duplicated the audio files in each class and augmented the duplicated copies. Each copy was randomly augmented with:
  - a gain change of +/- 3 dB,
  - a pitch shift of +/- 2 semitones,
  - a time stretch,
  - and some added noise.
- I used Dolby.io for analysis and enhancement of over 1,000 audio clips.
- This doubled the amount of data in each class, so exactly half of the data in each class is an augmented version of an original file.
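As a rough illustration of the slicing and augmentation steps above, here is a minimal sketch using librosa and soundfile; the file paths, gain and noise levels, and helper names are illustrative assumptions, not the exact values used in this project.

```python
import random
import numpy as np
import librosa
import soundfile as sf

CLIP_SECONDS = 30

def slice_into_clips(path, clip_seconds=CLIP_SECONDS):
    """Split one long recording into consecutive 30-second clips."""
    y, sr = librosa.load(path, sr=None)              # keep the native sample rate
    samples_per_clip = clip_seconds * sr
    clips = []
    for start in range(0, len(y) - samples_per_clip + 1, samples_per_clip):
        clips.append(y[start:start + samples_per_clip])
    return clips, sr

def augment(y, sr):
    """Randomly apply gain, pitch shift, time stretch, and additive noise."""
    y = y * 10 ** (random.uniform(-3, 3) / 20)                      # +/- 3 dB gain
    y = librosa.effects.pitch_shift(y, sr=sr,
                                    n_steps=random.uniform(-2, 2))  # +/- 2 semitones
    y = librosa.effects.time_stretch(y, rate=random.uniform(0.9, 1.1))
    y = y + 0.005 * np.random.randn(len(y))                         # light background noise
    # Time stretching changes the length, so pad/trim back to exactly 30 seconds.
    return librosa.util.fix_length(y, size=CLIP_SECONDS * sr)

# Example: slice one recording and write both the original and the augmented clips.
clips, sr = slice_into_clips("data/humpback_whale/recording_01.wav")
for i, clip in enumerate(clips):
    sf.write(f"data/humpback_whale/clip_{i}.wav", clip, sr)
    sf.write(f"data/humpback_whale/clip_{i}_aug.wav", augment(clip, sr), sr)
```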
Here are some exploratory visual representations of each class using spectrograms and oscillograms.
- Compare the waveform and the spectrogram of a sample from the dataset, as sketched below.
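A minimal sketch of that comparison, assuming librosa and matplotlib (the clip path is a placeholder):

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("data/killer_whale/clip_0.wav", sr=None)   # placeholder path

fig, (ax_wave, ax_spec) = plt.subplots(2, 1, figsize=(10, 6), sharex=True)

# Oscillogram (waveform): amplitude over time.
librosa.display.waveshow(y, sr=sr, ax=ax_wave)
ax_wave.set_title("Waveform")

# Spectrogram: magnitude of the STFT on a decibel scale.
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
img = librosa.display.specshow(D, sr=sr, x_axis="time", y_axis="hz", ax=ax_spec)
ax_spec.set_title("Spectrogram")
fig.colorbar(img, ax=ax_spec, format="%+2.0f dB")

plt.tight_layout()
plt.show()
```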
- MFCCs (Mel-frequency cepstral coefficients) have been the audio feature of choice for speech recognition and speaker identification since the 1970s.
- Visualize the MFCCs of a Humpback Whale audio sample.
- Visualize the MFCC deltas of the same Humpback Whale audio sample.
- I extracted 13 MFCCs over 10 segments of each 30-second audio file, i.e. one segment every 3 seconds (see the sketch after this list).
- Segmenting the clips this way generated more samples to train on.
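A sketch of that feature-extraction step, 13 MFCCs plus their first-order deltas over 10 equal segments of each 30-second clip; the helper name and the default librosa frame settings are assumptions:

```python
import numpy as np
import librosa

N_MFCC = 13
N_SEGMENTS = 10     # 10 segments per 30-second clip, i.e. one every 3 seconds

def extract_mfcc_segments(path, n_mfcc=N_MFCC, n_segments=N_SEGMENTS):
    """Return one (2 * n_mfcc, frames) feature matrix per 3-second segment."""
    y, sr = librosa.load(path, sr=None)
    samples_per_segment = len(y) // n_segments
    segments = []
    for i in range(n_segments):
        seg = y[i * samples_per_segment:(i + 1) * samples_per_segment]
        mfcc = librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=n_mfcc)
        delta = librosa.feature.delta(mfcc)           # first-order deltas
        segments.append(np.vstack([mfcc, delta]))     # stack MFCCs and deltas
    return segments

features = extract_mfcc_segments("data/humpback_whale/clip_0.wav")  # placeholder path
print(features[0].shape)   # e.g. (26, frames) for one 3-second segment
```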
Next, I ran the models on the test set and checked their performance.
A confusion matrix is helpful to see how well the model did on each of the marine animals in the test set.
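One way to produce that confusion matrix with scikit-learn and matplotlib; `model`, `X_test`, and `y_test` are placeholders for the trained classifier and the held-out test arrays from the training step:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

CLASS_NAMES = [
    "Killer Whale", "False Killer Whale", "Bowhead Whale",
    "White-Sided Dolphin", "Risso's Dolphin", "Northern Right Whale",
    "Humpback Whale", "Sperm Whale", "Short-Finned Pilot Whale",
]

def plot_confusion(model, X_test, y_test):
    """Plot a confusion matrix of a trained classifier over the test set."""
    y_pred = np.argmax(model.predict(X_test), axis=1)   # predicted class indices
    cm = confusion_matrix(y_test, y_pred)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=CLASS_NAMES)
    disp.plot(xticks_rotation=45, cmap="Blues")
    plt.tight_layout()
    plt.show()
```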
Finally, I verified the models' predictions using an input audio clip from outside the dataset.
- This 30-second audio clip contains two marine animals to recognize, a False Killer Whale and a Dolphin. Let's see how the model performs!
The CNN model clearly recognized both sources in the audio file: "False Killer Whale" and "Dolphin."
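A sketch of that verification step, reusing the hypothetical `extract_mfcc_segments` helper from the feature-extraction sketch above: each 3-second segment is classified separately, and the distinct confident predictions are reported.

```python
import numpy as np

def identify_sources(model, path, class_names, threshold=0.5):
    """Predict a class per 3-second segment and report the distinct animals heard."""
    segments = extract_mfcc_segments(path)                       # from the sketch above
    X = np.stack([seg[..., np.newaxis] for seg in segments])     # add a channel axis for the CNN
    probs = model.predict(X)
    detected = set()
    for p in probs:
        if p.max() >= threshold:                                 # keep only confident segments
            detected.add(class_names[int(np.argmax(p))])
    return sorted(detected)

# e.g. identify_sources(cnn_model, "external/false_killer_whale_and_dolphin.wav", CLASS_NAMES)
```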
Future work:
- Gather more data when it becomes available.
- Train the models on (mel) spectrograms and compare the results.
- Implement a TensorFlow audio data pipeline.
- Add more classes of marine animals to recognize.
- Introduce human-generated sounds into the dataset, e.g. vessels and boats.