- The project is created with the intention to detect/classify an audio signal as a cough or sneeze audio signal.
- A further goal is to pipeline this into mobile applications to narrow the detection of sickness audio, specifically of COVID-19.
- To contribute to helping government authorities identify persons with a probable coronavirus infection living among us. ("We should fight the virus, not the patient affected by the virus")
- Audio Signal Processing
- Basic ML Framework
- Bit Depth
- CNN
- Data visualization
- Digital Signal Processing
- Fast Fourier Transform
- Filter Bank Coefficients
- Fourier Transform
- Hanning Window
- Implementation of a ML model in python
- Mel Filter Bank
- Mel-Frequency Cepstral Coefficients
- Preprocessing
- RNN
- Sampling and Sampling Frequency
- Sensors
- Short Time Fourier Transform
- Spectrogram
- What are the different types of audio sources known?
- What are the various audio file formats?
- How to read an audio file?
- What are the various properties of audio files?
- How to visualize an audio file?
- How to find the bit depth of an audio file in Python?
- How to find various properties of an audio file?
- How to extract features from audio files?
and so on... (a small sketch covering several of these questions follows below)
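A minimal sketch, assuming the `soundfile` and `matplotlib` packages are available (the file name is a placeholder, not a file from the project), that reads an audio file, reports its sample rate, channel count, bit depth and duration, and plots the waveform:

```python
# Read and inspect an audio file, then plot its waveform
import soundfile as sf
import matplotlib.pyplot as plt

path = "cough_sample.wav"          # hypothetical example file

info = sf.info(path)               # container-level metadata
print("Sample rate :", info.samplerate, "Hz")
print("Channels    :", info.channels)
print("Bit depth   :", info.subtype)        # e.g. PCM_16 -> 16-bit samples
print("Duration    :", round(info.duration, 2), "s")

signal, sr = sf.read(path)         # samples as a NumPy array, plus sample rate

# Quick time-domain visualization of the waveform
plt.figure(figsize=(10, 3))
plt.plot(signal)
plt.title("Waveform")
plt.xlabel("Sample index")
plt.ylabel("Amplitude")
plt.tight_layout()
plt.show()
```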
► When we think about sound:
We often think about how loud it is (amplitude, or intensity) and its pitch (frequency).
► In a given medium under fixed conditions, the speed of sound is constant. Hence, there is a relationship between frequency (f) and wavelength (λ): the higher the frequency, the smaller the wavelength. (A quick worked example follows.)
► The animation above shows two acoustic longitudinal waves with two different frequencies but travelling at the same velocity. It can be seen that the wavelength is halved when the frequency is doubled.
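As a quick worked example (assuming the textbook value of roughly 343 m/s for the speed of sound in air, which is not stated in the original text):

```python
# Wavelength from frequency: lambda = v / f, with v ~= 343 m/s in air (~20 C)
v = 343.0                 # speed of sound in air (m/s), assumed value
for f in (220.0, 440.0):  # doubling the frequency...
    print(f, "Hz ->", round(v / f, 3), "m")
# ...halves the wavelength: 220 Hz -> ~1.559 m, 440 Hz -> ~0.78 m
```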
► An interactive animation illustrates the amplitude, wavelength and phase of a sine wave: by varying the amplitude, wavelength and phase, you can observe the effects on the transverse wave.
Sound in our environment is the energy that things produce when they vibrate (move back and forth quickly).
Image courtesy of NASA
Digital Sound Recording:
A method of preserving sound in which audio signals are transformed into a series of pulses that correspond to patterns of binary digits (0s and 1s). (A small quantization sketch follows.)
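As a minimal illustration (my own sketch with made-up sample values, not from the source), a continuous-valued signal can be quantized to 16-bit integers, i.e. the binary patterns a digital recording stores:

```python
# Quantize a floating-point signal in [-1, 1] to signed 16-bit PCM samples,
# the binary "pulses" a digital recording stores. Sample values are made up.
import numpy as np

analog = np.array([0.0, 0.25, -0.5, 0.99])        # hypothetical samples
pcm16 = np.clip(analog * 32767, -32768, 32767).astype(np.int16)

for a, q in zip(analog, pcm16):
    bits = format(int(q) & 0xFFFF, "016b")        # two's-complement bit pattern
    print(f"{a:+.2f} -> {int(q):+6d} -> {bits}")
```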
What's the science of sound? ►
Signal sampling representation:
- A sample is a value or set of values at a point in time and/or space.
- A sampler is a subsystem or operation that extracts samples from a continuous signal.
Fig: The continuous signal is represented by a green line, while the discrete samples are indicated by the blue vertical lines.
Sampling Interval or Sampling Period:
Sampling is performed by measuring the value of the continuous function every T seconds.
Sampling Frequency or Sampling Rate:
The average number of samples obtained in one second (samples per second); it is the reciprocal of the sampling period, i.e. fs = 1/T. (A small sampling sketch follows below.)
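A minimal sketch (not from the original text) of how the sampling rate and sampling period relate when discretizing a sine wave; the 8 kHz rate and 5 Hz tone are arbitrary illustrative choices:

```python
# Sample a continuous 5 Hz sine wave at fs = 8000 Hz (samples T = 1/fs apart)
import numpy as np

fs = 8000                       # sampling frequency (samples per second), assumed
T = 1.0 / fs                    # sampling period in seconds
duration = 1.0                  # one second of signal

t = np.arange(0, duration, T)   # sample instants: 0, T, 2T, ...
x = np.sin(2 * np.pi * 5 * t)   # discrete samples of a 5 Hz sine wave

print("Sampling period T =", T, "s")
print("Samples in", duration, "s:", len(x))   # fs * duration = 8000
```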
COVID-19 Cough Audio Analysis
Patient Details:
- Age: 49
- Sex: Male
- Country: UK
- Day: 5
- Resource Date: Mar 23, 2020
- Infection Symptoms: Cannot breathe, heavy coughs.
- Health Status before being affected by COVID-19: Healthy person, regular swimmer
► Time Domain to Frequency Domain Transformation:
Signal Feature Extraction: ►
- Filter Banks
- Mel-Frequency Cepstral Coefficients (MFCCs)
- A signal goes through a pre-emphasis filter;
- then gets sliced into (overlapping) frames;
- a window function is applied to each frame;
- afterwards, we do a Fourier transform on each frame (or more specifically a Short-Time Fourier Transform)
- and calculate the power spectrum;
- and subsequently compute the filter banks.
- To obtain MFCCs, a Discrete Cosine Transform (DCT) is applied to the filter banks, retaining a number of the resulting coefficients while the rest are discarded.
- A final step in both cases is mean normalization. (A code sketch of these steps follows below.)
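The following is a minimal NumPy/SciPy sketch of that pipeline, broadly following the speech-processing tutorial linked at the end of this document; it is not the project's actual code, and the frame length, hop size and filter count are common default choices rather than values taken from the text:

```python
import numpy as np
from scipy.fft import dct

def mfcc_pipeline(signal, sample_rate, n_filters=40, n_mfcc=13, nfft=512):
    """Pre-emphasis -> framing -> windowing -> STFT power spectrum
    -> mel filter banks -> log -> DCT (MFCCs) -> mean normalization."""
    # 1. Pre-emphasis filter: y[n] = x[n] - 0.97 * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # 2. Slice into overlapping frames (25 ms frames, 10 ms hop)
    frame_len = int(round(0.025 * sample_rate))
    frame_step = int(round(0.010 * sample_rate))
    n_frames = 1 + int(np.ceil((len(emphasized) - frame_len) / frame_step))
    pad_len = (n_frames - 1) * frame_step + frame_len
    padded = np.append(emphasized, np.zeros(pad_len - len(emphasized)))
    idx = (np.tile(np.arange(frame_len), (n_frames, 1)) +
           np.tile(np.arange(0, n_frames * frame_step, frame_step),
                   (frame_len, 1)).T)
    frames = padded[idx]

    # 3. Apply a window function (Hamming) to each frame
    frames *= np.hamming(frame_len)

    # 4-5. Short-Time Fourier Transform and power spectrum
    mag = np.abs(np.fft.rfft(frames, nfft))
    power = (mag ** 2) / nfft

    # 6. Mel filter banks: triangular filters spaced evenly on the mel scale
    high_mel = 2595 * np.log10(1 + (sample_rate / 2) / 700)
    mel_points = np.linspace(0, high_mel, n_filters + 2)
    hz_points = 700 * (10 ** (mel_points / 2595) - 1)
    bins = np.floor((nfft + 1) * hz_points / sample_rate).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / (center - left)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / (right - center)
    filter_banks = np.dot(power, fbank.T)
    filter_banks = 20 * np.log10(np.maximum(filter_banks, np.finfo(float).eps))

    # 7. DCT of the log filter-bank energies; keep the first n_mfcc coefficients
    mfcc = dct(filter_banks, type=2, axis=1, norm="ortho")[:, :n_mfcc]

    # 8. Mean normalization
    mfcc -= np.mean(mfcc, axis=0)
    return filter_banks, mfcc
```

Calling `mfcc_pipeline(signal, sample_rate)` on a loaded recording returns both the log filter-bank energies and the mean-normalized MFCCs.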
► Steps used for calculating MFCCs for the COVID-19 cough audio sample (a library-based sketch follows this list):
- Slice the signal into short frames (of time)
- Compute the periodogram estimate of the power spectrum for each frame
- Apply the mel filterbank to the power spectra and sum the energy in each filter
- Take the discrete cosine transform (DCT) of the log filterbank energies
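In practice these steps can be performed with a single library call; below is a minimal sketch using `librosa` (assuming it is installed, and with a placeholder file name rather than the actual recording used in the project):

```python
# Compute 13 MFCCs for a cough recording using librosa's built-in pipeline
import librosa

# Hypothetical file name; replace with the actual COVID-19 cough sample
signal, sr = librosa.load("covid19_cough_day5.wav", sr=None)

# librosa internally frames the signal, computes the power spectrogram,
# applies a mel filter bank, takes the log, and applies the DCT
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)

print(mfcc.shape)   # (13, number_of_frames)
```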
In monaural sound, a single channel is used. It can be reproduced through several speakers, but all speakers still reproduce the same copy of the signal.
In stereophonic sound, more channels are used (typically two). You can use two different channels and have one feed one speaker and the second channel feed a second speaker (which is the most common stereo setup).
This is used to create directionality, perspective and space. (A small mono-conversion sketch follows below.)
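For analysis it is common to collapse a stereo recording to a single (mono) channel by averaging the two channels; a minimal sketch, not project code, with a placeholder file name:

```python
# Convert a stereo recording to mono by averaging the left and right channels
import soundfile as sf

stereo, sr = sf.read("example_stereo.wav")   # shape: (n_samples, 2) for stereo

if stereo.ndim == 2:                         # more than one channel present
    mono = stereo.mean(axis=1)               # average across channels
else:
    mono = stereo                            # already mono

print(stereo.shape, "->", mono.shape)
```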
Dataset Source: https://osf.io/tmkud/
Motivation:
This dataset has been created for the Pfizer Digital Medicine Challenge.
- Early detection of respiratory tract infections can lead to timely diagnosis and treatment, which can result in better outcomes and reduce the likelihood of severe complications.
- Respiratory sounds carry rich information that can be mined to develop automated approaches for detection of sickness behaviors like coughing and sneezing.
- In this challenge, we invite you to build machine learning models for automatic detection of sickness sounds by using audio recordings from open datasets.
- The dataset was created using audio files from ESC-50 and AudioSet.
- We used the open source BMAT Annotation Tool to annotate this dataset.
Goal: Develop machine learning models for detection of sickness sounds (coughing and sneezing)
The dataset is organized as follows:
train
- sick (n=1435)
- not_sick (n=2283)
validation
- sick (n=468)
- not_sick (n=753)
test
- sick (n=642)
- not_sick (n=1012)
- The data is in the directory Dataset,
- further organized into the directories 'Train', 'Test' and 'Validation'.
- Each set has two subdirectories named after the dataset classes. (A directory-count sketch follows below.)
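A minimal sketch for counting the files in each split and class (the directory and class names are assumed from the description above, not verified against the actual archive):

```python
# Count audio files per split and per class, assuming the layout described above
import os

DATASET_ROOT = "Dataset"                      # assumed root directory name

for split in ("Train", "Validation", "Test"):
    for label in ("sick", "not_sick"):        # the two dataset classes
        folder = os.path.join(DATASET_ROOT, split, label)
        if os.path.isdir(folder):
            n_files = len(os.listdir(folder))
            print(f"{split:>10} / {label:<8}: {n_files} files")
```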
- It's Big!!!
- Yes
- No, I'm in lockdown, and limited time, knowledge and internet are a concern for me!!
- Have to use my old Intel i3 core :/ laptop to develop a few basic templates
- Once I get internet access, I'll use the templates to run on Google's Colab =')
- After debugging, I'll scale up to the full dataset and re-run the program files for visualization and model training :O (A possible update on this :|)
The data is cleaned, and the following is the class distribution:
The above analysis shows that, in the training folder, the audio lengths of both classes are similarly distributed.
MFCC feature extraction is applied to every training sample to get 13x99 features/coefficients. This is the method used to convert the audio data into NumPy arrays (a sketch follows below).
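A minimal sketch of that conversion (the paths, padding strategy and clip handling are my own assumptions; the original code may differ). Thirteen coefficients over roughly 99 frames correspond to a few seconds of audio at librosa's default hop length:

```python
# Build a NumPy feature array of shape (n_samples, 13, 99) from a class folder
import os
import numpy as np
import librosa

N_MFCC, N_FRAMES = 13, 99                     # target feature shape per clip

def extract_features(folder):
    features = []
    for name in sorted(os.listdir(folder)):
        signal, sr = librosa.load(os.path.join(folder, name), sr=None)
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC)
        # Pad or truncate along the time axis so every sample is 13 x 99
        if mfcc.shape[1] < N_FRAMES:
            mfcc = np.pad(mfcc, ((0, 0), (0, N_FRAMES - mfcc.shape[1])))
        features.append(mfcc[:, :N_FRAMES])
    return np.stack(features)                 # shape: (n_samples, 13, 99)

# Example (assumed directory layout):
# X_sick = extract_features("Dataset/Train/sick")
```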
It is understood that the MFCCs and spectrograms of the audio signals can also be used as an image dataset to build CNN models that classify the audio samples.
A model comparison can be made between the current RNN model, transfer learning models and the CNN models (a minimal RNN sketch follows below). An update on this is in progress :P
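For reference, a minimal sketch of an RNN classifier over the 13x99 MFCC features, using Keras; this is an illustrative layout of my own, not the project's actual model architecture or hyperparameters:

```python
# A small LSTM-based classifier over MFCC sequences (99 frames x 13 coefficients)
from tensorflow.keras import layers, models

def build_rnn(n_mfcc=13, n_frames=99):
    model = models.Sequential([
        # Treat each time frame as a step with 13 MFCC values
        layers.Input(shape=(n_frames, n_mfcc)),
        layers.LSTM(64, return_sequences=True),
        layers.LSTM(32),
        layers.Dense(16, activation="relu"),
        layers.Dense(1, activation="sigmoid"),   # sick vs. not_sick
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_rnn()
model.summary()
# Training would use MFCC arrays transposed to (n_samples, 99, 13):
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20)
```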
https://www.nasa.gov/specials/X59/science-of-sound.html
https://courses.lumenlearning.com/physics/chapter/17-2-speed-of-sound-frequency-and-wavelength/
https://blog.soton.ac.uk/soundwaves/wave-basics/wavelength-frequency-relation/
http://iamtechnical.com/wave-properties-amplitude-wavelength-and-phase-angle
https://www.explainthatstuff.com/sound.html
https://www.scienceguru.co.in/fileman/Uploads/PHY%2009/Sound/electrical%20guru%20noise%20level.png
http://www.libertycentral.org.uk/how-do-animals-hearing-compare-to-humans/
https://www.britannica.com/technology/digital-sound-recording
https://deepmind.com/blog/article/wavenet-generative-model-raw-audio
https://en.wikipedia.org/wiki/Sampling_(signal_processing)
https://www.youtube.com/watch?v=8VA73zW2DXY
https://aavos.eu/glossary/fourier-transform/
https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html