Audio-based emotion recognition has many applications in human-computer interaction, mental health assessment, and customer service analytics. This project presents machine learning-based on-device recognition of seven emotions (anger, disgust, fear, happiness, neutrality, sadness, and surprise) from audio on low-cost embedded devices. We show how the speaker's mental state influences various acoustic features, such as intensity and shimmer. Classifying emotions from audio remains challenging, however, because the same emotion can sound ambiguous across different speakers. Our extensive evaluation with lightweight machine learning models indicates an overall F1-score of 61% with under 50 ms response time and 256 KB memory usage on modern embedded devices.
**Documentation on the code and dataset will be available soon.**
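As a rough illustration of the kind of pipeline described above, the sketch below extracts a few acoustic features of the sort mentioned (intensity via frame-wise RMS energy, a shimmer-like amplitude-perturbation proxy, and MFCC statistics) and feeds them to a small classifier. The library choices (librosa, scikit-learn), the exact feature set, and the `train_files`/`train_labels` names are illustrative assumptions, not this project's actual implementation.

```python
# Minimal sketch (NOT the project's pipeline): extract simple acoustic
# features with librosa and classify with a lightweight scikit-learn model.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

EMOTIONS = ["anger", "disgust", "fear", "happiness",
            "neutrality", "sadness", "surprise"]

def extract_features(path, sr=16000):
    """Return a small feature vector: intensity (RMS), a shimmer-like
    amplitude-perturbation proxy, and summary MFCC statistics."""
    y, sr = librosa.load(path, sr=sr)
    rms = librosa.feature.rms(y=y)[0]  # frame-wise intensity
    # Shimmer-like proxy: mean relative amplitude change between frames.
    shimmer = np.mean(np.abs(np.diff(rms)) / (rms[:-1] + 1e-8))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.hstack([rms.mean(), rms.std(), shimmer,
                      mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical usage: `train_files` / `train_labels` are assumed to come
# from a labeled emotional-speech corpus.
# X = np.stack([extract_features(f) for f in train_files])
# clf = RandomForestClassifier(n_estimators=50, max_depth=8)  # small model
# clf.fit(X, train_labels)
# pred = clf.predict(extract_features("clip.wav")[None, :])[0]
# print(EMOTIONS[pred])
```

A shallow random forest is used here only as a stand-in for "lightweight model"; any small classifier that fits the reported memory and latency budget could take its place.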