The proposed method offers a potential solution to the high cost of composing and recording original music for films.
Our model is capable of recognizing a scene's dominant emotion by analyzing factors such as body language, facial expressions, lip reading, and background color. Using this emotion, the model generates a new piece of music suited to the scene. First, we detect the emotion the video is trying to portray using four models:
- facial expression detection: we first created and trained a facial emotion recognition model based on the VGG-16 architecture. We then used MediaPipe, an open-source framework for building cross-platform machine learning pipelines for perception tasks such as object detection, tracking, and facial recognition (Lugaresi et al., 2019), to locate the face in a given video, and applied the model to the detected face to recognize its emotion.
- background hue recognition: we extract the color values of each pixel in the video, compute the average color of the video by averaging those values, and then assign that color to a specific emotion according to a predefined color-to-emotion mapping (see 4).
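The background hue step lends itself to a short illustration. The following is a minimal sketch, assuming OpenCV and NumPy are available; the function names and the hue-to-emotion thresholds are illustrative placeholders, not the project's actual mapping.

import cv2
import numpy as np

def average_frame_color(video_path, frame_step=10):
    # Mean BGR color over sampled frames of the video.
    cap = cv2.VideoCapture(video_path)
    colors, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_step == 0:
            colors.append(frame.reshape(-1, 3).mean(axis=0))
        idx += 1
    cap.release()
    return np.mean(colors, axis=0)

def color_to_emotion(bgr):
    # Map the average color to an emotion via its hue (thresholds are illustrative).
    hsv = cv2.cvtColor(np.uint8([[bgr]]), cv2.COLOR_BGR2HSV)[0, 0]
    hue = int(hsv[0])  # OpenCV hue range is 0-179
    if hue < 15 or hue >= 160:   # reds
        return 'angry'
    if hue < 35:                 # yellows / oranges
        return 'happy'
    if hue < 100:                # greens
        return 'calm'
    return 'sad'                 # blues / purples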
The detection models are trained as follows:
- Facial expression recognition: using the FER2013 dataset and implementing a VGG-16 neural network architecture.
- Body language recognition: collecting keypoints with the MediaPipe Holistic model, then training the model with 30 frames per action (see the sketch after this list).
- Lip reading: collecting frames of the lower face, then training the model with 75 frames per action.
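As a rough illustration of the body-language step, here is a minimal sketch of turning a clip into a 30-frame keypoint sequence with MediaPipe Holistic, assuming OpenCV and NumPy; the exact feature layout expected by BodySentimentModel is an assumption, not taken from the repository.

import cv2
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic
SEQUENCE_LENGTH = 30  # frames per action, as noted above

def extract_keypoints(results):
    # Flatten pose and hand landmarks into one feature vector (zeros if a part is missing).
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility] for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    lh = (np.array([[lm.x, lm.y, lm.z] for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z] for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, lh, rh])

def collect_sequence(video_path):
    # Return a (SEQUENCE_LENGTH, n_features) array of keypoints for one clip.
    cap = cv2.VideoCapture(video_path)
    frames = []
    with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:
        while len(frames) < SEQUENCE_LENGTH:
            ok, frame = cap.read()
            if not ok:
                break
            results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            frames.append(extract_keypoints(results))
    cap.release()
    return np.array(frames)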
Here are some examples of the emotions detected by our model and the corresponding music generated:
Click on the thumbnail to watch the Sound-IT demo.
Getting Started

To get started with Sound-IT, you can clone our repository and follow the instructions in the README file.
To run the code:
- First, run pip install -r requirements.txt to install all the required packages.
- Then change the model paths in the UISOUND folder in the following files:
  - allinference.py
  - inference.py
  - inferenceCam.py
For example:

# Facial emotion recognition weights (VGG-based models)
weights_1 = '/Users/kevynkrancenblum/Desktop/Data Science/Final Project/Facial_emotion_recognition/saved_models/vggnet.h5'
weights_2 = '/Users/kevynkrancenblum/Desktop/Data Science/Final Project/Facial_emotion_recognition/saved_models/vggnet_up.h5'

# Body language recognition models
model_V1 = BodySentimentModel(body_input_shape, actions.shape[0])
model_V1.load_weights('/Users/kevynkrancenblum/Desktop/Data Science/Final Project/Body_Language_recognition/modelsSaved/BodyModelCamv1.h5')

model_V2 = BodySentimentModel(body_input_shape, actions.shape[0])
model_V2.load_weights('/Users/kevynkrancenblum/Desktop/Data Science/Final Project/Body_Language_recognition/modelsSaved/BodyModelCamv2.h5')
Change these paths to wherever your models are located.
To train your own emotion or action recognition based on body language, go to Body_language_recognition/streamlitRecording.py and change the code where you need to add or remove emotions or actions, as sketched below. To train your own emotion or facial micro-emotion recognition, first add your own data and then run the model (important: because this is micro-emotion recognition, you will need a significant amount of data).
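As an illustration only, adding or removing body-language classes typically comes down to editing the actions array that the recorded sequences are labeled with; the class names below are placeholders, not the project's actual labels.

import numpy as np

# Edit this array in Body_language_recognition/streamlitRecording.py to add or
# remove the emotions/actions you want to record and train on (names are examples).
actions = np.array(['happy', 'sad', 'angry', 'surprised'])

# The model's output size must then match the number of classes, e.g.
# BodySentimentModel(body_input_shape, actions.shape[0])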
For lip reading, run the code in LipReading/lipnet.ipynb to download the model weights.
You can also train on your own language and your own sentences by creating your own dataset of videos with text alignments. Note that the model is a CNN+RNN architecture, which means the recurrent network expects a predefined sentence length; here, 75 frames is the sentence length for every video and alignment, and the model will not work otherwise.
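As a minimal sketch of that constraint, the helper below pads or trims a stack of mouth-region frames to a fixed length of 75; the function name and array layout are illustrative assumptions, not code from the repository.

import numpy as np

SENTENCE_LENGTH = 75  # fixed number of frames per video/alignment, as noted above

def pad_or_trim(frames):
    # Force a (T, H, W) stack of mouth-region frames to exactly SENTENCE_LENGTH frames.
    t = frames.shape[0]
    if t >= SENTENCE_LENGTH:
        return frames[:SENTENCE_LENGTH]
    pad = np.zeros((SENTENCE_LENGTH - t,) + frames.shape[1:], dtype=frames.dtype)
    return np.concatenate([frames, pad], axis=0)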
Contributing

We welcome contributions from the community. If you have any suggestions or would like to contribute, please open an issue or pull request on our GitHub repository.