This repository contains the code for my Master's Thesis project "Piano Music Generation using Deep Learning Transformer Models", which was completed at the Artificial Intelligence and Learning Systems Laboratory (AILS) of the School of Electrical & Computer Engineering, National Technical University of Athens (ECE NTUA) under the guidance of Prof. Giorgos Stamou, and Ph.D. candidates Vassilis Lyberatos and Spyros Kantarelis.
We trained two deep learning Transformer models for symbolic music generation in Python, using the TensorFlow and PyTorch frameworks, on Ubuntu servers at AILS ECE NTUA and on the ARIS GRNET supercomputer. We generated music outputs and evaluated them both objectively and subjectively. For the subjective evaluation, we implemented a listening-test page with the Flask framework and Firebase Storage and deployed it as a serverless function on Vercel. We used Pandas for data processing and analysis, investigated the impact of various parameters on music generation, and proposed a set of primary subjective quality indicators for music generated by artificial intelligence models.
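The listening-test page can be pictured, in rough outline, as a small Flask app that collects ratings for generated clips. The sketch below is illustrative only: the `/rate` route, the JSON field names, and the in-memory `ratings` list are assumptions for demonstration, not the thesis's actual code, which used Firebase Storage and ran as a serverless function on Vercel.

```python
# Minimal sketch of a listening-test endpoint, assuming Flask is installed.
# The route name, fields, and in-memory store are hypothetical stand-ins.
from flask import Flask, request, jsonify

app = Flask(__name__)
ratings = []  # stand-in for persistent storage (e.g. Firebase Storage)

@app.route("/rate", methods=["POST"])
def rate():
    # Each submission pairs a generated clip with a 1-5 quality rating.
    data = request.get_json()
    ratings.append({"clip_id": data["clip_id"], "score": int(data["score"])})
    return jsonify({"received": len(ratings)})
```

In a real deployment the handler would validate the payload and write each rating to durable storage rather than a process-local list.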
William Wordsworth famously wrote in "The Solitary Reaper": "The music in my heart I bore, long after it was heard no more". The emotional impact of music, particularly piano music, has transcended cultural barriers and touched people from all walks of life for centuries. In recent years, music composition with the assistance of artificial intelligence has attracted significant attention. Among the many deep learning models proposed, the Transformer has been a prominent approach for generating longer piano performances. This thesis explores symbolic piano music generation with two noteworthy Transformer models, Music Transformer and Perceiver-AR. We train these models on six datasets of various sizes and musical genres, generate a large number of outputs for each trained model, and evaluate them objectively and subjectively. We investigate the impact of training datasets, model types, and trained models on the music generation process. We also examine the correlations between objective and subjective evaluation metrics and propose a set of primary subjective quality indicators for music generated by artificial intelligence models. Finally, we suggest possible improvements and areas for future research.
The aim of this research project is to create a composition assistance system for symbolic piano music generation that supports amateur musicians in their musical pursuits. Our inspiration was rooted in the desire to provide musicians with an accessible and intuitive platform for musical expression and creativity. Our goal is a system that empowers amateur musicians to unleash their musical potential and create unique, original pieces of music, making the music creation process not only more accessible but also more enjoyable, and helping them bring their musical visions to life.
To achieve the goal of the project, the following objectives have been identified:
- Train Music Transformer and Perceiver-AR models on six datasets of various sizes and musical genres.
- Generate a large number of outputs for each trained model.
- Conduct objective and subjective evaluation of the training datasets and model outputs.
- Investigate how the attributes of training datasets, model types, and trained models impact the quality of musical outputs generated.
- Estimate the correlations between evaluation metrics and explore possible causal relationships.
- Propose a set of primary subjective quality indicators for music generated by artificial intelligence models.
- Frameworks Used
- Installation
- Usage
- Acknowledgements
- License