Music Analysis using Spectral Knowledge Representation and Reasoning: Density-based Clustering and Representation of Perceived Structure in Audio Signals
This thesis was created under the supervision of Geraint Wiggins, Nicholas Harley and Steven Homer at the AI Lab in Brussels.
The extraction and formation of musical structures through the analysis of complex auditory scenes is a challenging task in signal processing and machine learning. Musical analysis comprises several open subtasks, such as multi-pitch estimation, musical note tracking and multi-pitch streaming. The main goal of this thesis is to create a framework for the multipurpose description and evaluation of music, allowing inference across different subtasks and a general improvement in the learnability of machine learning models. This was achieved by investigating how to implement a coherent structure between a spectral analysis of resonances and a type-based knowledge representation in the musical domain, forming an analogy to the perception, cognition and knowledge representation of human intelligence. We created pitch-based hierarchies, formed through density-based clustering techniques, within our self-defined hierarchical structure for the definition of musical objects perceived from audio signals. Our multipurpose framework for musical analysis makes a methodological contribution to various practical applications due to its precision and its ability to deal with overlapping sound events, which is one of the key challenges in music signal processing. Approaching this problem from a cognitive perspective has a significant impact on the way machine learning is performed today, because it opens up the possibility of model inference for various subtasks. Our software also contributes to the long-term prospect of explainable modelling and can be used in other related fields, including speech recognition. Overall, this thesis bridges the gap between human intelligence and machine learning through the development of a framework for knowledge representation and the recognition of musical objects in a resonance spectrum.
- Automatic Music Transcription (AMT)
- Source Separation
- Pitch Correction
- ...
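
To make the idea of density-based clustering over pitch more concrete, here is a minimal sketch. It is not the implementation used in the thesis. It assumes spectral peaks are already available as frequencies in Hz, maps them to a semitone scale (MIDI note numbers, A4 = 440 Hz), and groups them with a small one-dimensional DBSCAN written in plain Python. The names `hz_to_midi` and `dbscan_1d`, the peak values, and the parameters `eps` and `min_samples` are all hypothetical and chosen only for illustration.

```python
import math


def hz_to_midi(freq_hz):
    """Map a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def dbscan_1d(values, eps, min_samples):
    """Cluster 1-D values with DBSCAN; returns one label per value, -1 means noise.

    A point is a core point if at least `min_samples` points (itself included)
    lie within `eps` of it.
    """
    n = len(values)
    labels = [None] * n  # None = not visited yet

    def neighbours(i):
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Hypothetical spectral peaks in Hz: a group near A4, a group near E5, one outlier.
peaks_hz = [438.0, 440.0, 441.5, 442.0, 659.0, 660.0, 661.0, 1000.0]
pitches = [hz_to_midi(f) for f in peaks_hz]
labels = dbscan_1d(pitches, eps=0.25, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Clustering on a semitone scale rather than on raw Hz keeps the neighbourhood radius perceptually uniform across registers. The hierarchical structure described in the abstract would need further steps on top of a flat clustering like this one.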
The layout of the thesis is an expansion on the master's thesis of Gilles Castel, which in turn was inspired by the work of Edward Tufte. Feel free to use the LaTeX code in this Git repository as a template for your own content.