-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathMP_Kuzmovych_Yevhen_2019.tex
213 lines (163 loc) · 8.38 KB
/
MP_Kuzmovych_Yevhen_2019.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
\documentclass[thesis=M,english,hidelinks]{FITthesis}[2012/10/20]
\usepackage[utf8]{inputenc} % LaTeX source encoded as UTF-8
\usepackage{graphicx} % graphics files inclusion
\usepackage{dirtree} % directory tree visualisation
\usepackage[printonlyused]{acronym} % acronims
\usepackage{float} % floats
\usepackage[table,xcdraw]{xcolor}
\usepackage{multirow,array}
\usepackage[]{algorithm2e} % for pseudocodes
\usepackage{amsmath} % for argmax, argmin
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\usepackage{amssymb}
\usepackage[capitalise]{cleveref} % to use \cref wich includes words section or figure in reference
\usepackage{pdfpages}
\usepackage{wasysym} % in-text notes
\usepackage{harmony} % in-text notes
\usepackage{musixtex} % in-text notes
%\usepackage{indentfirst} % indent in first paragraph
\raggedbottom
%\usepackage[all]{hypcap} % for going to the top of an image when a figure reference is clicked
\usepackage[numbers]{natbib}
\usepackage{listings} % for the source code
\lstset{language=Python, frame=tb, basicstyle=\small} %or \small or \footnotesize etc.}
\graphicspath{{img/}} % images folder
\department{Department of Applied Mathematics}
\title{Multi-instrument music transcription}
\authorGN{Yevhen} %author's given name/names
\authorFN{Kuzmovych} %author's surname
\author{Yevhen Kuzmovych} %author's name without academic degrees
\authorWithDegrees{Bc. Yevhen Kuzmovych} %author's name with academic degrees
\supervisor{Ing. Marek Šmíd, Ph.D.}
\acknowledgements{I would like to thank the supervisor of the work, Ing. Marek Šmíd, Ph.D.\ for the help in writing of
this thesis, valuable advice and suggestions for improvement, thorough reviews, and extensive knowledge sharing in
the fields of signal processing and ML.
I want to express my special gratitude to my family, especially my parents, for their great support and help throughout
the whole study and writing of this work\footnote{\foreignlanguage{russian}{Я бы хотел выразить особую благодарность
моей семье, особенно моим родителям, за их большую поддержку и помощь на протяжении всего обучения и написания этой
работы.}}.
}
\abstractCS{
Přepis hudby hraje důležitou roli v oblasti získávání hudebních informací, a je to často složitý úkol i pro lidského
posluchače. Automatizace tohoto přepisu se těší velkému zájmu z řad studentů hudby, kteří jej mohou využít pro nácvik
rozpoznávání not a učení nových skladeb. Také hudební producenti a DJ, kteří by mohli využít funkce detekce stupnice a
oddělení zdrojových nástrojů, by z něj mohli těžit. Podobně streamingové platformy ho mohou používat pro své
doporučovací systémy, atp.
Tato práce zkoumá nejmodernější řešení, která kombinují analýzu signálů s přístupy strojového učení s učitelem pro
uvedené problémy. Navrhuje imple\-mentaci, která je využita k provádění úlohy transkripce hudby s více nástroji, která
má na vstupu surový zvukový soubor a produkuje sadu notových listů pro každý použitý nástroj na výstupu. Řešení je
implementováno v programovacím jazyce Python jako modul a současně jako aplikace s rozhraním pro příkazovou řádku.
Implementace je rozdělena do logických modulů zodpovědných za odhad odpoví\-dajících částí výstupní partitury, jmenovitě
oddě\-lení zdrojů, detekce výšek a událostí, odhad tempa, stupnice a taktového předznamenání. Takové oddělení umožňuje
jednoduché rozšiřování a testování. Přesnost každého modulu je vyhodnocená na příslušných datových sadách.
}
\abstractEN{
Music transcription is important though complex, often even for a human listener, task in the field of music information
retrieval. Automation of such task is in a big demand among musical students, who can use it for practicing of note
recognition and learning of the new pieces; music producers and DJs, who could utilize its key detection and source
instrument separation functionality; streaming platforms that may use it for their recommendation systems; etc.
This thesis explores state-of-the-art solutions that combine signal analysis and supervised machine learning
approaches for the mentioned problems. It proposes implementation that utilizes them to perform a multiple-instrument
music transcription task having a raw audio file on the input and producing a set of sheet music scores for each played
instrument on the output. The solution is implemented in the Python programming language as a module as well as
a command-line interface application. The implementation is separated into logical modules responsible for estimation of
the corresponding parts of the output score, namely source separation, pitch and event detection, tempo, key and time
signature estimation. Such separation allows for simple expansion and testing. The performance of each module is
evaluated on appropriate datasets.
}
\keywordsCS{Přepis hudby, zpracování zvuku, analýza signálu, učení s učitelem.}
\keywordsEN{Music transcription, audio processing, signal analysis, supervised learning.}
\placeForDeclarationOfAuthenticity{Prague}
\declarationOfAuthenticityOption{1} %select as appropriate, according to the desired license (integer 1-6)
% \website{http://site.example/thesis} %optional thesis URL
\begin{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Custom commands %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\img}[3][1]{
\centerline{\includegraphics[scale=#1]{#2.#3}}
}
\newcommand{\textwideimg}[3][1]{
\centerline{\includegraphics[width=#1\textwidth]{#2.#3}}
}
% Usage:
% \image[size]{diagram and lable name}{extention}{caption}
% \image[1.3]{component_diagram}{pdf}{Component diagram}
\newcommand{\image}[4][1]{
\begin{figure}[H]
\textwideimg[#1]{#2}{#3}
\caption{#4}
\label{fig:#2}
\end{figure}
}
\newcommand\figcenter[1]{
\begin{figure}[H]
\centering
#1
\end{figure}
}
\newcommand\inlinemusic[1]{\begin{music}#1\end{music}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\setsecnumdepth{part}
\input{tex/0_Introduction}
\setsecnumdepth{all}
\input{tex/1_State-of-the-art}
\input{tex/2_Analysis-and-design}
\input{tex/3_Implementation.tex}
\input{tex/4_Testing.tex}
\setsecnumdepth{part}
\input{tex/Conclusion}
\bibliographystyle{iso690}
\bibliography{bibliography}
\setsecnumdepth{all}
\appendix
\chapter{Acronyms}\label{ch:acronyms}
\begin{acronym}
\acro{WAVE}{Waveform Audio File Format}
\acro{LPCM}{Linear pulse-code modulation}
\acro{FFT}{Fast Fourier transform}
\acro{DFT}{Discrete Fourier transform}
\acro{IDFT}{Inverse Discrete Fourier transform}
\acro{CNN}{Convolutional Neural Network}
\acro{SiSeC}{Source Separation campaign}
\acro{AMT}{Automatic music transcription}
\acro{ML}{Machine learning}
\acro{MIREX}{Music Information Retrieval Evaluation eXchange}
\acro{HPS}{Harmonic partial sequence}
\acro{MAP}{Maximum a posteriori}
\acro{EM}{Expectation-maximisation}
\acro{HTC}{harmonic temporal structured clustering}
\acro{NMF}{Non-negative matrix factorisation}
\acro{SAGE}{Space alternating generalised \ac{EM} algorithm}
\acro{HMM}{Hidden Markov model}
\acrodefplural{HMM}{Hidden Markov models}
\acro{DBN}{dynamic Bayesian network}
\acrodefplural{DBN}{dynamic Bayesian networks}
\acro{ADSR}{attack, decay, sustain, and release}
\acro{MIDI}{Musical Instrument Digital Interface}
\acro{BPM}{beats per minute}
\acro{ARIMA}{Autoregressive integrated moving average}
\acro{PDF}{Portable Document Format}
\acro{CLI}{command line interface}
\acro{DNN}{deep neural network}
\acrodefplural{DNN}{deep neural networks}
\acro{SDR}{Source to Distortion Ratio}
\acro{SIR}{Source to Interferences Ratio}
\acro{SNR}{Sources to Noise Ratio}
\acro{SAR}{Sources to Artifacts Ratio}
\acro{FMA}{Free Music Archive}
\acro{MIR}{Music Information Retrieval}
\end{acronym}
\input{tex/Music-notation.tex}
\chapter{Contents of enclosed CD}\label{ch:contents-of-enclosed-cd}
%change appropriately
\begin{figure}
\dirtree{%
.1 readme.txt\DTcomment{the file with CD contents description}.
.1 src\DTcomment{the directory of source codes}.
.2 mimt\DTcomment{implementation sources}.
.2 thesis\DTcomment{the directory of \LaTeX{} source codes of the thesis}.
.1 text\DTcomment{the thesis text directory}.
.2 thesis.pdf\DTcomment{the thesis text in PDF format}.
}
\end{figure}
\end{document}