Music transcription refers to the extraction of a human-readable and interpretable description from a recording of a musical performance. The ultimate goal is a program that can automatically infer, from arbitrary acoustic input, a musical notation that lists the pitches of the notes and their corresponding score positions. In this full generality, however, music transcription remains a hard problem and arguably requires simulating human-level intelligence. Under some realistic assumptions, on the other hand, a practical engineering solution is possible through an interplay of scientific knowledge from cognitive science, musicology and musical acoustics with computational techniques from artificial intelligence, machine learning and digital signal processing. The aim of this thesis is to integrate this vast amount of prior knowledge into a consistent and transparent computational framework, and to demonstrate that such an approach moves us closer to a practical solution to music transcription.
In this thesis, we approach music transcription as a statistical inference problem: given a signal, we search for a score that is consistent with the encoded music. We identify three subproblems: Rhythm Quantization, Tempo Tracking and Polyphonic Pitch Tracking. For each subproblem, we define a probabilistic generative model that relates the observables (i.e., onsets or the audio signal) to the underlying score. Conceptually, the transcription task is then to ``invert'' this generative model using Bayes' theorem and to estimate the most likely score.
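As a schematic illustration of this inversion, let $s$ denote a candidate score and $y$ the observations (onsets or audio samples). These symbols are assumed here for illustration only and are not notation fixed by the text above. Bayes' theorem then reads
\[
p(s \mid y) = \frac{p(y \mid s)\, p(s)}{p(y)} \propto p(y \mid s)\, p(s),
\qquad
s^{*} = \arg\max_{s} \, p(y \mid s)\, p(s),
\]
where $p(y \mid s)$ plays the role of the generative model, $p(s)$ is a prior over scores, and $s^{*}$ is the most likely score. The evidence $p(y)$ does not depend on $s$, so it can be dropped when maximizing.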