String Matching Techniques for Music Retrieval

Kjell Lemström
University of Helsinki, Finland (November, 2000)


In content-based music information retrieval (MIR), the primary task is to find exact or approximate occurrences of a monophonic musical query pattern within a music database, which may contain either monophonic or polyphonic documents. In tonal music, the occurrences that are searched for may be presented in any musical key, making transposition invariance a useful property for an MIR algorithm. Moreover, as the term 'content-based' indicates, a query should be comprised solely of musical information. It cannot contain, e.g., annotations such as keywords or lyrics of a musical document.

In this thesis, various aspects of music comparison and retrieval, including both theoretical and practical, will be studied. The general pattern matching scheme based on approximate string matching framework will be applied, and estimats on the theoretical complexity of such pattern matching will be given. Moreover, the aforementioned framework is to be tuned so that it would be possible to retrieve symbolically encoded musical documents efficiently. This modified framework is plausible as regards to findings in musicology. It enables the consideration of context in music and gives novel transposition invariant distance measures, avoiding some shortcomings of the straightforward applications of conventional string matching measures.

A database containing only monophonic music represents a special case, for which the applied techniques can be implemented more efficiently. Nevertheless, in this thesis, both the cases of monophonic and polyphonic music databases will be considered and solutions applying bit-parallelism, a very efficient way to implement dynamic programming, will be presented. In particular, the thesis introduces our novel, very efficient algorithm for polyphonic databases.

Further, the thesis gives two novel music alphabet reduction schemes. They both have strong musical basis, and they are used as essential components in our software system prototype called SEMEX. Finally, the thesis introduces SEMEX and describes how the described techniques and observations are implemented.

[BibTex, External Link, Return]