Listening to music and perceiving its structure is a fairly easy task for humans, even for listeners without formal musical training. For example, we can notice changes of notes, chords and keys, though we might not be able to name them (segmentation based on tonality and harmonic analysis); we can parse a musical piece into phrases or sections (segmentation based on recurrent structural analysis); we can identify and memorize the main themes or the catchiest parts – hooks - of a piece (summarization based on hook analysis); we can detect the most informative musical parts for making certain judgments (detection of salience for classification). However, building computational models to mimic these processes is a hard problem. Furthermore, the amount of digital music that has been generated and stored has already become unfathomable. How to efficiently store and retrieve the digital content is an important real-world problem.|
This dissertation presents our research on automatic music segmentation, summarization and classification using a framework combining music cognition, machine learning and signal processing. It will inquire scientifically into the nature of human perception of music, and offer a practical solution to difficult problems of machine intelligence for automatic musical content analysis and pattern discovery.
Specifically, for segmentation, an HMM-based approach will be used for key change and chord change detection; and a method for detecting the self-similarity property using approximate pattern matching will be presented for recurrent structural analysis. For summarization, we will investigate the locations where the catchiest parts of a musical piece normally appear and develop strategies for automatically generating music thumbnails based on this analysis. For musical salience detection, we will examine methods for weighting the importance of musical segments based on the confidence of classification. Two classification techniques and their definitions of confidence will be explored. The effectiveness of all our methods will be demonstrated by quantitative evaluations and/or human experiments on complex real-world musical stimuli.