Event Formation and Separation in Musical Sound

David K. Mellinger
Stanford University, CA, USA (1991)


This thesis reviews psychoacoustic and neurophysiological studies that show how the human auditory system is capable of hearing out one source of sound from the mixture of sounds that reaches the ear. A number of cues are used for identifying which parts of the spectrum originate with a single source: common onset, the beginning of sound energy at different frequencies at one time; harmonicity, the arrangement of the partials of a tone into a harmonic series; common frequency variation, the motion of partials in frequency at the same relative rate; common spatial location; and several others.

A multistage architecture is described for early auditory processing. After the input sound signal is transduced into a map of neural firings in the cochlea, filters extract the various cues for source separation from the cochlear image. The model uses these cues to group local features into single sound events and further groups events over time into sound sources.

The implemented model groups parts of the spectrum together over time to make separate sound events, using principles and constraints present in natural auditory systems. The model includes filters for detecting onsets and frequency variation in sound. These filters are tuned to work on musical sounds. Their output is used to find and separate notes in the signal, producing time-frequency images of the parts of the sound determined to belong to each event. This processing is applied to musical sounds made up of several notes played at a time, revealing the strengths and weaknesses of the computational model. The thesis offers directions for future work in computational auditory modelling.
