This dissertation centers on the digital processing of the singing voice, and more concretely on the analysis, transformation and synthesis of this type of voice in the spectral domain, with special emphasis on those techniques relevant to music applications.
The digital signal processing of the singing voice has been a research topic in its own right since the middle of the last century, when the first synthetic singing performances were generated by taking advantage of the research being carried out in the speech processing field. Even though both topics overlap in some areas, they differ significantly because of (a) the special characteristics of the sound source they deal with and (b) the applications that can be built around them. More concretely, while speech research concentrates mainly on recognition and synthesis, singing voice research, probably due to the consolidation of a strong music industry, focuses on experimentation and transformation, and has produced countless tools that over the years have assisted and inspired the most popular singers, musicians and producers. The compilation and description of the existing tools and the algorithms behind them are the starting point of this thesis.
The first half of the thesis compiles the most significant research on spectral-domain digital processing of the singing voice, proposes a new taxonomy for grouping these techniques into categories, and gives specific details for those to which the author has contributed most: namely, the sinusoidal plus residual model Spectral Modelling Synthesis (SMS), the phase-locked vocoder variation Spectral Peak Processing (SPP), the Excitation plus Residual (EpR) spectral model of the voice, and a sample concatenation based model. The second half of the work presents new formulations and procedures for both describing and transforming those attributes of the singing voice that can be regarded as voice specific. This part of the thesis includes, among others, algorithms for rough and growl analysis and transformation, breathiness estimation and emulation, pitch detection and modification, nasality identification, voice to melody conversion, voice beat onset detection, singing voice morphing, and voice to instrument transformation, some of which are exemplified with concrete applications.