MIR PhD Thesis: David Meredith (2006)

Computing Pitch Names in Tonal Music: A Comparative Analysis of Pitch Spelling Algorithms

David Meredith
University of Oxford, UK (April, 2006)

ABSTRACT

A pitch spelling algorithm predicts the pitch names (e.g., C#4, Bb5) of the notes in a passage of tonal music, given the onset time, MIDI note number and possibly the duration and voice of each note. A new algorithm, called ps13, was compared with the algorithms of Longuet-Higgins, Cambouropoulos, Temperley, and Chew and Chen by running various versions of these algorithms on a ‘clean’, score-derived test corpus, C, containing 195972 notes, equally divided between eight classical and baroque composers. The standard deviation (SD) of the accuracies achieved by each algorithm over the eight composers was used as a measure of style dependence. The best versions of the algorithms were tested for robustness to temporal deviations by running them on a ‘noisy’ version of the test corpus, denoted by C'.
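The evaluation measures described above (note accuracy, and the SD of per-composer accuracies as an index of style dependence) can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's own evaluation code: the helper names `note_accuracy` and `evaluate`, the input layout (a list of predicted and a list of correct pitch names per composer), and the use of the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation are all assumptions.

```python
from statistics import pstdev


def note_accuracy(predicted, correct):
    """Percentage of notes whose predicted pitch name equals the correct one."""
    if len(predicted) != len(correct) or not correct:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(p == c for p, c in zip(predicted, correct))
    return 100.0 * hits / len(correct)


def evaluate(per_composer):
    """per_composer maps composer -> (predicted names, correct names).

    Returns the pooled note accuracy over all notes, the SD of the
    per-composer accuracies, and the per-composer accuracies themselves."""
    accuracies = {name: note_accuracy(p, c) for name, (p, c) in per_composer.items()}
    pooled_pred = [n for p, _ in per_composer.values() for n in p]
    pooled_true = [n for _, c in per_composer.values() for n in c]
    overall = note_accuracy(pooled_pred, pooled_true)
    sd = pstdev(list(accuracies.values()))  # assumption: population SD
    return overall, sd, accuracies


# Hypothetical toy data: two "composers" with two notes each.
overall, sd, accs = evaluate({
    "Composer A": (["C4", "E4"], ["C4", "E4"]),
    "Composer B": (["C#4", "F4"], ["Db4", "F4"]),
})
print(overall, sd)  # → 75.0 25.0
```

Note that C#4 and Db4 share a MIDI note number, so the second composer's first note counts as an error: the measure rewards the correct spelling, not just the correct pitch.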

A version of ps13 called PS13s1 was the most accurate of the algorithms tested, achieving note accuracies of 99.44% (SD = 0.45) on C and 99.41% (SD = 0.50) on C'. A real-time version of PS13s1 also outperformed the other real-time algorithms tested, achieving note accuracies of 99.19% (SD = 0.51) on C and 99.16% (SD = 0.53) on C'. PS13s1 was also as fast and easy to implement as any of the other algorithms.

New, optimised versions of Chew and Chen’s algorithm were the least dependent on style over C. The most accurate of these achieved note accuracies of 99.15% (SD = 0.42) on C and 99.12% (SD = 0.47) on C'. The line of fifths was found to work just as well as Chew’s (2000) “spiral array model” in these algorithms.
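To make the line of fifths concrete: pitch names can be placed on an integer line in which each step to the right is a perfect fifth (..., F = -1, C = 0, G = 1, D = 2, ..., B = 5, F# = 6, ...), so adding a sharp moves a name 7 steps to the right and a flat moves it 7 steps to the left. The Python sketch below spells pitch classes by choosing, for each note, the position nearest to a duration-weighted mean of the positions of recently spelled notes, loosely in the spirit of the "center of effect" that Chew and Chen compute in the spiral array. It is a deliberately simplified illustration, not their algorithm: the function names, the window of 8 previous notes, the starting point at D, and the tie-breaking are hypothetical choices, and octave numbers are ignored.

```python
LETTERS = "FCGDAEB"  # line-of-fifths order: F = -1, C = 0, G = 1, ..., B = 5


def name_to_lof(name):
    """Line-of-fifths position of a pitch name without octave, e.g. 'Bb' -> -2."""
    letter, accidentals = name[0], name[1:]
    base = LETTERS.index(letter) - 1
    return base + 7 * (accidentals.count("#") - accidentals.count("b"))


def lof_to_name(q):
    """Inverse of name_to_lof, e.g. 6 -> 'F#'."""
    letter = LETTERS[(q + 1) % 7]
    sharps = (q + 1) // 7  # negative values are flats
    return letter + ("#" * sharps if sharps > 0 else "b" * -sharps)


def candidates(pc, radius=2):
    """Positions q0 + 12k (|k| <= radius) whose pitch class (7 * q) % 12 equals pc."""
    q0 = (7 * pc) % 12  # 7 is its own inverse mod 12, so (7 * q0) % 12 == pc
    return [q0 + 12 * k for k in range(-radius, radius + 1)]


def spell_near_centre(pitch_classes, durations, window=8):
    """Spell each pitch class as the candidate nearest the duration-weighted
    mean position of the previous `window` spelled notes (durations > 0)."""
    positions = []
    for i, pc in enumerate(pitch_classes):
        start = max(0, i - window)
        recent, weights = positions[start:i], durations[start:i]
        if recent:
            centre = sum(q * w for q, w in zip(recent, weights)) / sum(weights)
        else:
            centre = 2.0  # assumption: start at D, the middle of F..B
        # min keeps the first (flatter) candidate when two are equally near.
        positions.append(min(candidates(pc), key=lambda q: abs(q - centre)))
    return [lof_to_name(q) for q in positions]


print(spell_near_centre([7, 11, 2, 6, 7], [1, 1, 1, 1, 1]))  # → ['G', 'B', 'D', 'F#', 'G']
print(spell_near_centre([5, 9, 0, 10, 5], [1, 1, 1, 1, 1]))  # → ['F', 'A', 'C', 'Bb', 'F']
```

The same pitch class 6 or 10 is spelled F# or Bb purely because of where the preceding notes lie on the line; a spiral-array version would replace the one-dimensional mean with a point in three dimensions.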

A new, optimised version of Cambouropoulos’s algorithm made 8% fewer errors over C than the most accurate of the versions described by Cambouropoulos himself. This algorithm achieved note accuracies of 99.15% (SD = 0.47) on C and 99.07% (SD = 0.53) on C'. A new implementation of the most accurate of the versions described by Cambouropoulos achieved note accuracies of 99.07% (SD = 0.46) on C and 99.13% (SD = 0.39) on C', making it the least dependent on style over C'. However, Cambouropoulos’s algorithms were among the slowest of those tested.

When Temperley and Sleator’s harmony and meter programs were used for pitch spelling, they were more affected by temporal deviations and tempo changes than any of the other algorithms tested. When enharmonic changes were ignored and the music was at a natural tempo, these programs achieved note accuracies of 99.27% (SD = 1.30) on C and 97.43% (SD = 1.69) on C'. A new implementation, called TPROne, of just the first preference rule in Temperley’s theory achieved note accuracies of 99.06% (SD = 0.63) on C and 99.16% (SD = 0.52) on C'. TPROne’s performance was independent of tempo and less dependent on style than that of the harmony and meter programs.

Of the several versions of Longuet-Higgins’s algorithm tested, the best was the original one, implemented in his music.p program. This algorithm achieved note accuracies of 98.21% (SD = 1.79) on C and 98.25% (SD = 1.71) on C', but only when the data was processed a voice at a time.

None of the attempts to take voice-leading into account in the algorithms considered in this study resulted in an increase in note accuracy, and the most accurate algorithm, PS13s1, ignores voice-leading altogether. The line of fifths is used in most of the algorithms tested, including PS13s1. However, the superior accuracy achieved by PS13s1 suggests that pitch spelling accuracy can be optimised by modelling the local key as a pitch class frequency distribution instead of a point on the line of fifths, and by choosing pitch names that lie close on the line of fifths to the local tonic(s) rather than to the pitch names of neighbouring notes.
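The two ideas in the final sentence, modelling the local key as a pitch class frequency distribution and choosing the spelling that lies close to the likely local tonics on the line of fifths, can be illustrated with a short Python sketch. It is not ps13 itself: the function names, the context of 10 notes on either side, the convention of spelling each candidate tonic between Db and F#, and the tie-breaking rules are all assumptions made for this example. Each pitch class in the context window is treated as a possible tonic weighted by its frequency; every possible tonic votes for the spelling of the current note nearest to it on the line of fifths, and the spelling with the most votes wins.

```python
from collections import Counter

LETTERS = "FCGDAEB"  # line-of-fifths order: F = -1, C = 0, ..., B = 5
NATURAL_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def nearest_position(pc, target):
    """Line-of-fifths position with pitch class pc nearest to target.

    Ties (a tritone either way) go to the sharper position."""
    q0 = (7 * pc) % 12  # 7 is its own inverse mod 12, so (7 * q0) % 12 == pc
    candidates = [q0 + 12 * k for k in range(-3, 4)]
    return min(candidates, key=lambda q: (abs(q - target), -q))


def tonic_position(pc):
    # Assumption: each possible tonic is spelled between Db (-5) and F# (+6).
    return nearest_position(pc, 0.5)


def pitch_name(midi, q):
    """Name with octave, e.g. 'F#5', for MIDI note `midi` spelled at position q."""
    letter = LETTERS[(q + 1) % 7]
    sharps = (q + 1) // 7  # negative values are flats
    accidentals = "#" * sharps if sharps > 0 else "b" * -sharps
    octave = (midi - sharps - NATURAL_PC[letter]) // 12 - 1  # MIDI 60 = C4
    return f"{letter}{accidentals}{octave}"


def spell(midi_notes, context=10):
    pcs = [m % 12 for m in midi_notes]
    names = []
    for i, (midi, pc) in enumerate(zip(midi_notes, pcs)):
        # Local key: frequency of each pitch class in a window around note i.
        key_profile = Counter(pcs[max(0, i - context): i + context + 1])
        votes = Counter()
        for tonic, weight in key_profile.items():
            votes[nearest_position(pc, tonic_position(tonic))] += weight
        # Most votes wins; remaining ties go to the position nearer the middle.
        best = max(votes, key=lambda q: (votes[q], -abs(q - 0.5)))
        names.append(pitch_name(midi, best))
    return names


print(spell([67, 71, 74, 78, 79]))  # → ['G4', 'B4', 'D5', 'F#5', 'G5']
print(spell([65, 69, 72, 70, 65]))  # → ['F4', 'A4', 'C5', 'Bb4', 'F4']
```

In this sketch the spellings already chosen for neighbouring notes play no part; only the pitch class distribution of the context window matters. The octave is derived from the chosen letter, so, for example, MIDI 60 spelled as B# would be named B#3.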
