Automatically Extracting, Analyzing, and Visualizing Information on Music Artists from the World Wide Web

Markus Schedl
Johannes Kepler University, Linz, Austria (July 2008)


In this PhD thesis, methods for automatically extracting music-related information from the World Wide Web have been elaborated, implemented, and analyzed. Such information is becoming increasingly important in times of digital music distribution via the Internet, as users of online music stores now expect to be offered additional music-related information beyond the digital music file itself. Novel techniques have been developed, and existing ones refined, to gather information about music artists and bands from the Web. These techniques relate to the research fields of music information retrieval, Web mining, and information visualization. More precisely, Web content mining techniques are applied to sets of Web pages related to a music artist or band in order to address the following categories of information:
- similarities between music artists or bands
- prototypicality of an artist or a band for a genre
- descriptive properties of an artist or a band
- band members and instrumentation
- images of album cover artwork

Different approaches to retrieve the corresponding pieces of information for each of these categories have been elaborated and evaluated thoroughly on a considerable variety of music repositories. The results and main findings of these assessments are reported. Moreover, visualization methods and user interaction models for prototypical and similar artists as well as for descriptive terms have evolved from this work.

Based on the insights gained from the various experiments and evaluations conducted, the core application of this thesis, the "Automatically Generated Music Information System" (AGMIS), was built. AGMIS demonstrates the applicability of the elaborated techniques to a large collection of more than 600,000 artists by providing a Web-based user interface to access a database that has been populated automatically with the extracted information.

Although AGMIS does not always give perfectly accurate results, its automatic approaches to information retrieval have some advantages over those employed in existing music information systems, which rely either on labor-intensive information processing by music experts or on community knowledge that is vulnerable to distortion of information.
