On the promises of AI and listening data for music research


NIKITA BRAGUINSKI

As a c:o/re fellow, I had the rare opportunity to develop and test my ideas about how AI and data can influence music research in an environment dedicated to the study of science. The members and fellows of the Kolleg, many of whom are philosophers of science, formed a rich intellectual circle that inspired me to look at the datafication and technologization of future music research from many new angles. With its intensive and diverse program of talks, lectures, and conferences, the Kolleg also offered ideal opportunities for testing approaches in front of an attentive, thoughtful, critical, and friendly audience. Below, I present brief overviews of the main ideas that I discussed during three talks I gave at the Kolleg.


Nikita Braguinski

Nikita Braguinski studies the implications of technology for musicology and music. In his current work, he examines the challenges posed to human music theory by recent advances in machine learning.

My first presentation, entitled “The Shifting Boundaries of Music-Related Research: Listening Logs, Non-Human-Readable Data, and AI”, took place on January 16, 2024 during an internal meeting of Kolleg fellows and members. I focused on the promises and problems of using data about music streaming behavior for music research. Starting from a discussion of how changing technologies of sound reproduction enabled differing degrees of observation of listener behavior, I addressed the current split between academic and industrial music research, the availability of data, the problems of current industry-provided metrics such as “danceability”, and the special opportunities offered by existing and future multimodal machine learning (such as systems that use the same internal encoding for both music and text). I also offered examples of descriptive statistics and visualizations made possible by the availability of data on listener behavior. These visualizations of large listening datasets, which I was able to create thanks to my access to the RWTH high-performance computing cluster, included, among others, an illustration of how users of online streaming services tend to listen to new recordings on the day of their release, and an analysis of the likelihood of different age groups listening to popular music from different decades (with users in the 60-69 age group having almost the opposite musical preferences to those of the 10-19 group).

Fig. 1: Users of online streaming services often listen to new recordings on the day of their release
(Own diagram. Vertical axis: number of plays. Dataset: LFM-2b, German audience)
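
For readers curious about the mechanics behind a plot like Fig. 1, the underlying aggregation can be sketched in a few lines of pandas. This is a minimal illustration, not the actual analysis pipeline: the file names and columns (`listening_events.csv`, `tracks.csv`, `timestamp`, `release_date`, `track_id`) are hypothetical stand-ins, and the real LFM-2b release is organized differently.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical schema for illustration; the actual LFM-2b dataset
# ships as separate listening-event and metadata files.
plays = pd.read_csv("listening_events.csv", parse_dates=["timestamp"])
tracks = pd.read_csv("tracks.csv", parse_dates=["release_date"])
df = plays.merge(tracks, on="track_id")

# Days elapsed between a play and the track's release (0 = release day).
df["days_since_release"] = (
    df["timestamp"].dt.normalize() - df["release_date"]
).dt.days

# Aggregate plays over the first two weeks after release.
counts = (
    df[df["days_since_release"].between(0, 14)]
    .groupby("days_since_release")
    .size()
)

counts.plot(kind="bar", xlabel="Days since release", ylabel="Number of plays")
plt.tight_layout()
plt.show()
```

A spike at day 0 in such a histogram is what produces the release-day effect visible in Fig. 1; the same groupby pattern, applied to user age brackets and recording decades, yields the age-group comparison mentioned above.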

In the discussion of my talk, c:o/re colleagues drew parallels to other academic disciplines, such as digital sociology and research on pharmaceutical companies. The addictiveness of online media, a topic I touched upon, was compared to data-gathering practices in gambling, including the ethics of using such data for research. The political significance of music listening and its connection to emotions was also discussed in relation to the danger of biases in music recommender systems.

My second presentation, entitled “Imitations of Human Musical Creativity: Process or Product?”, took place during the conference “Politics of the Machines 2024. Lifelikeness and Beyond”, which c:o/re hosted. I focused on the question of what AI-based imitations of music actually model – the final product (such as the notation or the audio recording) or the processes that lead to the creation of this product.

In this presentation, I discussed:

1) The distinction between process and product of artistic creation, which, while especially important for discussions on the output of generative AI, currently receives little scholarly attention;

2) How several theories in the humanities (notably, formalism, psychoanalytic literary theory, and the line of AI skepticism connected to the so-called Chinese room argument) stress the importance of the process in artistic creation and cognition;

3) That current endeavors in generative AI, though impressive from the point of view of the product, do not attempt to imitate the processes of creation, dissemination, and reception of art, literature, or music, nor do they imitate historical, cultural, or economic environments in which these processes take place;

4) Finally, that because the data on which generative AI systems operate carries traces of past processes, the product of these systems remains connected to those processes, even if their creators make no conscious effort to imitate the processes themselves.

Fig. 2: An image of the Textile Cone, a sea snail with a striking pattern on its shell. I used this picture to illustrate how a full process-based imitation of the shell’s pattern would need to include imitation of all the snail’s life processes, as well as of its living environment. (Image: “Conus textile 7” by Harry Rose. https://www.flickr.com/photos/macleaygrassman/9271210509. CC-BY: https://creativecommons.org/licenses/by/2.0/)

A conference participant commented that, for commercial companies, avoiding the imitation of all these processes is a deliberate strategy, because the imitation has to be cheaper to produce than the original, process-based artifact.

My third presentation at the Kolleg, “Life-Like Artificial Music: Understanding the Impact of AI on Musical Thinking”, took place on June 5, 2024 as a lecture in the c:o/re Lifelikeness lecture series. Here, I addressed the likelihood (or unlikelihood) of major shifts in musicological terminology resulting from the academic use of AI. Starting with an overview of various competing paradigms of musical research, I drew attention to possible upcoming problems of justifying the validity of currently existing musicological terminology. The salient point here is that AI systems based on machine learning are capable of imitating historical musical styles without recourse to explicitly stated rules of music theory, while humans need such rules to learn to imitate those styles. Moreover, the ability of machine learning systems to learn the internal structures of music directly from audio (skipping the notation stage on which most human music theory operates) has the potential to call into question the validity and usefulness of music theory as currently taught.

Having stated these potential problems, I turned to a current example, a research paper [1] in which notions of Western music theory were compared to the internal representations learned by an AI system from musical examples. Using this paper as a starting point for my argument, I asked whether it could be possible, in principle, to use such an approach to come up with new, perhaps better, musicological terminology. I pointed to the difficulty of interpreting the structures learned by machine learning systems, and to the likely incompatibility of such structures (even if successfully decoded) with the human cognitive apparatus. To illustrate this, I referred to the use of moves made by AI systems by beginner players of the game of Go. Casual players are normally discouraged from copying the moves of professional human players because they cannot fully understand the underlying logic of these moves and thus cannot effectively integrate them into their own strategy; the same applies, even more strongly, to moves suggested by AI systems.
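The general idea behind such interpretability work can be illustrated with a minimal probing sketch: fit a simple linear classifier that tries to predict a music-theoretical label (here, a hypothetical key annotation) from a model's hidden activations. This is a generic illustration of linear probing, not the method of Cosme-Clifford et al.; the random arrays below are stand-ins for real activations and annotations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for real data: hidden activations of a music model
# (n_excerpts x hidden_dim) and human key annotations (0..23 for
# 12 major + 12 minor keys). In practice these would come from the
# model under study and an annotated corpus.
activations = rng.normal(size=(2000, 256))
key_labels = rng.integers(0, 24, size=2000)

X_train, X_test, y_train, y_test = train_test_split(
    activations, key_labels, test_size=0.2, random_state=0
)

# A linear probe: if a simple classifier can recover the label from
# the activations, the concept is (linearly) encoded in the representation.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = probe.score(X_test, y_test)

# With random stand-in data, accuracy stays near chance (1/24); on real
# activations, clearly above-chance accuracy would suggest the model
# encodes something aligned with the human notion of key.
print(f"probe accuracy: {accuracy:.3f} (chance ≈ {1/24:.3f})")
```

The interpretive problem I raised in the talk appears precisely at this point: even when such a probe succeeds, it shows only that a human category can be read off the representation, not that the system's internal structure is organized in humanly graspable terms.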

In the ensuing discussion, one participant drew attention to the fact that new technologies often lead to a change in what counts as a valid research contribution, devaluing older types of research outcomes and creating new ones. Another participant argued that a constant process of terminological change takes place in all disciplines at all times, independently of the possible influence of any new technology, such as machine learning.

Overall, my c:o/re fellowship offered, and continues to offer, an ideal opportunity to develop and discuss new ideas for my inquiry into the future uses and problems of AI and data in music research. In addition to the three presentations mentioned above, this work has resulted in talks given at the University of Bonn, at Maastricht University, and at a music and AI conference at the University of Hong Kong.


[1] N. Cosme-Clifford, J. Symons, K. Kapoor, and C. W. White, “Musicological Interpretability in Generative Transformers,” 4th International Symposium on the Internet of Sounds, Pisa, Italy, 2023.