Introduction

Team 98: HaYoung Kyung, Akira Sato, Tudor Cristea-Platon

CS109a Fall 2019
GitHub Repository

animated

Problem Description and Motivation

There are over 50 millions of songs on Spotify, making it difficult for users to discover and curate songs into playlists[1]. We often use playlists to set the mood for studying, exercising, partying, and relaxing. Having the right playlist to accompany our schedule has become a ritual for many millennials. Automatic playlist generators not only help us discover new songs, but also allow us to enjoy playlists with certain intent. In this project, we build a model to auto-generate playlists on Spotify. We first study ways to enrich our original data. We, then, use these features to build auto-playlist generator models that we have learned in class or outside. Finally, we compare the models' performance based on performance metrics formulated upon our model assumptions.

Data Source Description

The data set provided by CS109a staff contained the following: artist_name, track_uri, artist_uri, track_name, album_uri, duration_ms, album_name. Using the track_uri, we used Spotify's API[2] to obtain the following features associated with each track:

duration (int): The duration of the track in milliseconds.
danceability (float): Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
energy (float): Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
key (int): The estimated overall key of the track. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C#/D^b, 2 = D, and so on. If no key was detected, the value is -1.
loudness (float): The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typical range between -60 and 0 db.
mode (int): Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
acousticness (float): A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
instrumentalness (float): Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
liveness (float): Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
valence (float): A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
tempo (float): The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
popularity (int): The popularity of a track is a value between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are. Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. Artist and album popularity is derived mathematically from track popularity. Note that the popularity value may lag actual popularity by a few days: the value is not updated in real time.
release_date (date time): The date of track release.

The descriptions and data types of each feature listed above can be found in the appendix. There are 999,992 playlists and 6,6346,428 songs in our data. In addition to these features, we perform preliminary feature engineering. We extract the days since each track has been released from today's date (2019-11-19). For instance, if a track was released on 2018-11-19, we can subtract today's date from the release date and obtain 365 days since release date. This allows us to compute the age of the track, e.g. 365 days, which is more insightful compared to the launch date, e.g. 2018-11-19, as it potentially allows us quantify the relevance of the track to today's users.

Research Questions

Our research mixes elements of Big Data and prediction. Some of the questions that we tried to answer is defining what makes two playlists similar, how do we suggest playlist to users based on their preferences, and given a current playlist how can we expand it using similar songs. One of our main challenges was formulating a good problem in an unsupervised learning setting and delivering an interpretable solution.

Related Work:

We have found that there have been student projects focused on predicting songs or playlist for Spotify. Machine learning classes at Stanford in 2018 featured final projects at Stanford in 2018 focused on predicting hit songs using logistic regression and shallow neural nets[3], as well as linear regression and recurrent neural nets[4]. At Harvard, a capstone project for AC297r in 2017 was designed around predicting the popularity of playlists using support vector machines and random forest classifiers, among other techniques[5]. Automatic playlist generation based on mood has also been an active area of research with some projects being as recent as October 2019, which employed PCA and k-means[6]. Hence our current project is of relevance and shows a sligthly different approach to the problem.

The following are some other works, grouped by areas of research.

Characteristics of Playlists Specified

In the early days of research on automatic playlist generation [7], the target characteristics of the playlist, including tempo, rhythm, and type of music, were usually specified. For example, Algoniemy et al. [7] introduced a network flow model to retrieve songs from a database based on use preferences.

Start and End Songs Specified

In Flexer et al. [8], users specified a start and an end track for each playlist, and the researchers proposed a way to automatically find middle songs to enable a smooth transition from start to end. Both objective and subjective evaluations have shown that this concept works well, but there are also problems, including that there is usually at least one song that is not appropriate in the middle songs.

Skipping Behavior

Pampalk et al. [9] focused on obtaining feedback from users, who were able to press a skip button. At the time of the study, the common way to automatically generate playlists was for the user to either shuffle songs in a library or manually select songs. However, the researchers interpreted it as negative feedback when users pressed the skip button, and they deleted the songs that were similar to those skipped. As a result, they succeeded in reducing the amount of skipping.

Audio-Content Analysis

Logan et al. [10] presented a content-similarity function that was based solely on their content. First, they divided each song into “frames” of 25 ms each. Then, they converted each frame using Mel-frequency cepstral coefficients (MFCCs). Next, they used K-means clustering to cluster the frames of each song into clusters that were similar. The sets of clusters were denoted as the “signature” of that song. Finally, the researchers compared two songs by calculating the Earth Mover’s distance (EMD) as a way to accommodate local clustering. However, one of the problems they encountered was that the model was not highly accurate. For example, the average number of the 20 closest songs that had the same genre as the seed song was about 12. In addition, the average number of similar songs in playlists generated using this model was 8.2, as rated by 2 users over 20 queries.

Sources:

[1] https://medium.com/@jessicafrech/this-is-how-you-get-added-to-spotifys-curated-playlists-7f01f2f6b891

[2] https://developer.spotify.com/documentation/web-api/reference/

[3] http://cs229.stanford.edu/proj2018/report/16.pdf

[4] https://cs230.stanford.edu/files_winter_2018/projects/6970963.pdf

[5] https://rawgit.com/omarabboud/spotifycapstone/master/index.html

[6] https://towardsdatascience.com/predicting-my-mood-using-my-spotify-data-2e898add122a

[7] Masoud Alghoniemy and Ahmed Tewfik. A Network Flow Model for Playlist Generation. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME). Tokyo, Japan. 2001.

[8] Arthur Flexer, Dominik Schnitzer, Martin Gassar, and Garhard Widmer. Playlist Generation Using Start and End Songs. In Proceedings of the International Society for Music Information Retrieval (ISMIR), pp. 173–178. Philadelphia, PA, USA. 2008.

[9] Elias Pampalk, Tim Pohle, and Garhard Widmer. Dynamic Playlist Generation Based on Skipping Behavior. In Proceedings of the International Society for Music Information Retrieval (ISMIR), pp. 634–637. Amsterdam, Netherlands. 2005.

[10] Beth Logan. A Content-Based Music Similarity Function (Report CRL 2001/02). Compaq Computer Corporation Cambridge Research Laboratory Technical Report Series. Cambridge, MA. June 2001.