This notebook illustrates our k-NN collaborative filtering approach (used as our baseline model), which uses track co-occurrence in playlists to recommend tracks to playlists.
In order to recommend relevant songs to playlists, it is natural to think about songs that are in a similar playlist but not currently in our target playlist.
Our k-NN collaborative filtering approach looks for playlists that share tracks with the target playlist (i.e., two playlists are similar if they have a high number of overlapping tracks) and recommends songs that are in the similar playlists but not in the target playlist.
To compute the prediction set for each playlist, our model does the following:
The model finds the k nearest neighbor playlists of the target playlist (measured by cosine distance between playlists).
From these k neighbor playlists, the model then builds the recommendation list from tracks that are in the neighbor playlists but not in the target playlist, starting with the most similar playlist. It stops adding tracks once the list reaches the pre-determined length (15 times the size of that playlist's test set).
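The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's actual code: the names `cosine_similarity`, `recommend`, `train_playlists`, and `n_rec` are hypothetical, playlists are represented as sets of track ids, and the target is assumed not to be among the candidate neighbors. For binary vectors, cosine similarity reduces to the overlap size divided by the geometric mean of the two playlist sizes, and the smallest cosine distance corresponds to the largest similarity.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two playlists given as sets of track ids.

    For binary vectors this equals |a & b| / sqrt(|a| * |b|).
    """
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(target, train_playlists, k=5, n_rec=10):
    """Return up to n_rec tracks taken from the k most similar playlists."""
    # Step 1: rank candidate playlists, most similar first, and keep k of them.
    neighbors = sorted(
        train_playlists.items(),
        key=lambda item: cosine_similarity(target, item[1]),
        reverse=True,
    )[:k]
    # Step 2: walk the neighbors in order and collect tracks the target lacks.
    recs = []
    for _, tracks in neighbors:
        for track in sorted(tracks):  # sorted only to make the order deterministic
            if track not in target and track not in recs:
                recs.append(track)
                if len(recs) == n_rec:
                    return recs
    return recs

# Toy data (hypothetical track ids).
playlists = {"p1": {"a", "b", "c", "d"}, "p2": {"a", "x"}, "p3": {"y", "z"}}
target = {"a", "b", "c"}
recs = recommend(target, playlists, k=2, n_rec=3)  # -> ['d', 'x']
```

Here `p1` is the closest playlist, so its unseen track `d` is recommended first, followed by `x` from `p2`; `p3` shares nothing with the target and falls outside the two nearest neighbors.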
We split the data into training and test sets (80-20), stratified by Playlistid. Stratified splitting ensures that every playlist, regardless of its length, contributes the same proportion of its tracks to the training and test sets.
train.shape, test.shape
((1970, 28), (616, 28))
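As a rough illustration of what a split stratified by playlist does, the sketch below holds out about 20% of each playlist's tracks. It is written by hand in plain Python for clarity and is an assumption about the idea, not the notebook's code; scikit-learn's `train_test_split` with its `stratify` argument is a common way to get the same effect. The function name `split_by_playlist` and the `(playlist_id, track)` row layout are hypothetical.

```python
import random
from collections import defaultdict

def split_by_playlist(rows, test_frac=0.2, seed=0):
    """Split (playlist_id, track) rows so each playlist keeps roughly
    test_frac of its own tracks in the test set."""
    by_playlist = defaultdict(list)
    for playlist_id, track in rows:
        by_playlist[playlist_id].append(track)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for playlist_id, tracks in by_playlist.items():
        tracks = tracks[:]      # copy before shuffling
        rng.shuffle(tracks)
        n_test = round(len(tracks) * test_frac)
        test += [(playlist_id, t) for t in tracks[:n_test]]
        train += [(playlist_id, t) for t in tracks[n_test:]]
    return train, test

# Two toy playlists of 10 tracks each: 2 tracks per playlist go to the test set.
rows = [(p, f"t{i}") for p in (1, 2) for i in range(10)]
train_rows, test_rows = split_by_playlist(rows)
```

Splitting within each playlist, rather than across all rows at once, is what keeps short playlists from ending up entirely in one of the two sets.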
We used the training set to create a binary matrix with 100 playlists (rows) and 1534 unique songs (columns). Each entry indicates whether a track is in the playlist (1) or not (0). The matrix is sparse because a single playlist contains at most 350 songs in our dataset, while the matrix has 1534 unique songs.
We then transformed the matrix to a Compressed Sparse Row matrix from scipy and fit a 5-nearest-neighbor model (using cosine as distance metric and brute-force search).
co_mat.shape
(100, 1534)
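A minimal sketch of how such a matrix and model could be built is shown below. It is an illustration on toy data, not the notebook's actual code: the `Playlistid` and `Track_uri` column names follow the matrix shown in this notebook, `sparse_mat` and `knn` are hypothetical names, and `n_neighbors=2` is used only because the toy data has three playlists.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy stand-in for the training set (hypothetical track ids).
train = pd.DataFrame({
    "Playlistid": [1, 1, 1, 2, 2, 3, 3],
    "Track_uri": ["a", "b", "c", "a", "b", "d", "e"],
})

# Binary playlist-by-track matrix: 1 if the track is in the playlist, else 0.
# clip(upper=1) guards against a track appearing twice in one playlist.
co_mat = pd.crosstab(train["Playlistid"], train["Track_uri"]).clip(upper=1)

# Store the matrix in Compressed Sparse Row format and fit a brute-force
# nearest-neighbor model that uses cosine distance.
sparse_mat = csr_matrix(co_mat.values.astype(float))
knn = NearestNeighbors(n_neighbors=2, metric="cosine", algorithm="brute")
knn.fit(sparse_mat)

# Querying with the training rows returns each playlist itself first
# (distance 0), followed by its closest other playlist.
distances, indices = knn.kneighbors(sparse_mat)
```

Because every playlist is its own nearest neighbor when the query rows come from the fitted matrix, the first column of `indices` is typically skipped when collecting neighbors for recommendations.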
The first few rows of the matrix are shown below.
co_mat.head()
Track_uri | spotify:track:00LfFm08VWeZwB0Zlm24AT | spotify:track:00qOE7OjRl0BpYiCiweZB2 | spotify:track:01a0J96fRD91VnjQQUCqMK | spotify:track:01iyCAUm8EvOFqVWYJ3dVX | spotify:track:027h5P3kCyktHv9dpHUBBS | spotify:track:02M6vucOvmRfMxTXDUwRXu | spotify:track:03L2AoiRbWhvt7BDMx1jUB | spotify:track:03LpkqucyYKcYclDs8HuxO | spotify:track:03fT3OHB9KyMtGMt2zwqCT | spotify:track:03tqyYWC9Um2ZqU0ZN849H | ... | spotify:track:7yFMhCJOsH7khgpdnyrZAZ | spotify:track:7yHEDfrJNd0zWOfXwydNH0 | spotify:track:7ySUcLPVX7KudhnmNcgY2D | spotify:track:7yfg0Eer6UZZt5tZ1XdsWz | spotify:track:7yq4Qj7cqayVTp3FF9CWbm | spotify:track:7yyRTcZmCiyzzJlNzGC9Ol | spotify:track:7zBPzAjKAqQpcv8F8GCq5s | spotify:track:7zWj09xkFgA9tcV6YhfU6q | spotify:track:7zbq8RT5Kd3ExOGVTiUQbR | spotify:track:7zxRMhXxJMQCeDDg0rKAVo |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Playlistid | |||||||||||||||||||||
430 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
622 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1990 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2259 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
2535 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 1534 columns
We used the following metrics to evaluate our model, based on the Spotify RecSys Challenge rules:
Data | R-Precision | NDCG | Average of the two metrics |
---|---|---|---|
Test set of 100 playlists | 0.077025 | 0.080346 | 0.078685 |
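For readers unfamiliar with the two metrics, the sketch below shows one common track-level formulation of each. It is an assumption about the general definitions, not the notebook's evaluation code, and it may differ in detail from the official challenge scoring (for example in the logarithm offset or in partial credit). The function names are hypothetical, and the recommendation list is assumed to contain no duplicates.

```python
from math import log2

def r_precision(recs, truth):
    """Fraction of held-out tracks found among the top-|truth| recommendations."""
    truth = set(truth)
    if not truth:
        return 0.0
    top = recs[:len(truth)]
    return len(truth.intersection(top)) / len(truth)

def ndcg(recs, truth):
    """Normalized discounted cumulative gain with binary relevance."""
    truth = set(truth)
    # DCG: each hit at 0-based rank i contributes 1 / log2(i + 2).
    dcg = sum(1 / log2(i + 2) for i, t in enumerate(recs) if t in truth)
    # Ideal DCG: all relevant tracks ranked at the very top of the list.
    ideal_hits = min(len(truth), len(recs))
    idcg = sum(1 / log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

# Toy example: two held-out tracks, one recommended at rank 1 and one at rank 3.
example_recs = ["a", "x", "b", "y"]
example_truth = {"a", "b"}
rp = r_precision(example_recs, example_truth)  # 1 of 2 held-out tracks in the top 2
nd = ndcg(example_recs, example_truth)
```

R-precision only looks at the top of the list, while NDCG rewards placing relevant tracks earlier anywhere in the list, which is why the two are reported together and averaged.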
With our baseline k-NN collaborative filtering model, we achieved an R-precision of about 0.077 and an NDCG of about 0.080. The current model only considers track co-occurrence between playlists, so we are curious whether it will improve once audio features are added.
https://github.com/phoebewong/spotify-teamNPK/blob/master/src