Discussion

Reflections

Overall, the experience of coding a playlist recommender was extremely rewarding. The problem is unique for a few reasons. First and foremost, the dataset is very atypical. Because the data came from multiple sources, we had a mix of datatypes drawn from all sorts of distributions defined on very different domains. This made content-based filtering considerably harder, as we had to select appropriate metadata and scale the features carefully. A second challenge was the non-traditional response variable. For data that comes with labels, it is easy to split into training and validation sets. For the Spotify data, we not only had to stratify the samples on playlist ID, we also had a non-traditional response variable in the form of a hold-out set of tracks withheld from each playlist. This led us to use metrics tailored specifically to this type of recommendation problem. The real growth and knowledge came from implementing the following models, so without further ado we comment on our experience with each of them.
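For concreteness, here is a minimal sketch of the kind of per-playlist split we are describing, assuming playlists are stored as a dict mapping playlist IDs to lists of track IDs (a hypothetical format, not our actual pipeline); the withheld tracks play the role of the response.

```python
import random

def split_playlist(tracks, holdout_frac=0.2, seed=0):
    """Split one playlist's tracks into a visible 'seed' set and a hidden
    hold-out set used as the response for evaluation."""
    rng = random.Random(seed)
    tracks = list(tracks)
    rng.shuffle(tracks)
    n_holdout = max(1, int(len(tracks) * holdout_frac))
    return tracks[n_holdout:], tracks[:n_holdout]  # (seed_tracks, holdout_tracks)

# Example: build per-playlist splits keyed by playlist ID (placeholder data).
playlists = {"pid_1": ["t1", "t2", "t3", "t4", "t5"],
             "pid_2": ["t2", "t6", "t7", "t8"]}
splits = {pid: split_playlist(tracks) for pid, tracks in playlists.items()}
```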

Model Comparisons

Unfortunately, due to time and computational constraints, we were not able to run a fair head-to-head comparison between models on a dataset larger than 100 playlists. However, we did test individual models on larger datasets, up to tens of thousands of playlists.

kNN Collaborative Filtering vs. Content-Based Filtering

Our expectations for kNN collaborative filtering were low. We assumed we would need more nuanced information about a playlist than a simple measure of co-occurrence. We were proven wrong, however: the performance of the kNN filter rivaled that of content-based filtering. We had several ideas as to why this might be the case. Perhaps the metadata features for each song are highly dependent, which would reduce the amount of information a metadata vector can carry. Another possibility is that the metadata is extremely diverse, and this diversity makes it difficult to compare vectors. Collaborative filtering would have an advantage in that case, because it does not consider any specific features of the playlist, only how often songs appear with one another. In any case, we feel these two types of filtering could be combined so that each compensates for what the other lacks.
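To make the contrast concrete, here is a minimal sketch of co-occurrence scoring under the same hypothetical playlist format as above; it illustrates the idea rather than our exact implementation, and it uses no track metadata at all.

```python
def score_by_cooccurrence(seed_tracks, playlists):
    """Score candidate tracks by how often they co-occur with the seed
    tracks across other playlists -- the only signal collaborative
    filtering relies on (no audio or metadata features)."""
    scores = {}
    seed = set(seed_tracks)
    for tracks in playlists.values():
        track_set = set(tracks)
        overlap = len(seed & track_set)      # shared tracks with the seed playlist
        if overlap == 0:
            continue
        for t in track_set - seed:           # candidates not already in the seed
            scores[t] = scores.get(t, 0) + overlap
    return sorted(scores, key=scores.get, reverse=True)

# Content-based filtering would instead compare metadata vectors, e.g. cosine
# similarity between scaled audio-feature vectors for each track.
```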

ALS Matrix Factorization

ALS matrix factorization far outperforms the other models in R-precision. We were surprised that its NDCG score is similar to that of the other approaches we tried, e.g., k-means clustering. The NDCG score did not turn out as well as we expected: we anticipated that the ALS model would rank songs within the suggestion set better, since it scores candidates by similarity and suggests the songs with the highest scores. One reason we suspect is that some items have very high similarity scores (e.g., score > 0.999), so the ranking within the suggestion set may be less sensitive. We did not have time to cross-validate over different regularization strengths (lambda) or numbers of iterations, but with more time to tune the model we might obtain better results.
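As an illustration of the alternating updates (not our tuned implementation), below is a minimal NumPy sketch of ALS on a dense playlist-by-track interaction matrix; `reg` plays the role of the lambda we would cross-validate, and all hyperparameter values are placeholders.

```python
import numpy as np

def als(R, n_factors=32, reg=0.1, n_iters=10, seed=0):
    """Plain NumPy ALS on a (playlist x track) interaction matrix R.
    Alternates closed-form ridge-regression updates for the playlist
    factors U and the track factors V."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, n_factors))
    V = rng.normal(scale=0.1, size=(n_items, n_factors))
    I = np.eye(n_factors)
    for _ in range(n_iters):
        U = R @ V @ np.linalg.inv(V.T @ V + reg * I)    # fix V, solve for U
        V = R.T @ U @ np.linalg.inv(U.T @ U + reg * I)  # fix U, solve for V
    return U, V

# Predicted affinity of every playlist for every track is U @ V.T;
# recommend the highest-scoring unseen tracks for each playlist.
```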

k-Means Clustering

We were not sure what to expect from the k-Means clustering. On the one hand, we thought it would be a great way to capture higher-dimensional similarities across playlists (similar to content-based filtering); on the other, there was the possibility that the most popular tracks would cluster together and wash out other types of music. Further analysis would be needed to determine whether this happened, but we were pleased to see that the k-Means clustering outperformed the filtering algorithms, indicating it is even more effective than content-based filtering. This makes sense, as the clustering algorithm refines its notion of similarity iteratively, as opposed to the single cosine-similarity computation performed in content-based filtering.
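Below is a rough sketch of how clusters could drive recommendations, using scikit-learn's KMeans on placeholder data; the names `track_features` and `seed_idx` are assumptions for illustration, not our actual variables.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical inputs: an (n_tracks x n_features) array of audio features,
# and the indices of the seed playlist's tracks.
track_features = np.random.rand(500, 8)   # placeholder data for illustration
seed_idx = np.array([3, 17, 42, 101])

X = StandardScaler().fit_transform(track_features)
labels = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(X)

# Recommend tracks from the cluster the seed playlist occupies most often.
dominant = np.bincount(labels[seed_idx]).argmax()
candidates = np.where(labels == dominant)[0]
candidates = np.setdiff1d(candidates, seed_idx)  # drop tracks already in the playlist
```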

Future Work

One model that we did not have the chance to implement was a two-stage hierarchical model. In the first stage, the model would generate candidate tracks using one of the filtering techniques described above; in the second stage, we would engineer features and run a gradient boosting algorithm to re-rank those candidates (a sketch of this second stage follows below). Additional future work could focus on reducing the computational complexity of the k-means clustering algorithm, potentially using parallel computing and other multi-threading techniques to extend our models to the entire Million Playlist Dataset.
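As a sketch of what that second stage might look like (assumed names and placeholder data throughout, not an implementation we ran), a gradient-boosted classifier could score stage-one candidates on engineered (playlist, track) features, and the candidates would then be re-ordered by predicted probability of belonging to the playlist.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def rerank(candidate_features, candidate_ids, model):
    """Order stage-1 candidates by the booster's predicted probability of
    belonging to the playlist."""
    scores = model.predict_proba(candidate_features)[:, 1]
    return [cid for _, cid in sorted(zip(scores, candidate_ids), reverse=True)]

# Training data: engineered features for (playlist, candidate) pairs, labeled 1
# if the candidate was actually in the playlist's hold-out set, else 0.
X_train = np.random.rand(1000, 6)           # placeholder engineered features
y_train = np.random.randint(0, 2, 1000)     # placeholder labels
booster = GradientBoostingClassifier().fit(X_train, y_train)

ranked = rerank(np.random.rand(50, 6), list(range(50)), booster)
```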