Build a Song Recommender System using Content-Based Filtering in Python.
With the rapid growth of online and mobile platforms, many music services have emerged, offering song catalogues from across the globe. Every individual has a unique taste in music, and most people now use online music streaming platforms such as Spotify, Apple Music, Google Play, or Pandora.
Online music listeners have a huge choice of songs, and they can find it difficult to pick songs or browse through long lists. Service providers therefore need an efficient and accurate recommender system that suggests relevant songs. As data scientists, we need to understand the patterns in users' listening habits and predict the most relevant recommendations.
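In this tutorial, we are going to cover content-based filtering, loading and scaling the Spotify dataset, and recommending songs with cosine similarity and the sigmoid kernel.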
The content-based filtering method relies on an analysis of item features and determines which features matter most when suggesting songs. For example, if a user liked a song in the past whose theme is party music, the recommender system will suggest other songs with the same theme. In this way, the system learns from the user's behavior and suggests items that match it. In this article, we use a Spotify dataset to find similar songs for recommendation using cosine similarity and the sigmoid kernel.
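To make the idea concrete, here is a minimal sketch of content-based filtering on a hypothetical five-song catalogue. It assumes three made-up, already scaled features per song, builds a simple user profile as the average feature vector of the songs the user liked, and ranks the remaining songs by cosine similarity to that profile. The catalogue, the feature values, and the averaging step are illustrative assumptions, not part of the Spotify dataset used below.

```python
import numpy as np

# Hypothetical toy catalogue: each row is a song described by three
# illustrative features (e.g. danceability, energy, acousticness) in [0, 1].
songs = np.array([
    [0.9, 0.8, 0.1],     # song 0: upbeat party track
    [0.8, 0.9, 0.2],     # song 1: upbeat party track
    [0.2, 0.3, 0.9],     # song 2: quiet acoustic track
    [0.85, 0.75, 0.15],  # song 3: upbeat party track
    [0.1, 0.2, 0.95],    # song 4: quiet acoustic track
])
liked = [0]  # assumption: the user has liked song 0

# Simple user profile: the mean feature vector of the liked songs
profile = songs[liked].mean(axis=0)

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Score every song the user has not liked yet against the profile
candidates = [i for i in range(len(songs)) if i not in liked]
scores = {i: cosine(profile, songs[i]) for i in candidates}

# Recommend unseen songs in order of decreasing similarity
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # [3, 1, 2, 4]
```

The two other party tracks come first and the acoustic tracks last, which is the "same theme" behavior described above, expressed through feature vectors.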
In this tutorial, you will build a song recommender system. You can download this dataset from here.
Let’s load the data into a pandas DataFrame:
import pandas as pd
from sklearn.metrics.pairwise import sigmoid_kernel
from sklearn.metrics.pairwise import cosine_similarity
# Read the Spotify song dataset from a local CSV file
df = pd.read_csv("data.csv")
# Show the first five rows
df.head()
Let’s understand the dataset. It has 16 named columns: acousticness, danceability, duration_ms, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, time_signature, valence, target, song_title, and artist, plus an unnamed index column (Unnamed: 0).
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2017 entries, 0 to 2016
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Unnamed: 0        2017 non-null   int64
 1   acousticness      2017 non-null   float64
 2   danceability      2017 non-null   float64
 3   duration_ms       2017 non-null   int64
 4   energy            2017 non-null   float64
 5   instrumentalness  2017 non-null   float64
 6   key               2017 non-null   int64
 7   liveness          2017 non-null   float64
 8   loudness          2017 non-null   float64
 9   mode              2017 non-null   int64
 10  speechiness       2017 non-null   float64
 11  tempo             2017 non-null   float64
 12  time_signature    2017 non-null   float64
 13  valence           2017 non-null   float64
 14  target            2017 non-null   int64
 15  song_title        2017 non-null   object
 16  artist            2017 non-null   object
dtypes: float64(10), int64(5), object(2)
memory usage: 268.0+ KB
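Before building the model, we scale the features using scikit-learn's MinMaxScaler, which rescales each feature to the range [0, 1]. Without scaling, features with large ranges such as duration_ms and tempo would dominate the similarity scores.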
from sklearn.preprocessing import MinMaxScaler
# Numeric audio features used to describe each song
feature_cols = ['acousticness', 'danceability', 'duration_ms', 'energy',
                'instrumentalness', 'key', 'liveness', 'loudness', 'mode',
                'speechiness', 'tempo', 'time_signature', 'valence']
# Rescale every feature to the [0, 1] range
scaler = MinMaxScaler()
# fit_transform returns a NumPy array, not a DataFrame
normalized_df = scaler.fit_transform(df[feature_cols])
print(normalized_df[:2])
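MinMaxScaler rescales each column independently with x' = (x - min) / (max - min). The short check below, on a made-up two-column matrix (think of the columns as tempo and loudness), computes that formula by hand and compares it with scikit-learn's output; the data values are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up feature matrix: 3 songs x 2 features
X = np.array([
    [120.0, -5.0],
    [90.0, -12.0],
    [150.0, -3.0],
])

# Min-max scaling by hand, column by column: x' = (x - min) / (max - min)
col_min = X.min(axis=0)
col_max = X.max(axis=0)
manual = (X - col_min) / (col_max - col_min)

# scikit-learn's MinMaxScaler with its default feature_range=(0, 1)
scaled = MinMaxScaler().fit_transform(X)

print(np.allclose(manual, scaled))  # True
print(manual[:, 0].tolist())  # [0.5, 0.0, 1.0]
```

A hand-written version like this would divide by zero for a constant column, whereas MinMaxScaler guards against a zero range.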
In this section, we build a content-based recommender system using two similarity measures: cosine similarity and the sigmoid kernel. We compute the similarity between the feature vectors of every pair of songs, then pick the 10 songs most similar to a given song and recommend them.
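Cosine similarity measures the cosine of the angle between two feature vectors: a value close to 1 means the two records point in nearly the same direction and are therefore closely related. Because it depends only on the direction of the vectors and not on their magnitude, cosine similarity can also compare text documents of different lengths.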
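In formula form, cos(a, b) = (a · b) / (‖a‖ ‖b‖). The sketch below implements the pairwise version with NumPy on three made-up vectors and checks it against scikit-learn's cosine_similarity. cosine_matrix is a hypothetical helper written for illustration, and it assumes no row is all zeros (a zero row would cause a division by zero).

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine_matrix(X):
    """Pairwise cosine similarity: (x_i . x_j) / (||x_i|| * ||x_j||)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)  # row lengths, shape (n, 1)
    unit = X / norms                                   # L2-normalize each row
    return unit @ unit.T                               # dot products of unit vectors

X = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [2.0, 0.0],  # same direction as row 0, twice as long
])

sim = cosine_matrix(X)
print(np.allclose(sim, cosine_similarity(X)))  # True
print(round(float(sim[0, 2]), 6))  # 1.0: only direction matters, not length
print(round(float(sim[0, 1]), 6))  # 0.707107: vectors 45 degrees apart
```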
# Map each song title to its row index; keep the first row if a title repeats
indices = pd.Series(df.index, index=df['song_title'])
indices = indices[~indices.index.duplicated(keep='first')]
# Create the cosine similarity matrix from the normalized features
cosine = cosine_similarity(normalized_df)
def generate_recommendation(song_title, model_type=cosine):
    """
    Purpose: Function for song recommendations
    Inputs: song title and similarity matrix (cosine or sigmoid kernel)
    Output: Pandas series of the top-10 recommended songs
    """
    # Get the row index of the given song
    index = indices[song_title]
    # Pair every other song's index with its similarity to the given song
    score = [(i, s) for i, s in enumerate(model_type[index]) if i != index]
    # Sort by similarity, most similar first
    similarity_score = sorted(score, key=lambda x: x[1], reverse=True)
    # Keep the top-10 most similar songs
    top_songs_index = [i for i, _ in similarity_score[:10]]
    # Look up the titles of the recommended songs
    top_songs = df['song_title'].iloc[top_songs_index]
    return top_songs
In the above code, we computed the cosine similarity between songs and returned the top-10 recommended songs.
Let’s generate recommendations from the computed cosine similarity matrix on the Spotify song dataset.
print("Recommended Songs:")
print(generate_recommendation('Parallel Lines',cosine).values)
In the above code, we have generated the Top-10 song list based on cosine similarity.
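The sort inside generate_recommendation can also be written with NumPy. The sketch below uses a hypothetical helper, top_k_similar, on a made-up 4×4 similarity matrix: it masks out the query song and sorts the rest of its row in descending order with a stable np.argsort, so ties keep their original order just as Python's sorted does.

```python
import numpy as np

def top_k_similar(sim_matrix, index, k=10):
    """Return the row indices of the k items most similar to `index`,
    excluding the item itself, ordered from most to least similar."""
    scores = np.asarray(sim_matrix[index], dtype=float).copy()
    scores[index] = -np.inf                     # never recommend the query song itself
    order = np.argsort(-scores, kind="stable")  # descending; ties keep original order
    return order[:k]

# Made-up symmetric similarity matrix for illustration
sim = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])
print(top_k_similar(sim, 0, k=2))  # [2 3]
```

With the tutorial's objects, an expression such as df['song_title'].iloc[top_k_similar(cosine, indices['Parallel Lines'])] could stand in for the sorting step, assuming indices maps each title to a single row.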
Let’s generate recommendations using the sigmoid kernel on the Spotify song dataset.
# Create sigmoid kernel matrix based on given matrix
sig_kernel = sigmoid_kernel(normalized_df)
print("Recommended Songs:")
print(generate_recommendation('Parallel Lines',sig_kernel).values)
In the above code, we have generated the top-10 song list based on the sigmoid kernel.
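The tutorial does not spell out the sigmoid kernel. scikit-learn's sigmoid_kernel computes K(x, y) = tanh(gamma · ⟨x, y⟩ + coef0), with gamma defaulting to 1 / n_features and coef0 to 1. The sketch below reproduces that formula with NumPy on three made-up rows and compares it with the library output. It also shows a property worth knowing: because the kernel is not normalized like cosine similarity, a song's score with itself is not necessarily the highest in its row.

```python
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

# Made-up normalized feature rows (3 songs x 2 features)
X = np.array([
    [0.2, 0.2],  # song 0: short vector
    [0.8, 0.8],  # song 1: same direction as song 0, larger magnitude
    [0.9, 0.1],  # song 2
])

gamma = 1.0 / X.shape[1]  # scikit-learn's default when gamma=None
coef0 = 1.0               # scikit-learn's default coef0

# K(x, y) = tanh(gamma * <x, y> + coef0) for every pair of rows
manual = np.tanh(gamma * (X @ X.T) + coef0)

print(np.allclose(manual, sigmoid_kernel(X)))  # True
# Song 0 scores higher against song 1 than against itself
print(manual[0, 1] > manual[0, 0])  # True
```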
Congratulations, you have made it to the end of this tutorial!
In this tutorial, we built a song recommender system using cosine similarity and the sigmoid kernel. It is a content-based recommender system. In another article, we developed a recommender system using collaborative filtering; you can check that article here: Book Recommender System using KNN. You can also check out our article on an NLP-based recommender system.