Perform sentiment analysis on restaurant review dataset.
The online presence of restaurants gives them chance to reach more and more customers. Nowadays restaurants want to understand people’s opinions about their food and service. It helps them to know what customers are thinking about their restaurant? Restaurants can quantify the customer opinion and analyze the data you can quantify such information with good accuracy using sentiment analysis.
Traditionally, people take suggestions from acquaintances, friends, and family. But in the 21st-century internet has become one of the most important platforms for seeking advice. On these online platforms, Customer can also share their feedback and view other people’s feedback about restaurant service, food, taste, ambiance, and price. These reviews help customers to make the decision for choosing the restaurant and make a trust because it is based on mouth publicity.
In this tutorial, you will analyze the text data and perform sentiment analysis using Text classification. This is how you learn sentiment and text classification with a single example.
In this tutorial, you will perform text analytics and sentiment analysis on restaurant reviews using Text Classification.
Here, you can use the “Restaurants Reviews” dataset available on Kaggle. you can download it from the following link: The dataset is a tab-separated file. Dataset has two columns Review and Liked.
This data has 2 sentiment labels:
0 — Not liked 1 — Liked
Let’s load the data into pandas dataframe:
import pandas as pd
data=pd.read_csv('Restaurant_Reviews.tsv',sep='\t')
data.head()Output:
print(data.Liked.value_counts())Output:
1 500 0 500 Name: Liked, dtype: int64
In this dataset, we have two columns Review and Liked. Review is the restaurant review and Liked is the sentiment 0 and 1. 1 means the customer has liked the restaurant and 0 means not liked.
Wordcloud is the pictorial representation of the most frequent keywords in the dataset. In a wordcloud, the font size of words is directly proportional to the frequency of the word. In order to create a wordcloud we have to first install the module wordcloud using  pip command.
!pip install wordcloud Let’s create a wordcloud using wordcloud module:
# Import WordCloud and STOPWORDS
from wordcloud import WordCloud
from wordcloud import STOPWORDS
# Import matplotlib
import matplotlib.pyplot as plt 
def word_cloud(text):
    # Create stopword list
    stopword_list = set(STOPWORDS) 
    # Create WordCloud 
    word_cloud = WordCloud(width = 550, height = 550, 
                           background_color ='white', 
                           stopwords = stopword_list, 
                           min_font_size = 12).generate(text) 
    # Set wordcloud figure size
    plt.figure(figsize = (8, 6)) 
    # Show image
    plt.imshow(word_cloud) 
    # Remove Axis
    plt.axis("off")  
    # show plot
    plt.show()
    
paragraph=' '.join(data.Review.tolist())
word_cloud(paragraph)Output:
In the above word cloud, we can see that the most frequent words are food, service, place, good, great, time, restaurant, and friendly words are used.
Let’s create a word cloud for positive restaurant reviews:
paragraph=' '.join(data[data.Liked==1].Review.tolist())
word_cloud(paragraph)Output:
In this above word cloud, we are showing the frequent keywords of positive reviews. we can see that the most frequent words such as good, great, service, place, friendly, delicious, nice, and amazing.
paragraph=' '.join(data[data.Liked==0].Review.tolist())
word_cloud(paragraph)Output:
In the above word cloud, we are showing the frequent keywords of negative reviews. we can see that the most frequent words such as food, place, time, food service, bad, worst, slow, minutes, and disappointed.
In-Text Classification Problem, we have to generate features from the textual dataset. because we directly can’t apply the machine learning models to text data. We need to convert these text into some numbers or vectors of numbers. We are generating a Bag-of-words(BoW) for extracting features from the textual dataset. BoW converts text into the matrix of the frequency of words within a document.
# Bag of word: vectors word frequency(count)
from sklearn.feature_extraction.text import CountVectorizer
from nltk.tokenize import RegexpTokenizer
#tokenizer to remove unwanted elements from data like symbols
token = RegexpTokenizer(r'[a-zA-Z0-9]+')
cv = CountVectorizer(lowercase=True,
                     stop_words='english',
                     ngram_range = (1,1),
                     tokenizer = token.tokenize)
text_counts= cv.fit_transform(data['Review'])
print(text_counts.shape)Output: (1000, 1834)
Convert CountVectorized vector into Pandas DataFrame.
count_df = pd.DataFrame(text_counts.toarray(),columns=cv.get_feature_names())
count_df.head()Output:
In order to assess the model performance, we need to split the dataset into a training set and a test set.
Let’s split dataset by using function train_test_split(). you need to pass basically 3 parameters features, target, and test_set size. Additionally, you can use random_state to select records randomly. Here, random_state simply sets seed to the random generator, so that your train-test splits are always deterministic. If you don’t set seed, it is different each time.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(text_counts, 
                                                    data['Liked'], 
                                                    test_size=0.3, 
                                                    random_state=1)Let’s build the Text Classification Model using the Multinomial Naive Bayes Classifier.
First, import the MultinomialNB module and create a Multinomial Naive Bayes classifier object using MultinomialNB() function.
Then, fit your model on the train set using fit() and perform prediction on the test set using predict().
from sklearn.naive_bayes import MultinomialNB
#Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Model Generation Using Multinomial Naive Bayes
clf = MultinomialNB().fit(X_train, y_train)
predicted= clf.predict(X_test)
print("MultinomialNB Accuracy:",metrics.accuracy_score(y_test, predicted))Output: MultinomialNB Accuracy: 0.7333333333333333
Well, you got a classification rate of 73.34% using BoW features, which is a good accuracy. We can still improve this performance by trying different feature engineering and classification methods.
We can make predictions on a trained model using predict() method but before predict we must need to convert the input text into the vector using the transform() method.
# Transform into matrix
val=cv.transform(["Service of the restaurant is very slow but food was delicous"])
# make prediction
clf.predict(val)Output: array([0], dtype=int64)
The given input review us predicted as 0 or not liked because of its slow service.
Congratulations, you have made it to the end of this tutorial!
In this tutorial, you have performed restaurant review sentiment analysis using text classification with the Scikit-learn library. Learn More about sentiment analysis in this article Sentiment Analysis using Python.
I look forward to hearing any feedback or questions. you can ask the question by leaving a comment and I will try my best to answer it.
In this tutorial, we will focus on MapReduce Algorithm, its working, example, Word Count Problem,…
Learn how to use Pyomo Packare to solve linear programming problems. In recent years, with…
In today's rapidly evolving technological landscape, machine learning has emerged as a transformative discipline, revolutionizing…
Analyze employee churn, Why employees are leaving the company, and How to predict, who will…
Airflow operators are core components of any workflow defined in airflow. The operator represents a…
Machine Learning Operations (MLOps) is a multi-disciplinary field that combines machine learning and software development…