Sentiment analysis of book reviews

With the rise of e-commerce, online retailers such as Amazon have grown steadily in popularity.
Customers often express their opinion or sentiment by leaving feedback or reviews.
These reviews are written as free text.
Sentiment analysis is the process of classifying the opinion or feeling expressed in a piece of text as positive, negative or neutral.
Capturing the exact sentiment of a review through text is a challenging task.
In this notebook, the reviews are cleaned with several preprocessing steps: removal of HTML tags, URLs, punctuation, extra whitespace and special characters, followed by stemming.
The preprocessed text is then converted to numeric features using term frequency-inverse document frequency (TF-IDF) weighting.
Four classifiers, Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR) and Naive Bayes (NB), are used to classify the sentiment of Amazon book reviews.
Finally, the notebook (i) compares the classifiers on F1 score and accuracy, (ii) tunes the selected model using grid search and (iii) classifies unseen data.

Data Class
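
The code cell for this section is not reproduced here. A minimal sketch of what a review container could look like is given below. The class name `Review`, the label constants and the star-rating thresholds (1-2 negative, 3 neutral, 4-5 positive) are assumptions made for illustration, not taken from the notebook.

```python
from dataclasses import dataclass

# Label names follow the three classes named in the introduction.
POSITIVE = "POSITIVE"
NEUTRAL = "NEUTRAL"
NEGATIVE = "NEGATIVE"


@dataclass
class Review:
    """One review: its text and its star rating (1 to 5)."""
    text: str
    score: float

    @property
    def sentiment(self) -> str:
        # Assumed thresholds: 1-2 stars negative, 4-5 positive, else neutral.
        if self.score <= 2:
            return NEGATIVE
        if self.score >= 4:
            return POSITIVE
        return NEUTRAL
```

Keeping the label as a derived property means the thresholds can be changed in one place without touching the loaded data.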

Load data
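
As a rough sketch, the reviews can be read from a JSON-lines file, one JSON object per line. This assumes the review text is stored under `reviewText` and the star rating under `overall`, which is how the review files of this dataset are usually laid out; the function name `load_reviews` and the file name in the usage comment are hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def load_reviews(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Parse JSON-lines records into (review text, star rating) pairs."""
    reviews = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed field names: "reviewText" and "overall".
        reviews.append((record.get("reviewText", ""), float(record["overall"])))
    return reviews


# Usage with a file on disk (hypothetical file name):
# with open("Books_small.json", encoding="utf-8") as f:
#     reviews = load_reviews(f)
```

Because the function accepts any iterable of strings, it works equally on an open file, a gzip stream opened in text mode, or a list of lines in a test.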

Prep Data
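
The cleaning steps listed in the introduction (HTML tag, URL, punctuation, whitespace and special character removal, then stemming) could be sketched as follows. The regular expressions, the lowercasing step and the function name `clean_review` are assumptions. Stemming is left as an optional callable so that the sketch needs only the standard library; a stemmer such as NLTK's `PorterStemmer().stem` could be passed in.

```python
import html
import re
import string

URL_RE = re.compile(r"(https?://|www\.)\S+")
TAG_RE = re.compile(r"<[^>]+>")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def clean_review(text, stem=None):
    """Strip markup, URLs, punctuation and extra whitespace; optionally stem."""
    text = html.unescape(text)                 # "&amp;" becomes "&"
    text = TAG_RE.sub(" ", text)               # drop HTML tags
    text = URL_RE.sub(" ", text)               # drop URLs
    text = text.lower().translate(PUNCT_TABLE) # lowercase (assumed), drop punctuation
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop remaining special characters
    tokens = text.split()                      # splitting also collapses whitespace
    if stem is not None:
        tokens = [stem(tok) for tok in tokens]
    return " ".join(tokens)
```

URLs are removed before punctuation on purpose: stripping punctuation first would break a URL into ordinary-looking word fragments that the URL pattern could no longer match.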

Bag of words vectorization
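
To make the TF-IDF weighting named in the introduction concrete, here is a small pure-Python sketch of a bag-of-words matrix with TF-IDF weights. It assumes whitespace tokenization, the smoothed inverse document frequency ln((1 + n) / (1 + df)) + 1 and L2 row normalization, which to my understanding matches the defaults of scikit-learn's `TfidfVectorizer`; in practice that class would likely be used instead of hand-written code.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) where rows hold L2-normalised TF-IDF weights."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed idf (assumed variant): ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0  # avoid division by zero
        rows.append([w / norm for w in row])
    return vocab, rows


vocab, rows = tfidf(["great book", "boring book", "great great plot"])
```

In the second toy document, "boring" appears in one document while "book" appears in two, so "boring" receives the larger weight: terms that are rare across the corpus count for more.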

Classification
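
The four classifiers in the subsections below can be trained in one loop, sketched here with scikit-learn. The toy sentences and labels are invented for illustration, and the specific estimator choices (`SVC` with a linear kernel, `MultinomialNB`, the `max_iter` value) are assumptions rather than the notebook's confirmed settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in data, invented for illustration only.
train_texts = [
    "a great and moving story",
    "wonderful characters, great pacing",
    "loved every page of it",
    "boring plot and flat characters",
    "a dull and terrible read",
    "hated the ending, very boring",
]
train_labels = ["POSITIVE"] * 3 + ["NEGATIVE"] * 3

classifiers = {
    "Linear SVM": SVC(kernel="linear"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

models = {}
for name, clf in classifiers.items():
    # Each pipeline vectorizes raw text, then fits the classifier.
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(train_texts, train_labels)
    models[name] = model

predictions = {
    name: str(model.predict(["a great story"])[0])
    for name, model in models.items()
}
```

Wrapping the vectorizer and classifier in one pipeline means new text can be passed to `predict` as plain strings, and the vocabulary is learned from the training data only.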

Linear SVM

Decision Tree Classifier

Naive Bayes

Logistic Regression

Evaluation
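
The two metrics used for the comparison, accuracy and F1 score, can be worked through by hand on a small example. The labels below are invented; scikit-learn's `accuracy_score` and `f1_score` compute the same quantities and would normally be used on the real test split.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 = harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented labels for illustration.
y_true = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE"]
y_pred = ["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE", "POSITIVE"]

acc = accuracy(y_true, y_pred)                     # 3 of 5 correct
f1_pos = f1_for_class(y_true, y_pred, "POSITIVE")  # precision 2/3, recall 2/3
f1_neg = f1_for_class(y_true, y_pred, "NEGATIVE")  # precision 1/2, recall 1/2
```

Reporting F1 per class alongside accuracy matters when the classes are imbalanced, since a model can reach high accuracy while performing poorly on the minority class.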

Tuning our model (with Grid Search)
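
A grid search tries every combination of the listed hyperparameter values and keeps the one with the best cross-validated score. The sketch below uses scikit-learn's `GridSearchCV`. It assumes the selected model is the SVM; the parameter grid, the two-fold split, the macro-F1 scoring and the toy data are all illustrative choices, not the notebook's confirmed configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy stand-in data, invented for illustration only.
texts = [
    "a great and moving story",
    "wonderful characters and great pacing",
    "loved every page",
    "an excellent and gripping read",
    "boring plot and flat characters",
    "a dull and terrible read",
    "hated the ending",
    "a weak and tedious story",
]
labels = ["POSITIVE"] * 4 + ["NEGATIVE"] * 4

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svm", SVC())])

# Step names prefix the parameter names: "<step>__<parameter>".
param_grid = {
    "svm__kernel": ["linear", "rbf"],
    "svm__C": [0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1_macro")
search.fit(texts, labels)

best_params = search.best_params_
prediction = str(search.predict(["a great read"])[0])
```

Searching over the whole pipeline, rather than over a classifier fed a pre-computed matrix, refits the vectorizer inside each fold so that the validation text does not leak into the vocabulary.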

Saving Model
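
One common way to persist a fitted model is Python's `pickle` module, sketched below. Whether the notebook uses `pickle` is an assumption, and the helper name `save_model` and the path in the usage comment are hypothetical.

```python
import pickle


def save_model(model, path):
    """Serialize a fitted model (or any picklable object) to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


# Usage (hypothetical path):
# save_model(best_model, "sentiment_classifier.pkl")
```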

Load model
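
Assuming the model was written with `pickle`, it can be restored as sketched below. The helper name `load_model` and the path in the usage comment are hypothetical. Unpickling can execute arbitrary code, so only files from a trusted source should be loaded this way.

```python
import pickle


def load_model(path):
    """Restore an object previously written with pickle.dump."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Usage (hypothetical path); the restored model predicts like the original:
# model = load_model("sentiment_classifier.pkl")
# model.predict(["an unseen review"])
```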

Raw Data download:

http://jmcauley.ucsd.edu/data/amazon/

Description

This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014.
This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).