Finding Sentiment among Customers
Intro
- Naive Bayes uses Bayes' theorem to classify unknown data.
- What makes it so efficient is the naive assumption that features are independent of each other.
- For example, when predicting rain we naively assume that the features:
- temperature,
- humidity,
- wind speed,
- overcast,
are independent of each other and contribute independently to the probability of a rainy day.
- Even if there is an obvious relation between features, we treat the probability of each feature independently.
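Under the independence assumption, the score for each class is simply its prior multiplied by one factor per feature, then normalised. A minimal sketch of that arithmetic; the priors, likelihoods and feature names below are hypothetical numbers invented for illustration, not estimated from any data:

```python
# Hypothetical prior P(class) and per-feature likelihoods P(feature | class);
# these numbers are invented purely to illustrate the arithmetic.
PRIORS = {"rain": 0.3, "no_rain": 0.7}
LIKELIHOODS = {
    "rain": {"humidity=high": 0.8, "overcast=yes": 0.7,
             "wind=strong": 0.5, "temperature=low": 0.6},
    "no_rain": {"humidity=high": 0.3, "overcast=yes": 0.2,
                "wind=strong": 0.3, "temperature=low": 0.4},
}


def naive_bayes_posterior(observed):
    """Return P(class | observed) assuming features are conditionally independent."""
    scores = {}
    for cls, prior in PRIORS.items():
        score = prior
        for feature in observed:
            # Each feature multiplies in its own factor; correlations are ignored.
            score *= LIKELIHOODS[cls][feature]
        scores[cls] = score
    total = sum(scores.values())  # normalise so the posteriors sum to 1
    return {cls: score / total for cls, score in scores.items()}


posterior = naive_bayes_posterior(
    ["humidity=high", "overcast=yes", "wind=strong", "temperature=low"]
)
print({cls: round(p, 3) for cls, p in posterior.items()})
# {'rain': 0.909, 'no_rain': 0.091}
```

Here rain scores 0.3 × 0.8 × 0.7 × 0.5 × 0.6 = 0.0504 against 0.7 × 0.3 × 0.2 × 0.3 × 0.4 = 0.00504 for no rain, so after normalising the posterior for rain is 10/11.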
Features
The app includes the following features:
Demo
Methodology:
- For the project I use a multinomial Naive Bayes classifier.
- I picked the multinomial model because it is designed for classification on discrete features such as counts.
- In this project, the discrete features are the word counts of each comment.
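To make the multinomial model concrete, here is a from-scratch sketch of training and prediction on word counts with add-one (Laplace) smoothing. It illustrates the technique only and is not necessarily how the project implements it (the Setup section points to scikit-learn); the function names and the toy reviews and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log P(class) and smoothed log P(word | class) from tokenised docs."""
    vocab = {word for doc in docs for word in doc}
    class_sizes = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> occurrences of each word
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    log_prior = {c: math.log(n / len(labels)) for c, n in class_sizes.items()}
    log_likelihood = {}
    for c in class_sizes:
        total = sum(word_counts[c].values())
        # Add-alpha smoothing keeps words unseen in a class from zeroing its score.
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
            for w in vocab
        }
    return log_prior, log_likelihood


def predict(doc, log_prior, log_likelihood):
    """Return the class maximising log P(c) + sum(count * log P(word | c))."""
    counts = Counter(doc)
    best_class, best_score = None, -math.inf
    for c, prior in log_prior.items():
        # Words outside the training vocabulary are skipped.
        score = prior + sum(n * log_likelihood[c][w]
                            for w, n in counts.items() if w in log_likelihood[c])
        if score > best_score:
            best_class, best_score = c, score
    return best_class


docs = [["great", "phone", "great", "battery"], ["love", "the", "screen"],
        ["terrible", "battery"], ["screen", "broke", "terrible"]]
labels = ["Positive", "Positive", "Negative", "Negative"]
log_prior, log_likelihood = train_multinomial_nb(docs, labels)
print(predict(["great", "screen"], log_prior, log_likelihood))      # Positive
print(predict(["terrible", "battery"], log_prior, log_likelihood))  # Negative
```

Working in log space turns the product of many small per-word probabilities into a sum, which avoids floating-point underflow on long reviews.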
Data set:
- The input data is in JSON format, from which I extract the review_text and review_rate fields.
- The data set consists of 30k comments about phones.
- Each comment has a rating from 1 to 5.
- Based on the rating, I label each comment:
- a rating of 5 or 4 makes the comment positive,
- a rating of 3 makes the comment neutral,
- a rating of 2 or 1 makes the comment negative.
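The labeling rule and the field extraction can be sketched as below. This assumes the JSON file is a single array of objects that each carry `review_text` and `review_rate`; the actual file layout is not described here, and the helper names `rate_to_label` and `load_labelled_reviews` are hypothetical.

```python
import json


def rate_to_label(rate):
    """Map a 1-5 rating to a sentiment label: 4-5 Positive, 3 Neutral, 1-2 Negative."""
    rate = int(rate)
    if not 1 <= rate <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rate}")
    if rate >= 4:
        return "Positive"
    if rate == 3:
        return "Neutral"
    return "Negative"


def load_labelled_reviews(path):
    """Return (review_text, label) pairs; assumes a JSON array of review objects."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return [(r["review_text"], rate_to_label(r["review_rate"])) for r in records]


print([rate_to_label(r) for r in [5, 4, 3, 2, 1]])
# ['Positive', 'Positive', 'Neutral', 'Negative', 'Negative']
```

With pandas, the same could be done with `pd.read_json(path)` (or `pd.read_json(path, lines=True)` if the file holds one JSON object per line) followed by `df["review_rate"].map(rate_to_label)`.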
Bag of words:
- Before the model can learn, the comments need to be converted into a numerical representation.
- The first step is preparing a bag of words that contains all the unique words from all of the comments.
- Secondly, for each comment we check which words from the bag appear in it and how many times.
- Bag of words example:
sentence_1: 'Hello World, it is me!'
sentence_2: 'Hello Roman, it is Roman!'
bag of words: ['hello', 'is', 'it', 'me', 'roman', 'world']
numerical_sentence_1: [1,1,1,1,0,1]
- there is 0 in the second-to-last position as there is no 'roman' in sentence_1
numerical_sentence_2: [1,1,1,0,2,0]
- there is no 'me' in sentence_2, which is why there is 0 in the 4th position
- there are two 'roman' words in sentence_2, which is why there is 2 in the 5th position
- there is no 'world' in sentence_2, which is why there is 0 in the last position
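The two steps of the bag-of-words example can be reproduced with a short standard-library sketch. The tokeniser (lower-casing, keeping runs of two or more word characters) is an assumption chosen to mirror scikit-learn's `CountVectorizer` defaults; the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case and keep runs of 2+ word characters (like CountVectorizer's default)."""
    return re.findall(r"\b\w\w+\b", text.lower())


def build_bag_of_words(sentences):
    """Step 1: the sorted vocabulary of unique words across all sentences."""
    return sorted({word for s in sentences for word in tokenize(s)})


def vectorize(sentence, vocabulary):
    """Step 2: how many times each vocabulary word occurs in the sentence."""
    counts = Counter(tokenize(sentence))
    return [counts[word] for word in vocabulary]


sentences = ["Hello World, it is me!", "Hello Roman, it is Roman!"]
bag = build_bag_of_words(sentences)
print(bag)                           # ['hello', 'is', 'it', 'me', 'roman', 'world']
print(vectorize(sentences[0], bag))  # [1, 1, 1, 1, 0, 1]
print(vectorize(sentences[1], bag))  # [1, 1, 1, 0, 2, 0]
```

With scikit-learn, `CountVectorizer().fit_transform(sentences).toarray()` should give the same two rows for this example, with the vocabulary available from `get_feature_names_out()`. Note that this token pattern drops one-letter tokens, such as the "t" in "don't".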
Case study:
- I give the trained model a new set of comments:
- 'This products sucks',
- 'Colors are nice, don't know which one to choose'.
The model predicts:
['Negative' 'Positive']
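An end-to-end sketch of this step with scikit-learn, chaining `CountVectorizer` and `MultinomialNB` in a pipeline. The five training reviews are a hypothetical stand-in for the ~30k labelled phone comments; with this toy set the pipeline happens to return the same labels, but that is not a reproduction of the project's result.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data; the real project trains on ~30k labelled reviews.
train_texts = [
    "Great phone, nice screen",
    "Nice colors and great battery",
    "This phone sucks",
    "Battery sucks, screen broke",
    "It is okay, nothing special",
]
train_labels = ["Positive", "Positive", "Negative", "Negative", "Neutral"]

# Bag of words (word counts) followed by a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

new_comments = [
    "This products sucks",
    "Colors are nice, don't know which one to choose",
]
predictions = model.predict(new_comments)
print(predictions)  # ['Negative' 'Positive'] with this toy training set
```

Wrapping both steps in one pipeline ensures new comments are vectorised with the vocabulary learned during training; words never seen in training (such as "products") are simply ignored.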
Setup
The script requires the following libraries:
pip install scikit-learn
pip install pandas