Portfolio Artur Skrzeta

Finding Sentiment among Customers

Naive Bayes uses Bayes' theorem to classify unknown data.
What makes it so efficient is the naive assumption that the features are independent of each other.
For example, when predicting a rainy day, we naively assume that the features:
- temperature,
- humidity,
- wind speed,
- overcast,
are independent of each other and contribute independently to the probability of a rainy day.
Even if there is an obvious relation between the features, we treat the probability of each feature independently.
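
The rainy-day example can be sketched in a few lines of Python. All probabilities below are made-up numbers chosen only to illustrate the calculation, and the names prior, likelihood and naive_posterior are hypothetical helpers, not part of the project.

```python
# Hypothetical numbers for illustration only; they are not measured weather data.
prior = {"rain": 0.3, "no_rain": 0.7}

# P(observed feature value | class) for one day, one entry per feature.
likelihood = {
    "rain": {"temperature": 0.4, "humidity": 0.8, "wind_speed": 0.5, "overcast": 0.9},
    "no_rain": {"temperature": 0.6, "humidity": 0.3, "wind_speed": 0.5, "overcast": 0.2},
}


def naive_posterior(prior, likelihood):
    """Combine the prior with per-feature likelihoods as if they were independent."""
    scores = {}
    for label, p in prior.items():
        score = p
        for value in likelihood[label].values():
            score *= value  # naive assumption: simply multiply the features
        scores[label] = score
    total = sum(scores.values())
    # Normalise so the class probabilities sum to 1.
    return {label: s / total for label, s in scores.items()}


posterior = naive_posterior(prior, likelihood)
prediction = max(posterior, key=posterior.get)
print(posterior, prediction)
```

With these assumed numbers the "rain" class gets the higher score, because humidity and overcast both point towards rain and their likelihoods are multiplied together.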

The app includes the following features:

For the project I use the multinomial Naive Bayes classifier.
I picked the multinomial model because I want to perform classification on a set of discrete features.
In this project the discrete features are the word counts used for text classification.
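
To show how a multinomial model scores word counts, here is a minimal from-scratch sketch. It is not the project's code: the train and predict functions, the smoothing parameter alpha (add-one smoothing) and the toy comments are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels, alpha=1.0):
    """Fit log class priors and smoothed per-class log word probabilities."""
    vocab = sorted({w for d in docs for w in d.split()})
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Add alpha to every word count so unseen words never get probability 0.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict(doc, log_prior, log_likelihood):
    """Return the class with the highest log-score; out-of-vocabulary words are skipped."""
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for word, count in Counter(doc.split()).items():
            if word in log_likelihood[c]:
                score += count * log_likelihood[c][word]
        scores[c] = score
    return max(scores, key=scores.get)


# Toy, made-up comments (already lowercased, punctuation removed).
docs = ["great phone love it", "love the screen", "terrible battery", "bad phone terrible screen"]
labels = ["positive", "positive", "negative", "negative"]
log_prior, log_likelihood = train(docs, labels)
result = predict("love this great screen", log_prior, log_likelihood)
print(result)
```

Each word contributes its log-probability once per occurrence, which is why word counts, rather than simple presence flags, are the natural input for this model.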

The input data is in JSON format, from which I extract review_text and review_rate.
The model takes 30k comments on phones.
Each comment has a rating from 1 to 5.
Based on the rating I assign the labels:
- rating 5 or 4: the comment is positive,
- rating 3: the comment is neutral,
- rating 2 or 1: the comment is negative.
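
The extraction and labeling steps could look roughly like this. The exact layout of the JSON file is not shown here, so the sketch assumes an array of objects with the review_text and review_rate fields; the sample reviews and the label_from_rate helper are hypothetical.

```python
import json

# Assumed input shape: a JSON array of objects with the two fields named in the text.
raw = """[
  {"review_text": "Great camera", "review_rate": 5},
  {"review_text": "It is okay", "review_rate": 3},
  {"review_text": "Stopped working after a week", "review_rate": 1}
]"""


def label_from_rate(rate):
    """Map a 1-5 rating to a sentiment label."""
    if rate >= 4:
        return "Positive"
    if rate == 3:
        return "Neutral"
    return "Negative"


reviews = json.loads(raw)
texts = [r["review_text"] for r in reviews]
labels = [label_from_rate(r["review_rate"]) for r in reviews]
print(list(zip(texts, labels)))
```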

Before the model can learn, the comments need to be converted into their numerical representation.
The first step is preparing a bag of words that contains all the unique words from all of the comments.
Secondly, for each comment we check which words from the bag appear in it and how many times.
Bag of words example:
sentence_1: 'Hello World, it is me!'
sentence_2: 'Hello Roman, it is Roman!'
bag of words: ['hello', 'is', 'it', 'me', 'roman', 'world']
numerical_sentence_1: [1,1,1,1,0,1]
- there is 0 in the second-to-last position as there is no 'roman' in sentence_1
numerical_sentence_2: [1,1,1,0,2,0]
- there is no 'me' in sentence_2, which is why there is 0 in the 4th position
- there are two 'roman' words in sentence_2, which is why there is 2 in the 5th position
- there is no 'world' in sentence_2, which is why there is 0 in the last position
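
The example above can be reproduced with a small standard-library sketch. The tokenize and bag_of_words helpers are hypothetical names written for this illustration; in practice scikit-learn's CountVectorizer does a similar job.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(sentences):
    """Return the sorted vocabulary and one count vector per sentence."""
    vocab = sorted({word for s in sentences for word in tokenize(s)})
    vectors = []
    for s in sentences:
        counts = Counter(tokenize(s))
        # A missing word has count 0 in a Counter, which gives the zeros in the vector.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


vocab, vectors = bag_of_words(["Hello World, it is me!", "Hello Roman, it is Roman!"])
print(vocab)
print(vectors)
```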

Given a new set of comments:
- 'This products sucks',
- 'Colors are nice, don't know which one to choose',
the trained model returns:
['Negative' 'Positive']

The script requires the following libraries to be installed:

pip install scikit-learn
pip install pandas

You can view the source code: HERE