Finding Sentiment among Customers

Intro

  • Naive Bayes uses Bayes' theorem to classify unknown data.
  • What makes it so efficient is the naive assumption that features are independent of each other.
  • For example, when predicting a rainy day, we naively assume that the features:
    - temperature,
    - humidity,
    - wind speed,
    - overcast,
    are independent of each other and contribute independently to the probability of rain.
  • Even if there is an obvious relation between features, we treat the probability of each feature independently.
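Written out for the rain example, with x_1, ..., x_4 standing for temperature, humidity, wind speed and overcast (symbols introduced here only for illustration), Bayes' theorem combined with the independence assumption gives roughly:

```latex
P(\text{rain} \mid x_1, x_2, x_3, x_4)
  = \frac{P(\text{rain})\, P(x_1, x_2, x_3, x_4 \mid \text{rain})}{P(x_1, x_2, x_3, x_4)}
  \;\propto\; P(\text{rain}) \prod_{i=1}^{4} P(x_i \mid \text{rain})
```

The denominator is the same for every class, so classification only compares the numerators. The product on the right is where the naive assumption replaces the joint likelihood with one simple factor per feature.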

Features

The app uses the following libraries:

  • Pandas
  • SKLearn

Demo

Methodology:
  • For this project I use a multinomial Naive Bayes classifier.
  • I chose the multinomial model because I want to classify based on a set of discrete features.
  • In this project the discrete features are the word counts of each comment.
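To make the methodology concrete, here is a minimal from-scratch sketch of multinomial Naive Bayes on word counts. It is not the project's code: the toy tokenised reviews, the labels and the Laplace smoothing parameter `alpha` are assumptions for illustration. Each class is scored by its log prior plus the count-weighted sum of log word likelihoods, which is the independence assumption from the Intro applied to words.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log word likelihoods per class."""
    vocab = sorted({word for doc in docs for word in doc})
    docs_per_class = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> total count in that class
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    log_prior = {c: math.log(n / len(docs)) for c, n in docs_per_class.items()}
    log_lik = {}
    for c in docs_per_class:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik

def predict(doc, log_prior, log_lik):
    """Return the class maximising log P(c) + sum over words of count * log P(word | c)."""
    counts = Counter(doc)

    def score(c):
        # Words never seen in training are skipped, as they have no estimate.
        return log_prior[c] + sum(n * log_lik[c][w] for w, n in counts.items() if w in log_lik[c])

    return max(log_prior, key=score)

# Hypothetical, already tokenised toy reviews.
docs = [["great", "phone"], ["love", "great", "screen"],
        ["bad", "battery"], ["terrible", "bad", "phone"]]
labels = ["Positive", "Positive", "Negative", "Negative"]

log_prior, log_lik = train_multinomial_nb(docs, labels)
print(predict(["great", "screen"], log_prior, log_lik))  # Positive
print(predict(["bad", "battery"], log_prior, log_lik))   # Negative
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and smoothing keeps a single unseen word from forcing a class probability to zero.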
Data set:
  • The input data is in JSON format; from it I extract the review_text and review_rate fields.
  • The data set contains 30k comments on phones.
  • Each comment has a rating from 1 to 5.
  • Based on the rating, I label each comment:
    - rating 5 or 4: positive,
    - rating 3: neutral,
    - rating 2 or 1: negative.
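The loading and labelling step could look roughly like the sketch below. It assumes the JSON is a list of records carrying the review_text and review_rate fields named above; the inline sample and the helper `rate_to_label` are hypothetical, since the README does not show the real file name or layout.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample in the assumed record layout; the project's real JSON
# file and its exact structure are not shown in this README.
RAW_JSON = """[
  {"review_text": "Great phone, love the screen", "review_rate": 5},
  {"review_text": "Battery is okay", "review_rate": 3},
  {"review_text": "Stopped working after a week", "review_rate": 1}
]"""

def rate_to_label(rate: int) -> str:
    """Map a 1-5 rating to a label: 5-4 positive, 3 neutral, 2-1 negative."""
    if rate >= 4:
        return "Positive"
    if rate == 3:
        return "Neutral"
    return "Negative"

# Keep only the two fields the model needs, then derive the label column.
df = pd.read_json(StringIO(RAW_JSON))[["review_text", "review_rate"]]
df["label"] = df["review_rate"].apply(rate_to_label)
print(df[["review_rate", "label"]])
```

If the real file stores one JSON object per line, `pd.read_json(path, lines=True)` reads that layout instead.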
Bag of words:
  • Before the model can learn, the comments need to be converted into a numerical representation.
  • The first step is preparing a bag of words that contains all the unique words from all of the comments.
  • Second, for each comment we check which words from the bag appear in it and how many times.
  • Bag of words example:
    sentence_1: 'Hello World, it is me!'
    sentence_2: 'Hello Roman, it is Roman!'
    bag of words: ['hello', 'is', 'it', 'me', 'roman', 'world']
    numerical_sentence_1: [1,1,1,1,0,1]
    - the second-to-last position is 0 because there is no 'roman' in sentence_1
    numerical_sentence_2: [1,1,1,0,2,0]
    - the 4th position is 0 because there is no 'me' in sentence_2
    - the 5th position is 2 because 'roman' appears twice in sentence_2
    - the last position is 0 because there is no 'world' in sentence_2
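The two steps of the example can be reproduced with a few lines of plain Python. This is a sketch, not the project's code; the lowercase-and-split-on-non-word-characters tokenisation is an assumption chosen to match the example above.

```python
import re

def tokenize(text):
    """Lowercase and keep runs of word characters, dropping punctuation."""
    return re.findall(r"\w+", text.lower())

def build_bag(sentences):
    """Step 1: collect every unique word across all sentences, sorted."""
    return sorted({word for s in sentences for word in tokenize(s)})

def vectorize(sentence, bag):
    """Step 2: count how many times each bag word appears in the sentence."""
    tokens = tokenize(sentence)
    return [tokens.count(word) for word in bag]

sentences = ["Hello World, it is me!", "Hello Roman, it is Roman!"]
bag = build_bag(sentences)
print(bag)  # ['hello', 'is', 'it', 'me', 'roman', 'world']
for s in sentences:
    print(vectorize(s, bag))
# [1, 1, 1, 1, 0, 1]
# [1, 1, 1, 0, 2, 0]
```

In scikit-learn, `CountVectorizer` performs both steps (`fit` builds the vocabulary, `transform` produces the counts). Its default token pattern ignores single-character tokens, so it can differ from this sketch on words such as 'a' or 'I'.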
Case study:
  • Given a new set of comments:
    - 'This products sucks',
    - 'Colors are nice, don't know which one to choose',
    the trained model returns:
    ['Negative' 'Positive']
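An end-to-end version of this case study can be sketched with a scikit-learn pipeline of `CountVectorizer` and `MultinomialNB`. The five training comments below are a hypothetical stand-in for the 30k labelled reviews, so the output only illustrates the mechanics and says nothing about the project's real accuracy.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training set standing in for the labelled phone reviews.
train_texts = [
    "Nice phone, love the colors",
    "Great screen, really nice",
    "This phone sucks",
    "Battery sucks, terrible product",
    "It is okay, nothing special",
]
train_labels = ["Positive", "Positive", "Negative", "Negative", "Neutral"]

# CountVectorizer builds the bag of words and the count vectors;
# MultinomialNB (default Laplace smoothing, alpha=1.0) learns from those counts.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

new_comments = ["This products sucks", "Colors are nice, don't know which one to choose"]
print(model.predict(new_comments))  # ['Negative' 'Positive'] with this toy data
```

Words that never occur in training (here 'products', 'choose' and others) are simply dropped by the fitted vectorizer, so the prediction rests on the known words only.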

Setup

The script requires the following libraries to be installed:

pip install scikit-learn
pip install pandas

Source Code

You can view the source code: HERE