Machine Learning in Procurement

Intro

Spend analysis reviews procurement spend in order to decrease costs, increase efficiency, or improve supplier relationships. This ML model recognizes a spend type based on its specification.

Features

The app includes the following features:

  • Sklearn
  • Pandas
  • LinearSVC algorithm

Demo

The script performs the following steps:
  1. Loading labeled sets of features into memory.
  2. Splitting data into training and testing subsets:
    - X_train - features for the model training,
    - y_train - corresponding labels for the model training,
    - X_test - features for the model testing,
    - y_test - corresponding labels for the model testing.
  3. Setting up a pipeline that vectorizes words and applies the LinearSVC algorithm:
    - We need to apply word vectorization when working with word features.
    - To do so, we put all the words from all the labeled samples into one bag - bag of words.
    - Then we check each sample's word features against the bag of words.
    - While checking, we create a sub-list of the sample's word features whose length equals the number of words in the current sample.
    - The check relies on counting how often each word occurs in the sample.
    - Words are replaced with numbers so that the machine can compute on them.
    - The more frequently a word appears, the bigger the value it gets.
    - The outcome of vectorization is a list of numbers where each item represents a single word.
  4. Training ML model using subsets X_train and y_train.
  5. Using trained model to predict labels for test features from X_test.
  6. We can assess the model's accuracy by comparing the predictions with the respective test labels from y_test.
    - Accuracy can be measured as a ratio: number of predictions that match y_test / number of labels in y_test.
    - When it is higher than 80%, the model is reliable enough for my needs.
  7. Once I find the model reliable, I can give it new features that it has not seen.
  8. The trained model returns labels for the new features.
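
The steps above can be sketched as follows. This is a minimal sketch rather than the script itself: the sample rows, the column names label and feature, the choice of CountVectorizer as the word vectorizer, and the test_size and random_state values are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Step 1: hypothetical labeled data (labels in the first column, features in the second).
data = pd.DataFrame(
    {
        "label": ["Information Technologies"] * 6 + ["Logistic"] * 6,
        "feature": [
            "Laptop docking station",
            "Server data backup",
            "Software license renewal",
            "Computer monitor stand",
            "Data storage on server",
            "Network switch hardware",
            "Freight transport service",
            "Pallet delivery truck",
            "Warehouse storage rental",
            "Courier parcel shipping",
            "Container sea freight",
            "Truck transport delivery",
        ],
    }
)

# Step 2: split into training and testing subsets (split settings are assumed).
X_train, X_test, y_train, y_test = train_test_split(
    data["feature"],
    data["label"],
    test_size=0.25,
    random_state=0,
    stratify=data["label"],
)

# Step 3: pipeline of word vectorization followed by LinearSVC.
# CountVectorizer (word counts) is an assumption; the script may use another vectorizer.
model = Pipeline(
    [
        ("vectorizer", CountVectorizer()),
        ("classifier", LinearSVC()),
    ]
)

# Step 4: train the model on the training subset.
model.fit(X_train, y_train)

# Step 5: predict labels for the test features.
predictions = model.predict(X_test)

# Step 6: share of predictions that match the test labels.
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.0%}")

# Steps 7 and 8: label new features that the model has not seen.
new_features = ["Data archiving on server", "Computer standing workstation"]
new_labels = model.predict(new_features)
for feature, label in zip(new_features, new_labels):
    print(f"{feature} -> {label}")
```

With so few sample rows the printed accuracy says little; the sketch only shows how the steps connect.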
Input:
  • Here is the structure of the input data.
  • The first column is the column of labels.
  • The second column is the column of features.
  • Each feature has a label assigned.
  • Spend specifications are used as features in ML model training.
  • Team's Clusters (e.g. Information Technologies, Logistic) are used as labels in ML model training.
  • The model's accuracy is 80%, which is sufficient for our purpose.
  • 80% accuracy means that 80% of the labels predicted for the testing features match the testing labels.
  • Application: we can use the ML model to recognize the Team's Cluster based on the spend specification assigned to a spend.
  • This recognition enables us to distribute the workload among particular teams.
  • Here we can see both the overall accuracy and the accuracy for a particular team.
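
As an illustration of how the overall accuracy and the accuracy per team could be computed, here is a small sketch with pandas. The test labels and predictions below are made-up values, not results from the model, and "accuracy for a particular team" is assumed to mean the share of that team's test samples that received the correct label.

```python
import pandas as pd

# Hypothetical test labels and predictions, for illustration only.
y_test = [
    "Information Technologies",
    "Information Technologies",
    "Logistic",
    "Logistic",
    "Logistic",
]
predictions = [
    "Information Technologies",
    "Logistic",
    "Logistic",
    "Logistic",
    "Logistic",
]

results = pd.DataFrame({"label": y_test, "prediction": predictions})
results["correct"] = results["label"] == results["prediction"]

# Overall accuracy: 4 of 5 predictions match, i.e. 0.8.
overall = results["correct"].mean()

# Accuracy per team: share of correct predictions within each true label.
per_team = results.groupby("label")["correct"].mean()

print(f"Overall accuracy: {overall:.0%}")
print(per_team)
```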
Output:
  • Above we can see the final output, where the ML model assigned labels to new features that had never been introduced to the model.
  • New features:
    - Box file storage
    - Data archiving on server
    - Building engineering construction
    - Automation business process
    - Computer standing workstation
  • The script saves the output in a separate file in the current project directory.
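
Saving the output could look roughly like the sketch below. The helper save_predictions, the file name predictions.csv, and the CSV format are hypothetical; the actual script may use a different name or format. The labels passed in are placeholders, not model output.

```python
from pathlib import Path

import pandas as pd


def save_predictions(features, labels, file_name="predictions.csv"):
    """Write label/feature pairs to a file in the current working directory."""
    output = pd.DataFrame({"label": labels, "feature": features})
    path = Path.cwd() / file_name
    output.to_csv(path, index=False)
    return path


# Placeholder labels for illustration; in the script they would come from model.predict.
path = save_predictions(
    ["Box file storage", "Data archiving on server"],
    ["<predicted label>", "<predicted label>"],
)
print(f"Saved output to {path}")
```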

Setup

The following Python libraries need to be installed.
pip install scikit-learn
pip install pandas

Source Code

You can view the source code: HERE