Portfolio Artur Skrzeta

Machine Learning in Procurement

Intro

Spend analysis reviews procurement spend in order to decrease costs, increase efficiency, or improve supplier relationships. The ML model recognizes a spend type from its specification.

Features

The app includes the following features:

Sklearn
Pandas
LinearSVC algorithm

Demo

The script performs the following steps:

Loading labeled sets of features into memory.
Splitting data into training and testing subsets:
- X_train - features for the model training,
- y_train - corresponding labels for the model training,
- X_test - features for the model testing,
- y_test - corresponding labels for the model testing.
Setting up a pipeline that vectorizes words and applies the LinearSVC algorithm:
- Word features must be vectorized before the model can use them.
- To do so, we put all the words from all the labeled samples into one bag, the bag of words.
- Then we check each sample's word features against the bag of words.
- While checking, we build a list of numbers for each sample, with one position for each word in the bag.
- The check counts how often each word occurs in the sample.
- Words are replaced with numbers so that the machine can compute on them.
- The more frequently a word appears, the bigger the value it gets.
- The outcome of vectorization is a list of numbers in which each item stands for a single word.
Training ML model using subsets X_train and y_train.
Using trained model to predict labels for test features from X_test.
We can assess the model's accuracy by comparing the predictions with the respective test labels from y_test.
- Accuracy is the number of predictions that match y_test divided by the total number of test labels.
- When it is higher than 80%, the model is reliable enough for my needs.
Once I find the model reliable, I can give it new features that it has not seen.
The trained model returns labels for the new features.
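
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the toy samples, the choice of CountVectorizer as the vectorizer, and the test_size and random_state values are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: spend specifications (features) and team clusters (labels).
features = [
    "laptop computer purchase",
    "server software license",
    "network router hardware",
    "computer monitor screen",
    "software support subscription",
    "cloud server hosting",
    "truck freight transport",
    "warehouse pallet storage",
    "parcel courier delivery",
    "container sea freight",
    "road transport delivery",
    "warehouse forklift rental",
]
labels = ["Information Technologies"] * 6 + ["Logistic"] * 6

# Split into training and testing subsets (assumed 75% / 25%).
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

# Pipeline: bag-of-words vectorization followed by LinearSVC.
# CountVectorizer is an assumption; the source only mentions word vectorization.
model = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("classifier", LinearSVC()),
])
model.fit(X_train, y_train)

# Predict labels for the test features and compare them with y_test.
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"accuracy: {accuracy:.2f}")

# Look inside the vectorizer: one position per word in the bag of words.
vectorizer = model.named_steps["vectorizer"]
vector = vectorizer.transform([X_test[0]]).toarray()[0]
counts = {word: int(vector[index]) for word, index in vectorizer.vocabulary_.items()}
print(X_test[0], "->", {word: n for word, n in counts.items() if n > 0})
```

Words in a test sample that never occurred in the training data are simply ignored by the vectorizer, which is one reason a larger labeled set tends to help.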

Input:

Here is the structure of data input.
First column is the column of labels.
Second column is the column of features.
Each feature has a label assigned.
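
As a rough illustration of this two-column layout, the sketch below loads a small table with pandas. The column names label and feature, the CSV format, and the sample rows are assumptions for the example; the real input file may be stored differently.

```python
import io

import pandas as pd

# Hypothetical input: first column holds labels, second column holds features.
raw = io.StringIO(
    "label,feature\n"
    "Information Technologies,Laptop docking station\n"
    "Logistic,Pallet transport service\n"
    "Information Technologies,Software license renewal\n"
)
data = pd.read_csv(raw)

# Every feature should have a label assigned.
assert data["label"].notna().all()

labels = data["label"].tolist()
features = data["feature"].tolist()
print(data)
```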

Spend specifications are taken as features in ML model training.
Team's Clusters (e.g. Information Technologies, Logistic) are taken as labels in ML model training.
The model's accuracy is 80%, which is sufficient for our purpose.
80% accuracy means that 80% of the labels predicted for the testing features match the testing labels.
Application: we can use the ML model to recognize the Team's Cluster from the spend specification assigned to a spend.
This recognition enables us to distribute the workload among particular teams.

Here we can see both the overall accuracy and the accuracy for a particular team.
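
For reference, both figures can be computed by hand as in the sketch below. The test labels and predictions are made up, and the per-team value is taken here as the share of correctly predicted samples among the test samples that truly belong to that team; the report in the project may use a different per-team metric.

```python
from collections import Counter

# Hypothetical test labels and model predictions.
y_test = [
    "Information Technologies",
    "Information Technologies",
    "Information Technologies",
    "Logistic",
    "Logistic",
]
predictions = [
    "Information Technologies",
    "Information Technologies",
    "Logistic",
    "Logistic",
    "Logistic",
]


def overall_accuracy(true_labels, predicted_labels):
    """Share of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def accuracy_per_label(true_labels, predicted_labels):
    """For each true label, the share of its samples that were predicted correctly."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


overall = overall_accuracy(y_test, predictions)
per_label = accuracy_per_label(y_test, predictions)
print(f"overall: {overall:.0%}")
for label, value in per_label.items():
    print(f"{label}: {value:.0%}")
```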

Output:

Above is the final output, where the ML model assigned labels to new features that it had never seen before.
New features:
- Box file storage
- Data archiving on server
- Building engineering construction
- Automation business process
- Computer standing workstation
The script saves the output in a separate file in the current project directory.
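
A minimal sketch of this last step is shown below. The tiny training set, the CountVectorizer step, the column names, and the file name predictions.csv are all assumptions for the example, so the labels it produces are not the project's results.

```python
from pathlib import Path

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical training data standing in for the real labeled set.
train_features = [
    "computer workstation purchase",
    "server data backup",
    "business process software",
    "file storage boxes transport",
    "warehouse pallet storage",
    "freight delivery service",
]
train_labels = ["Information Technologies"] * 3 + ["Logistic"] * 3

model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(train_features, train_labels)

# New features the model has not seen during training.
new_features = [
    "Box file storage",
    "Data archiving on server",
    "Building engineering construction",
    "Automation business process",
    "Computer standing workstation",
]

# Pair each new feature with its predicted label and save the table.
output = pd.DataFrame({
    "feature": new_features,
    "predicted_label": model.predict(new_features),
})
output_path = Path.cwd() / "predictions.csv"  # hypothetical file name
output.to_csv(output_path, index=False)
print(output)
```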

Setup

The following Python libraries must be installed (sklearn is published on PyPI as scikit-learn):

pip install scikit-learn
pip install pandas

Source Code

You can view the source code: HERE