Table of Contents

Imports

Useful Functions

Load the data

Preprocessing

Class Balance

Scaling

Random Under Sampling

Cons:

Train Test split with stratify for imbalanced data
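
A minimal sketch of a stratified split, assuming scikit-learn's `train_test_split`. The synthetic data and the 5% positive rate are illustrative assumptions, not the notebook's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 1000 rows, 50 positives (5%).
rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 4))
y = np.array([1] * 50 + [0] * 950)

# stratify=y keeps the class ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

train_ratio = y_train.mean()  # share of positives in the training split
test_ratio = y_test.mean()    # share of positives in the test split
```

Without `stratify`, a small test split of a heavily imbalanced dataset can end up with very few minority-class rows.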

Check for nans before modelling

Modelling

NOTE: Always set random_state and n_jobs whenever the estimator supports them

No random_state: knn
No n_jobs: svc and dtree

liblinear: Supports both the 'l1' and 'l2' penalties.
liblinear: n_jobs has no effect (it behaves as 1), whereas lbfgs accepts n_jobs=-1.
lbfgs: Supports only the 'l2' penalty, but it is faster for large datasets.

liblinear: random_state only
lbfgs: both
svc: random_state only
knn: n_jobs only
dtree: random_state only
rf: both
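
One way to double-check a list like this is to ask each estimator which constructor parameters it exposes. The sketch below assumes scikit-learn is installed; the `supported` helper and the short dictionary keys are hypothetical names introduced for illustration. Note that exposing a parameter does not guarantee every solver uses it.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

estimators = {
    "logreg": LogisticRegression,
    "svc": SVC,
    "knn": KNeighborsClassifier,
    "dtree": DecisionTreeClassifier,
    "rf": RandomForestClassifier,
}

def supported(cls):
    """Report whether the estimator exposes random_state and n_jobs."""
    params = cls().get_params()
    return {name: name in params for name in ("random_state", "n_jobs")}

support = {name: supported(cls) for name, cls in estimators.items()}

for name, flags in support.items():
    print(name, flags)
```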


NOTE: Confusion Matrix Terms (actual ==> predicted)
Fraud ==> Fraud: TP
Non-Fraud ==> Non-Fraud: TN
Fraud ==> Non-Fraud: FN (the case I am most interested in)
Non-Fraud ==> Fraud: FP
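
The four terms can be read off a confusion matrix directly. This is a small sketch using scikit-learn's `confusion_matrix`; the label encoding (1 = Fraud, 0 = Non-Fraud) and the toy predictions are assumptions for illustration.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical labels: 1 = Fraud, 0 = Non-Fraud.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]

# With labels=[0, 1], ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

# Recall = TP / (TP + FN): it drops as missed frauds (FN) increase.
recall = recall_score(y_true, y_pred)
```

Because false negatives are the costly case here, recall is the natural metric to track.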

Classifiers

Recall Scores from Cross Validation
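
A hedged sketch of collecting recall scores with cross validation, assuming scikit-learn's `cross_val_score` with `scoring="recall"`. The synthetic dataset and the choice of logistic regression are illustrative assumptions, not the notebook's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical imbalanced data: roughly 10% positives.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=0
)

# Stratified folds keep the minority class present in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = LogisticRegression(solver="liblinear", random_state=0)

# One recall value per fold.
recall_scores = cross_val_score(clf, X, y, cv=cv, scoring="recall")
mean_recall = recall_scores.mean()
```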

SVC with Calibrated Probabilities

Reference: https://machinelearningmastery.com/calibrated-classification-model-in-scikit-learn/
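
A minimal sketch of calibrating an SVC, assuming scikit-learn's `CalibratedClassifierCV` with sigmoid (Platt) scaling. The synthetic data, `cv=3`, and the calibration method are illustrative assumptions and may differ from the notebook.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical imbalanced data: roughly 20% positives.
X, y = make_classification(
    n_samples=400, n_features=6, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Wrap the SVC so that its decision scores are mapped to probabilities.
# The base estimator is passed positionally because its keyword name
# has changed between scikit-learn versions.
calibrated = CalibratedClassifierCV(SVC(random_state=0), method="sigmoid", cv=3)
calibrated.fit(X_train, y_train)

# Column 1 holds the estimated probability of the positive class.
proba = calibrated.predict_proba(X_test)
```

The wrapper fits the SVC on internal folds and learns the score-to-probability mapping on the held-out part of each fold, so the base estimator does not need `probability=True`.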

Run Time