Table of Contents

Imports

Load the data

Preprocessing

Class Balance
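Checking class balance comes down to counting how many rows carry each label. The sketch below assumes a binary target where 1 marks fraud and 0 marks non-fraud. The toy array `y` is hypothetical and stands in for the real target column.

```python
import numpy as np

# Hypothetical toy target: 1 = fraud, 0 = non-fraud (real labels come from the loaded dataset)
y = np.array([0] * 95 + [1] * 5)

# Count the samples in each class
classes, counts = np.unique(y, return_counts=True)
for cls, cnt in zip(classes, counts):
    # Absolute count and share of the whole dataset for each class
    print(f"class {cls}: {cnt} samples ({cnt / len(y):.1%})")

# Fraction of rows that are fraud
fraud_ratio = counts[classes == 1][0] / len(y)
```

With a pandas DataFrame the equivalent check is `df[target_column].value_counts(normalize=True)`, where `target_column` is whatever the label column is called.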

Scaling
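The outline does not say which scaler is used. This sketch assumes scikit-learn's `StandardScaler` applied to a single hypothetical amount-like column whose values span several orders of magnitude. `RobustScaler` is a common alternative when such a column has heavy outliers.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature column (e.g. a transaction amount) with very different magnitudes
amount = np.array([[1.0], [5.0], [20.0], [150.0], [2000.0]])

# StandardScaler expects a 2-D array of shape (n_samples, n_features)
scaler = StandardScaler()
amount_scaled = scaler.fit_transform(amount)

# After scaling, the column has zero mean and unit (population) standard deviation
print(amount_scaled.ravel().round(3))
```

Once the data is split, the usual precaution is to fit the scaler on the training rows only and call `transform` on the test rows, so test-set statistics do not leak into the model.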

Random Under Sampling
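Random under sampling keeps every minority-class (fraud) row and randomly discards majority-class rows until both classes have the same count. The sketch below implements that idea directly in NumPy on hypothetical toy arrays, assuming a binary target where the minority label is 1. Libraries such as imbalanced-learn offer `RandomUnderSampler` for the same purpose, but the outline does not say whether the notebook uses it.

```python
import numpy as np

def random_under_sample(X, y, minority_label=1, seed=42):
    """Return a class-balanced subset of (X, y) by down-sampling the majority class."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    # Draw as many majority rows as there are minority rows, without replacement
    kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
    # Combine and shuffle so the two classes are not stored in contiguous blocks
    idx = rng.permutation(np.concatenate([minority_idx, kept_majority]))
    return X[idx], y[idx]

# Hypothetical toy data: 95 non-fraud rows and 5 fraud rows, 3 features each
X = np.arange(300, dtype=float).reshape(100, 3)
y = np.array([0] * 95 + [1] * 5)

X_bal, y_bal = random_under_sample(X, y)
print(X_bal.shape, np.bincount(y_bal))
```

The balanced set here has only 10 rows out of the original 100, which makes the cost of the method concrete: most majority-class rows are simply thrown away.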

Cons:

Train-test split with stratify for imbalanced data
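Passing `stratify=y` to scikit-learn's `train_test_split` keeps the class proportions the same in both splits, which matters when fraud is a small fraction of the rows. The data below is hypothetical: 950 non-fraud rows and 50 fraud rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 950 non-fraud rows, 50 fraud rows (5% fraud)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = np.array([0] * 950 + [1] * 50)

# stratify=y preserves the 5% fraud share in both the train and the test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(y_train.mean(), y_test.mean())  # both 0.05
```

Without `stratify`, a random 20% test split of a heavily imbalanced dataset can end up with noticeably more or fewer fraud rows than the overall rate, which makes test metrics less reliable.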

Check for NaNs before modelling
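A NaN check is cheap and catches problems before an estimator fails on them. Many scikit-learn estimators, for example `SVC` and `LogisticRegression`, raise a `ValueError` when the input contains NaN. The small DataFrame below is hypothetical, with one missing value injected for illustration. Dropping rows is only one option (imputation is another), and the outline does not say which one the notebook uses.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value injected for illustration
df = pd.DataFrame({"Amount": [10.0, np.nan, 3.5], "Class": [0, 0, 1]})

# Missing values per column, then one overall count
nan_per_column = df.isna().sum()
total_nans = int(nan_per_column.sum())
print(nan_per_column.to_dict(), total_nans)

# One possible fix: drop incomplete rows, then confirm nothing is left
clean = df.dropna()
assert not clean.isna().any().any()
```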

Modelling

NOTE: Always set random_state and n_jobs whenever the estimator supports them.

No random_state: knn
No n_jobs: svc and dtree

liblinear: supports both 'l1' and 'l2' penalties.
liblinear: ignores n_jobs (it runs on a single core), whereas lbfgs accepts n_jobs=-1.
lbfgs: supports only 'l2' (or no penalty), but it is usually faster on large datasets.
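The solver notes above can be checked by fitting both variants of `LogisticRegression`. The dataset is a hypothetical one from `make_classification` with about 5% positives, and `C=0.1` is an arbitrary choice that makes the L1 penalty strong enough to have a visible effect.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical imbalanced binary dataset (about 5% positives)
X, y = make_classification(n_samples=500, n_features=10, weights=[0.95], random_state=0)

# liblinear supports 'l1'; an L1 penalty can push some coefficients exactly to zero
lr_l1 = LogisticRegression(solver="liblinear", penalty="l1", C=0.1, random_state=42).fit(X, y)

# lbfgs supports only 'l2' (or no penalty); n_jobs=-1 is accepted here
lr_l2 = LogisticRegression(solver="lbfgs", penalty="l2", max_iter=1000, n_jobs=-1).fit(X, y)

print((lr_l1.coef_ == 0).sum(), "zero coefficients with l1 (liblinear)")
print((lr_l2.coef_ == 0).sum(), "zero coefficients with l2 (lbfgs)")
```

The L1 model acts as a built-in feature selector. The L2 model shrinks coefficients but normally leaves them all non-zero.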

liblinear: random_state only
lbfgs: both
svc: random_state only
knn: n_jobs only
dtree: random_state only
rf: both
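Following the NOTE above, each model can be created with whichever of `random_state` and `n_jobs` it actually accepts. The dictionary keys and the shared `SEED` value are hypothetical names chosen for this sketch.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

SEED = 42  # one shared seed so runs are reproducible

models = {
    # liblinear: random_state is used; n_jobs would be ignored
    "logreg_liblinear": LogisticRegression(solver="liblinear", random_state=SEED),
    # lbfgs: both arguments are accepted (random_state itself only affects sag, saga and liblinear)
    "logreg_lbfgs": LogisticRegression(solver="lbfgs", random_state=SEED, n_jobs=-1),
    # SVC has random_state (used when probability=True) but no n_jobs
    "svc": SVC(random_state=SEED),
    # KNN is deterministic, so it has no random_state; n_jobs parallelises the neighbour search
    "knn": KNeighborsClassifier(n_jobs=-1),
    # DecisionTreeClassifier has random_state but no n_jobs
    "dtree": DecisionTreeClassifier(random_state=SEED),
    # Random forest supports both
    "rf": RandomForestClassifier(random_state=SEED, n_jobs=-1),
}
```

`get_params()` on any estimator lists the arguments it accepts, which is a quick way to check whether `random_state` or `n_jobs` exists before passing it.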


NOTE: Confusion matrix terms (actual ==> predicted)
Fraud ==> Fraud: TP
Non-Fraud ==> Non-Fraud: TN
Fraud ==> Non-Fraud: FN (the error I am most interested in, i.e. missed fraud)
Non-Fraud ==> Fraud: FP
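In scikit-learn's `confusion_matrix`, rows are actual classes and columns are predicted classes. For labels `[0, 1]`, `.ravel()` therefore yields TN, FP, FN, TP in that order. Recall on the fraud class, TP / (TP + FN), is the metric that directly penalises FN. The labels below are hypothetical.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical labels: 1 = fraud, 0 = non-fraud
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 0]

# labels=[0, 1] fixes the row/column order, so ravel() gives tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=1 TN=3 FP=1 FN=2

# Recall = TP / (TP + FN): the share of actual fraud that was caught
recall = recall_score(y_true, y_pred)
print(recall)  # 1/3
```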

Data Processing before modelling

Decision Tree Classification
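A minimal decision-tree workflow on hypothetical imbalanced data from `make_classification` looks like this. The depth limit of 5 is an arbitrary choice for this sketch, not a value taken from the notebook. Because FN is the error of interest, the sketch reports recall on the fraud class.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical imbalanced data (about 5% positives)
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# max_depth limits how far the tree grows, which curbs over-fitting
dtree = DecisionTreeClassifier(max_depth=5, random_state=42)
dtree.fit(X_train, y_train)

y_pred = dtree.predict(X_test)
recall = recall_score(y_test, y_pred)
print("recall on the fraud class:", recall)
```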

Feature Importance
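A fitted scikit-learn tree exposes `feature_importances_`, the normalised total impurity decrease each feature contributes. The values sum to 1 for any tree that makes at least one split. The data and the feature names below are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: 6 features, 3 of them informative
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names

dtree = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X, y)

# Rank features from most to least important
importances = dtree.feature_importances_
order = np.argsort(importances)[::-1]
for i in order:
    print(f"{feature_names[i]}: {importances[i]:.3f}")
```

Impurity-based importances can favour features with many distinct values, so `sklearn.inspection.permutation_importance` is a common cross-check.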

Plot the tree using sklearn.tree.plot_tree
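`sklearn.tree.plot_tree` draws a fitted tree with matplotlib and returns the list of node annotations. The sketch selects the non-interactive `Agg` backend so it also runs outside a notebook. The data, feature names and class names are hypothetical, and the shallow depth just keeps the figure readable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; inside Jupyter this line can be dropped
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, plot_tree

X, y = make_classification(n_samples=500, n_features=4, n_informative=2, random_state=0)
dtree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

fig, ax = plt.subplots(figsize=(10, 6))
annotations = plot_tree(
    dtree,
    feature_names=[f"f{i}" for i in range(X.shape[1])],  # hypothetical names
    class_names=["non-fraud", "fraud"],  # in ascending label order: 0, then 1
    filled=True,  # colour each node by its majority class
    ax=ax,
)
plt.close(fig)
```

To keep the image, call `fig.savefig("tree.png")` before closing the figure.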

Plot using pydotplus and export_graphviz
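With `out_file=None`, `sklearn.tree.export_graphviz` returns the tree as a DOT-language string, which pydotplus can then render. Rendering needs both the `pydotplus` package and the Graphviz `dot` executable. Either may be missing, so the sketch guards that step and skips it on failure. The data and names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_graphviz

X, y = make_classification(n_samples=500, n_features=4, n_informative=2, random_state=0)
dtree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

# out_file=None makes export_graphviz return the DOT source as a string
dot_data = export_graphviz(
    dtree,
    out_file=None,
    feature_names=[f"f{i}" for i in range(X.shape[1])],  # hypothetical names
    class_names=["non-fraud", "fraud"],
    filled=True,
    rounded=True,
)

# Optional rendering step: requires pydotplus and the Graphviz binaries
try:
    import pydotplus
    graph = pydotplus.graph_from_dot_data(dot_data)
    png_bytes = graph.create_png()  # raw PNG bytes, e.g. for IPython.display.Image
except Exception as err:  # ImportError, or pydotplus failing to locate Graphviz
    png_bytes = None
    print("rendering skipped:", err)
```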

Plot using graphviz.Source and IPython.display.SVG

Interactive plot using ipywidgets

Run Time
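The outline does not say how run time is measured. One simple option, assumed here, is to wrap a step in `time.perf_counter()`, a monotonic high-resolution clock. The data is hypothetical and the timed step is a decision-tree fit. In Jupyter, the `%%time` and `%timeit` magics are alternatives.

```python
import time

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Measure wall-clock time for one model fit
start = time.perf_counter()
DecisionTreeClassifier(random_state=42).fit(X, y)
elapsed = time.perf_counter() - start
print(f"fit took {elapsed:.3f} s")
```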