Table of Contents

Colab

Imports

Useful Scripts

Load the data

Data Processing

Class balance

Feature Selection

Log transform

Train-validation-test split with stratify

Normalize the data

Oversampling minority class

Modelling: Keras Sequential

Params and Metrics

Build the model

Fit the model

Model Evaluation

Confusion Matrix

NOTE: The confusion matrix is laid out as [[TN, FP], [FN, TP]]. The diagonal values are the True Negatives and True Positives; ideally we want the off-diagonal elements to be zero, but we have some mis-predictions. The top-right value is the False Positives: transactions that are not fraud but that the model predicts as fraud. The company needs to email these customers to verify whether the transaction is legitimate, and sending too many emails may annoy them. The bottom-left value is the False Negatives: actual frauds that the model classifies as normal. These cost the company money and force it to deal with fraudulent cases, which is much more undesirable than False Positives. In real life, the trade-off must be chosen carefully so as not to bother too many customers while also not missing frauds.
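
For reference, a minimal sketch of how this matrix can be computed with scikit-learn's confusion_matrix; the y_true and y_scores arrays below are made-up illustrations, not the notebook's actual validation data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels and model scores, for illustration only.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0])
y_scores = np.array([0.1, 0.8, 0.9, 0.3, 0.2, 0.7, 0.05, 0.6])

# Threshold the scores at 0.5 to get hard predictions.
y_pred = (y_scores > 0.5).astype(int)

# Rows are the true classes, columns are the predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")
```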

Accuracy Recall Scores

Training History Plots

WARNING: Here the validation data has a higher AUC than the training data. This is because the dropout layer is not active when evaluating the model.
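
A minimal sketch of this effect, assuming a toy Sequential model (the layer sizes and input data below are illustrative, not the notebook's): calling the model with training=True mimics what happens during fit, where dropout randomly zeroes activations, while training=False mimics evaluate/predict, where dropout is a no-op.

```python
import numpy as np
from tensorflow import keras

# Toy model with a dropout layer; sizes are illustrative only.
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation="sigmoid"),
])

x = np.random.rand(4, 10).astype("float32")

# During fit(), dropout is active (training=True), so training-time metrics
# are computed on a "handicapped" network.
out_train = model(x, training=True)

# During evaluate()/predict(), dropout is disabled (training=False), so
# validation metrics such as AUC can come out higher than training metrics.
out_eval = model(x, training=False)

print(out_train.numpy().ravel())
print(out_eval.numpy().ravel())
```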

Retrain for oversampled data

Because training is easier on the balanced data, the above training procedure may overfit quickly.

So break up the epochs to give the callbacks.EarlyStopping callback finer control over when to stop training.
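
A minimal sketch of this idea, assuming hypothetical resampled_features/resampled_labels and val_features/val_labels arrays standing in for the oversampled training set and the untouched validation split: a repeated tf.data pipeline plus a small steps_per_epoch makes each "epoch" short, so EarlyStopping checks the validation AUC more often and can stop training sooner.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Hypothetical stand-ins for the oversampled training set and the validation
# split; in the notebook these come from the earlier sections.
resampled_features = np.random.rand(1000, 10).astype("float32")
resampled_labels = np.random.randint(0, 2, size=(1000, 1))
val_features = np.random.rand(200, 10).astype("float32")
val_labels = np.random.randint(0, 2, size=(200, 1))

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[keras.metrics.AUC(name="auc")],
)

# Stop as soon as the validation AUC stops improving, keeping the best weights.
early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_auc",
    mode="max",
    patience=10,
    restore_best_weights=True,
)

# Repeat the balanced data and cap each epoch at a few steps, so the callback
# sees the validation metric frequently and can halt before the model overfits.
resampled_ds = tf.data.Dataset.from_tensor_slices(
    (resampled_features, resampled_labels)).shuffle(1000).batch(64).repeat()
val_ds = tf.data.Dataset.from_tensor_slices(
    (val_features, val_labels)).batch(64)

history = model.fit(
    resampled_ds,
    epochs=100,
    steps_per_epoch=5,
    callbacks=[early_stopping],
    validation_data=val_ds,
)
```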

Time Taken