Introduction to Boosting

The term boosting refers to a family of algorithms that convert weak learners into strong learners.
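To make the weak-to-strong idea concrete, here is a minimal sketch of gradient boosting for squared error, assuming one-split regression stumps as the weak learners and a toy 1-D dataset. The helper names (`fit_stump`, `stump_predict`, `boost`, `mse`) are hypothetical and written for illustration; this is not how GBM or XGBoost is implemented internally.

```python
def fit_stump(xs, targets):
    """Find the one-split regression stump with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # drop the max so both sides are non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((t - left_value) ** 2 for t in left)
                 + sum((t - right_value) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def stump_predict(stump, x):
    """Predict with a single stump: one constant per side of the threshold."""
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds, learning_rate):
    """Return training predictions after n_rounds of boosting with stumps."""
    base = sum(ys) / len(ys)          # start from the mean of the targets
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared error, the residuals are the negative gradient.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = list(range(10))
ys = [x ** 2 for x in xs]  # toy target that a single stump cannot fit well

single_stump_mse = mse(ys, boost(xs, ys, n_rounds=1, learning_rate=1.0))
boosted_mse = mse(ys, boost(xs, ys, n_rounds=200, learning_rate=0.1))
print(f"single stump MSE: {single_stump_mse:.2f}")
print(f"boosted MSE:      {boosted_mse:.4f}")
```

Each stump on its own is a poor model, but fitting every new stump to the current residuals and adding a small fraction of it drives the training error well below what one stump can reach.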

Many boosting algorithms exist, each aiming to improve a model’s accuracy. In this tutorial, we’ll learn about the two most commonly used ones: Gradient Boosting (GBM) and XGBoost.

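XGBoost is generally considered the more advanced of the two: it adds built-in regularization and a parallelized implementation on top of gradient boosting.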

Imports

Useful Scripts

Load the data

Train test split with stratify

Train Validation with stratify
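As an illustration of what the two stratified splits do, the sketch below splits indices class by class so that each subset keeps roughly the same class ratio. In practice this is usually done with scikit-learn's `train_test_split(..., stratify=y)`. The `stratified_split` helper, the toy labels, and the 20% and 25% fractions are assumptions for this example, not the notebook's actual values.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), sampling test_fraction from each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy imbalanced labels: 10% positives.
y = [0] * 900 + [1] * 100

# Split 1: hold out 20% as the test set.
train_val_idx, test_idx = stratified_split(y, test_fraction=0.2, seed=0)

# Split 2: hold out 25% of the remainder as the validation set.
y_train_val = [y[i] for i in train_val_idx]
train_pos, valid_pos = stratified_split(y_train_val, test_fraction=0.25, seed=1)
train_idx = [train_val_idx[i] for i in train_pos]
valid_idx = [train_val_idx[i] for i in valid_pos]

for name, idx in [("train", train_idx), ("valid", valid_idx), ("test", test_idx)]:
    print(name, dict(Counter(y[i] for i in idx)))
```

All three subsets keep the original 10% positive rate, which matters for imbalanced data because a plain random split can leave a small subset with very few positives.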

Class Distribution
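A minimal sketch of a class-distribution check, assuming the target is a plain sequence of labels. The `class_distribution` helper and the toy labels are hypothetical and only illustrate the idea.

```python
from collections import Counter


def class_distribution(labels):
    """Map each class to (row count, share of all rows)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (n, n / total) for label, n in sorted(counts.items())}


y = [0] * 990 + [1] * 10  # toy target with 1% positives
dist = class_distribution(y)
for label, (n, share) in dist.items():
    print(f"class {label}: {n} rows ({share:.1%})")
```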

Modelling imbalanced data with XGBoost

Parameters:
-------------
max_depth=3
learning_rate=0.1
n_estimators=100
verbosity=1 **NOTE: prints to the IPython terminal, not the browser
silent=None **deprecated, use verbosity
objective='binary:logistic' **for binary classification
booster='gbtree' **use the default tree booster, not linear
n_jobs=1 **set this to -1 to use all cores
nthread=None **deprecated, use n_jobs
gamma=0
min_child_weight=1
max_delta_step=0
subsample=1
colsample_bytree=1
colsample_bylevel=1
colsample_bynode=1
reg_alpha=0
reg_lambda=1
scale_pos_weight=1
base_score=0.5
random_state=0 **use your own random state
seed=None      **deprecated, use random_state
missing=None
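For imbalanced data, the parameter above that most directly addresses the imbalance is `scale_pos_weight`. A common starting value is the ratio of negative to positive training rows. The sketch below computes that ratio; the `suggested_scale_pos_weight` helper and the toy labels are hypothetical, and the result is a starting point to tune rather than a fixed rule.

```python
def suggested_scale_pos_weight(labels, positive=1):
    """Ratio of negative to positive rows, a common starting value."""
    n_pos = sum(1 for label in labels if label == positive)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive rows in labels")
    return n_neg / n_pos


y_train = [0] * 950 + [1] * 50  # toy training target with 5% positives
weight = suggested_scale_pos_weight(y_train)
print(weight)

# Assumed usage, not run here:
# clf = xgb.XGBClassifier(scale_pos_weight=weight, n_jobs=-1, random_state=0)
```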

Early stopping (official XGBoost note):

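If early stopping occurs, the model will have three additional fields: bst.best_score, bst.best_iteration and bst.best_ntree_limit. Note that xgboost.train() will return a model from the last iteration, not the best one. In XGBoost 1.6 and later, the scikit-learn wrapper takes the early-stopping options in the constructor rather than in fit(). Example:

import xgboost as xgb

# X_train, y_train, X_test, y_test are assumed to come from the splits above
clf = xgb.XGBClassifier(early_stopping_rounds=10, eval_metric="auc")
clf.fit(X_train, y_train, eval_set=[(X_test, y_test)])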
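The difference between the best and the last iteration can be seen in a small library-free sketch of the patience rule. The `early_stopping` function and the AUC values are made up for illustration; this mimics the idea, not XGBoost's internal code.

```python
def early_stopping(scores, patience, maximize=True):
    """Scan per-round validation scores and stop once `patience` rounds
    pass without improvement.

    Returns (best_iteration, best_score, last_iteration).
    """
    best_iteration, best_score = 0, scores[0]
    last_iteration = 0
    for i, score in enumerate(scores):
        last_iteration = i
        improved = score > best_score if maximize else score < best_score
        if i == 0 or improved:
            best_iteration, best_score = i, score
        elif i - best_iteration >= patience:
            break  # no improvement for `patience` rounds
    return best_iteration, best_score, last_iteration


# Hypothetical validation AUC per boosting round.
auc_per_round = [0.70, 0.74, 0.78, 0.80, 0.79, 0.80, 0.79, 0.78, 0.81]
best_iteration, best_score, last_iteration = early_stopping(auc_per_round, patience=3)
print(best_iteration, best_score, last_iteration)
```

Training stops at round 6, three rounds after the best score at round 3, so a model taken from the last iteration includes trees added after the best one.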

HPO for imbalanced data using sklearn

HPO for imbalanced data using optuna

Important Parameters:

Regularization parameters:

Optuna Hyperparameters Visualization

Plotly 4 is needed to render the visualizations in JupyterLab.

Best model from Optuna

Model Interpretation

Model interpretation using eli5

Model interpretation using shap

Model Evaluation Using Yellowbrick

Class Balance

Confusion matrix
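As a rough sketch of what the confusion matrix summarizes for a binary problem, the code below counts the four cells and derives accuracy, precision, and recall from them. The `confusion_counts` helper and the toy labels are hypothetical; Yellowbrick and scikit-learn provide the real plotting and metric functions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp) for a binary classification problem."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tn, fp, fn, tp


y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"tn={tn} fp={fp} fn={fn} tp={tp}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On imbalanced data, accuracy alone can look good while precision and recall for the minority class stay low, which is why the per-cell counts are worth inspecting.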

Classification Report Heatmap

Class Prediction Error

ROCAUC
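The area under the ROC curve can be read as the probability that a randomly chosen positive row gets a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition. The `roc_auc` helper is hypothetical and quadratic in the number of rows, so it is for intuition only, not for large datasets.

```python
def roc_auc(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative row")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)
```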

Total Time Taken