Modelling Customer Churn using CatBoost

References

Load the libraries
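
The snippets in this notebook use the following libraries; the block below is a consolidated import sketch inferred from the later cells.

#============================================================
# Imports used across this notebook (inferred from later cells)
import numpy as np
from math import log

import catboost
from catboost import CatBoostClassifier, Pool
from catboost.utils import eval_metric

import optuna
import shap
#============================================================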

Colab

Useful Scripts

Load the Data

Data Processing

Data Types

Train and Test Data

Numerical and Categorical Features

Custom Features

Train Validation Split
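
A hedged sketch of a stratified split; `df` and the `churn` target column are assumptions about the notebook's data, and the resulting names (X_train, X_valid, y_train, y_valid) are reused by the fit and Optuna examples further down.

#============================================================
# Stratified train/validation split (assumed names: df, churn)
from sklearn.model_selection import train_test_split

X = df.drop(columns=['churn'])  # 'churn' target name is an assumption
y = df['churn']

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
#============================================================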

Modelling

Regression objectives and metrics:
MAE MAPE Poisson Quantile RMSE Huber Tweedie SMAPE R2 MSLE etc.

Classification objectives and metrics:
Logloss CrossEntropy Precision Recall F1 BalancedAccuracy

Multiclass classification objectives and metrics:
MultiClass MultiClassOneVsAll Precision Recall F1 TotalF1 MCC
Accuracy HingeLoss ZeroOneLoss Kappa WKappa AUC
#============================================================
catboost.CatBoostClassifier(
iterations                        = None, # n_estimators, num_trees, num_boost_round
learning_rate                     = None, # eta
depth                             = None, # max_depth
l2_leaf_reg                       = None, # reg_lambda
scale_pos_weight                  = None,

random_seed                       = None, # random_state
use_best_model                    = None,

verbose                           = None, # verbose_eval
silent                            = None,
logging_level                     = None, # Silent Verbose Info Debug

ignored_features                  = None,
cat_features                      = None, # indices or names
text_features                     = None,
one_hot_max_size                  = None,

objective                         = None, # loss_function
custom_loss                       = None,
custom_metric                     = None,
eval_metric                       = None,
score_function                    = None, # Cosine L2 NewtonCosine NewtonL2

subsample                         = None,
colsample_bylevel                 = None,

early_stopping_rounds             = None,
grow_policy                       = None,

classes_count                     = None,
class_weights                     = None, # list or dict, e.g. [1.0, 0.5] or {0: 1.0, 1: 0.5};
                                          # rule of thumb for binary: weight 1.0 for class 0 and
                                          # sum_neg/sum_pos for class 1. Do not combine with
                                          # auto_class_weights or scale_pos_weight.

auto_class_weights                = None,
class_names                       = None,

save_snapshot                     = None,
snapshot_file                     = None,
snapshot_interval                 = None
)
#===========================================================
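
A minimal instantiation using a few of these parameters; the values are illustrative, not tuned, and `cat_cols` (the list of categorical column names) is an assumed name from the feature steps above.

#============================================================
# Illustrative configuration; values are not tuned
model = CatBoostClassifier(
    iterations            = 1000,
    learning_rate         = 0.05,
    depth                 = 6,
    loss_function         = 'Logloss',
    eval_metric           = 'AUC',
    cat_features          = cat_cols,   # assumed list of categorical columns
    auto_class_weights    = 'Balanced',
    random_seed           = 42,
    verbose               = 200,
)
#============================================================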

CatBoost eval_metric

from catboost.utils import eval_metric
from math import log

labels = [1, 0, 1]
probabilities = [0.4, 0.1, 0.9]

# In binary classification it is necessary to apply the logit function
# to the probabilities to get approxes.

logit = lambda x: log(x / (1 - x))
approxes = list(map(logit, probabilities))

accuracy = eval_metric(labels, approxes, 'Accuracy')
#======================================================

Custom eval_metric

import numpy as np
from catboost import CatBoostClassifier

class LoglossMetric(object):
    def get_final_error(self, error, weight):
        # weighted average of the accumulated error
        return error / (weight + 1e-38)

    def is_max_optimal(self):
        # lower logloss is better
        return False

    def evaluate(self, approxes, target, weight):
        # approxes is an indexed container of raw scores,
        # one array per approx dimension
        assert len(approxes) == 1
        assert len(target) == len(approxes[0])

        approx = approxes[0]

        error_sum = 0.0
        weight_sum = 0.0

        for i in range(len(approx)):
            # sigmoid converts the raw score to a probability
            e = np.exp(approx[i])
            p = e / (1 + e)
            w = 1.0 if weight is None else weight[i]
            weight_sum += w
            error_sum += -w * (target[i] * np.log(p) + (1 - target[i]) * np.log(1 - p))

        return error_sum, weight_sum

model = CatBoostClassifier(eval_metric=LoglossMetric())

CatBoost classifier fit

catboost.CatBoostClassifier.fit(X, y,
cat_features          = None,
text_features         = None,
sample_weight         = None,
baseline              = None,
use_best_model        = None,
eval_set              = None,
verbose               = None,
logging_level         = None,
plot                  = False,
column_description    = None,
verbose_eval          = None,
metric_period         = None,
silent                = None,
early_stopping_rounds = None,
save_snapshot         = None,
snapshot_file         = None,
snapshot_interval     = None,
init_model            = None,
)
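
A typical call, wiring in the validation split; X_train, y_train, X_valid, y_valid and cat_cols are the assumed names from the earlier split and feature steps.

#============================================================
# Illustrative fit call with early stopping on the validation set
model.fit(
    X_train, y_train,
    cat_features          = cat_cols,
    eval_set              = (X_valid, y_valid),
    use_best_model        = True,
    early_stopping_rounds = 100,
    verbose               = 200,
)
#============================================================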

CatBoost HPO Using Optuna

We should generally optimize model complexity first (for example depth and l2_leaf_reg) and then tune convergence (learning_rate and iterations); a sketch of an Optuna objective in this spirit follows the study.optimize signature below.

Parameters:

WARNING:

For Optuna, study.optimize has the following signature:

study.optimize(
func,               # the objective callable, called with a trial
n_trials          = None,
timeout           = None,
n_jobs            = 1,
catch             = (),
callbacks         = None,
gc_after_trial    = False,
show_progress_bar = False
)
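
A minimal objective sketch following the complexity-then-convergence idea above; the search ranges are illustrative, and X_train, y_train, X_valid, y_valid, cat_cols are the assumed names from earlier.

#============================================================
# Minimal Optuna objective sketch; ranges are illustrative
import optuna
from catboost import CatBoostClassifier

def objective(trial):
    params = {
        'depth':         trial.suggest_int('depth', 4, 10),
        'l2_leaf_reg':   trial.suggest_float('l2_leaf_reg', 1.0, 30.0, log=True),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'iterations':    1000,
        'loss_function': 'Logloss',
        'eval_metric':   'AUC',
        'random_seed':   42,
        'verbose':       False,
    }
    model = CatBoostClassifier(**params)
    model.fit(X_train, y_train,
              cat_features=cat_cols,
              eval_set=(X_valid, y_valid),
              early_stopping_rounds=100)
    # best validation AUC seen during training
    return model.get_best_score()['validation']['AUC']

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50, timeout=3600)
#============================================================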

Optuna Visualization
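
The built-in plots below assume the `study` object produced by study.optimize above; they require plotly to render.

#============================================================
# Standard Optuna study plots
from optuna.visualization import (
    plot_optimization_history,
    plot_param_importances,
    plot_parallel_coordinate,
)

plot_optimization_history(study).show()
plot_param_importances(study).show()
plot_parallel_coordinate(study).show()
#============================================================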

Model Evaluation

Model Evaluation using SHAP
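
A sketch using the shap library's TreeExplainer, which supports CatBoost models; wrapping the validation data in a Pool is the safe path when categorical features are present. X_valid, y_valid and cat_cols are the assumed names from earlier.

#============================================================
# SHAP values for the fitted CatBoost model
import shap
from catboost import Pool

explainer   = shap.TreeExplainer(model)
valid_pool  = Pool(X_valid, y_valid, cat_features=cat_cols)
shap_values = explainer.shap_values(valid_pool)

# global feature-importance view
shap.summary_plot(shap_values, X_valid)
#============================================================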

Time Taken
