Introduction

In this notebook I use dalex, a model interpretation module, to explain a regression model that predicts house prices.

Load the libraries

Useful Functions

Parameters

Load the Data

Modelling Xgboost

Model Evaluation using dalex

Create model explainer

DATASET LEVEL

Model Performance

Variable Importance using: model_parts

Customize the computation with parameters:

model_parts(
    loss_function=None,
    type=('variable_importance', 'ratio', 'difference', 'shap_wrapper'),
    N=1000,
    B=10, # num perm rounds, default = 10
    variables=None, # select only those variables
    variable_groups=None,
    keep_raw_permutations=True,
    processes=1,
    random_state=None,
    **kwargs)

Customize the plot with parameters:

plot(objects=None,
     max_vars=10,
     digits=3,
     rounding_function=np.around,
     bar_width=16,
     split=('model', 'variable'),
     title='Variable Importance',
     vertical_spacing=None,
     show=True)
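To make the role of B concrete, here is a minimal sketch of permutation-based variable importance written in plain Python. It is not dalex's implementation: the toy model, the RMSE loss and the helper names (`permutation_importance`, `toy_predict`) are assumptions made for illustration. The idea is to shuffle one column at a time, measure how much the loss grows, and average that growth over B rounds.

```python
import random


def rmse(y_true, y_pred):
    """Root mean squared error, used here as the loss function."""
    n = len(y_true)
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n) ** 0.5


def permutation_importance(predict, X, y, B=10, seed=0):
    """Mean increase in RMSE after shuffling each column, averaged over B rounds."""
    rng = random.Random(seed)
    base_loss = rmse(y, [predict(row) for row in X])
    importance = {}
    for j in range(len(X[0])):
        losses = []
        for _ in range(B):
            # Shuffle column j only; all other columns stay untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            losses.append(rmse(y, [predict(row) for row in X_perm]))
        importance[j] = sum(losses) / B - base_loss
    return importance


# Hypothetical toy model: column 0 matters a lot, column 1 a little, column 2 not at all.
def toy_predict(row):
    return 100.0 * row[0] + 1.0 * row[1]


rng = random.Random(42)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
y = [toy_predict(row) for row in X]

importance = permutation_importance(toy_predict, X, y, B=10)
for j, value in importance.items():
    print(f"column {j}: loss increase = {value:.3f}")
```

A larger B averages over more shuffles, so the ranking is more stable but takes proportionally longer to compute.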

model_profile: accumulated

model_profile(type=('partial', 'accumulated', 'conditional'),
              N=300, # number of observations to use (the most important parameter)
              variables=None,
              variable_type='numerical',
              groups=None,
              span=0.25,
              grid_points=101,
              variable_splits=None,
              variable_splits_type='uniform',
              center=True,
              processes=1,
              random_state=None,
              verbose=True)

Choose the algorithm with the type argument. The explanations can be calculated as a Partial Dependence Profile or an Accumulated Local Dependence Profile.

The key parameter is N, the number of observations to use (e.g. 800 for slower computation but more stable results).
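The effect of N is easier to see in a small sketch of a partial dependence profile. The code below is plain Python, not dalex's implementation; the toy model and the helper name `partial_dependence` are assumptions for illustration. For each grid value, the chosen variable is overwritten in a sample of at most N observations and the predictions are averaged.

```python
import random


def partial_dependence(predict, X, j, grid, N=300, seed=0):
    """Average prediction over at most N sampled rows, with column j set to each grid value."""
    rng = random.Random(seed)
    sample = X if len(X) <= N else rng.sample(X, N)
    profile = []
    for value in grid:
        preds = [predict(row[:j] + [value] + row[j + 1:]) for row in sample]
        profile.append(sum(preds) / len(preds))
    return profile


# Hypothetical toy model: linear in column 0, quadratic in column 1.
def toy_predict(row):
    return 50.0 * row[0] + 10.0 * row[1] ** 2


rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(1000)]
grid = [0.0, 0.5, 1.0]

pdp = partial_dependence(toy_predict, X, j=0, grid=grid, N=300)
for value, mean_pred in zip(grid, pdp):
    print(f"x0 = {value:.1f} -> mean prediction = {mean_pred:.3f}")
```

Each grid point costs one prediction per sampled row, so the run time grows roughly linearly with N, while a larger sample makes the averaged curve less noisy.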

Model Profile: Partial (pdp)

Model Profile: partial with categorical

INSTANCE LEVEL

Predict

Variable Attribution (predict_parts)

predict_parts(new_observation,
    type=('break_down_interactions', 'break_down', 'shap', 'shap_wrapper'),
    order=None,
    interaction_preference=1,
    path='average',
    B=25,
    keep_distributions=False,
    processes=1,
    random_state=None,
    **kwargs)

Here we can choose the variable attribution type. The explanations can be calculated as Break Down, iBreakDown or Shapley Values.

For type='shap' the key parameter is B, the number of random variable orderings (paths) that are averaged (e.g. 10 for faster computation but less stable results).
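The relation between Break Down and Shapley values can be sketched in plain Python. This is an illustration under stated assumptions, not dalex's code: the toy model and the helpers `mean_prediction`, `break_down` and `sampled_shap` are hypothetical. A Break Down explanation fixes the variables of the new observation one by one along a single ordering and records how the average prediction moves; the Shapley-style estimate averages those contributions over B random orderings.

```python
import random


def mean_prediction(predict, X, fixed):
    """Average prediction over X after setting the columns in `fixed` to given values."""
    total = 0.0
    for row in X:
        row = list(row)
        for j, value in fixed.items():
            row[j] = value
        total += predict(row)
    return total / len(X)


def break_down(predict, X, x_new, order):
    """Contributions along one variable ordering (a single path)."""
    fixed = {}
    previous = mean_prediction(predict, X, fixed)
    contributions = {}
    for j in order:
        fixed[j] = x_new[j]
        current = mean_prediction(predict, X, fixed)
        contributions[j] = current - previous
        previous = current
    return contributions


def sampled_shap(predict, X, x_new, B=25, seed=0):
    """Average the Break Down contributions over B random orderings."""
    rng = random.Random(seed)
    p = len(x_new)
    totals = {j: 0.0 for j in range(p)}
    for _ in range(B):
        order = list(range(p))
        rng.shuffle(order)
        for j, c in break_down(predict, X, x_new, order).items():
            totals[j] += c
    return {j: t / B for j, t in totals.items()}


# Hypothetical toy model with an interaction between columns 0 and 1.
def toy_predict(row):
    return 3.0 * row[0] + 2.0 * row[0] * row[1] + row[2]


rng = random.Random(7)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
x_new = [1.0, 1.0, 0.5]

intercept = mean_prediction(toy_predict, X, {})
shap = sampled_shap(toy_predict, X, x_new, B=25)
print("intercept:", round(intercept, 3))
print("contributions:", {j: round(c, 3) for j, c in shap.items()})
```

Because of the interaction term, the contributions of columns 0 and 1 depend on the ordering, which is why averaging over more paths gives steadier values. In every case the intercept plus the contributions adds up to the prediction for the new observation.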

Let's find out which variables contribute to the predicted house price.

predict_profile: Ceteris Paribus Profiles

In the Break Down plots, the age and movement_ractions variables stand out. Let's look at them more closely.
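A Ceteris Paribus profile answers the question "how would the prediction for this one observation change if a single variable changed and everything else stayed the same?". The sketch below shows the idea in plain Python; the toy model, its coefficients and the column meanings are assumptions for illustration and are not taken from the notebook's data.

```python
def ceteris_paribus(predict, x_new, j, grid):
    """Predictions for one observation as column j moves over the grid, others held fixed."""
    return [(value, predict(x_new[:j] + [value] + x_new[j + 1:])) for value in grid]


# Hypothetical toy model: price falls with age (column 0) and rises with area (column 1).
def toy_predict(row):
    return 200.0 - 1.5 * row[0] + 0.5 * row[1]


x_new = [20.0, 100.0]              # a single hypothetical house: age 20, area 100
grid = [0.0, 10.0, 20.0, 30.0]     # candidate values for age

profile = ceteris_paribus(toy_predict, x_new, j=0, grid=grid)
for age, prediction in profile:
    print(f"age = {age:>4.1f} -> predicted price = {prediction:.1f}")
```

Unlike a partial dependence profile, no averaging over the dataset takes place here, so the curve describes this one observation only.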

Cleanup

Time Taken