Table of Contents

Load the libraries

Load the data

Train Test Split

Pure Premium Modelling : Tweedie GLM

Ref: https://scikit-learn.org/stable/modules/linear_model.html#generalized-linear-regression

The pure premium is the total claim amount per unit of exposure. We can model it in two ways:

  1. Fit a frequency model and a severity model separately, then multiply their predictions.
  2. Fit a single GLM directly on the pure premium (Tweedie Regressor).
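
The two approaches can be sketched side by side on a small synthetic portfolio. Everything about the data here (sample size, coefficients, exposure range) is made up for illustration, and `power=1.9` and `alpha=1e-3` are arbitrary choices rather than tuned values.

```python
import numpy as np
from sklearn.linear_model import GammaRegressor, PoissonRegressor, TweedieRegressor

rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 2))
exposure = rng.uniform(0.2, 1.0, size=n)

# Synthetic portfolio: Poisson claim counts, Gamma-distributed claim sizes.
claim_count = rng.poisson(exposure * np.exp(-1.0 + 0.3 * X[:, 0]))
mean_severity = np.exp(6.0 + 0.2 * X[:, 1])
claim_amount = np.array([
    rng.gamma(shape=2.0, scale=m / 2.0, size=k).sum()
    for k, m in zip(claim_count, mean_severity)
])

frequency = claim_count / exposure      # claims per unit of exposure
pure_premium = claim_amount / exposure  # claim amount per unit of exposure

# Method 1: frequency model times severity model.
freq_model = PoissonRegressor(alpha=1e-3, max_iter=1000)
freq_model.fit(X, frequency, sample_weight=exposure)

has_claim = claim_count > 0
avg_claim = claim_amount[has_claim] / claim_count[has_claim]
sev_model = GammaRegressor(alpha=1e-3, max_iter=1000)
sev_model.fit(X[has_claim], avg_claim, sample_weight=claim_count[has_claim])

pp_product = freq_model.predict(X) * sev_model.predict(X)

# Method 2: a single Tweedie GLM fitted on the pure premium.
tweedie_model = TweedieRegressor(power=1.9, alpha=1e-3, link="log", max_iter=1000)
tweedie_model.fit(X, pure_premium, sample_weight=exposure)
pp_tweedie = tweedie_model.predict(X)

print(pp_product.mean(), pp_tweedie.mean())
```

The severity model is fitted only on policies with at least one claim, because the Gamma distribution requires strictly positive targets.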

Generalized Linear Models (GLM) extend linear models in two ways. First, the predicted values $\hat{y}$ are linked to a linear combination of the input variables via an inverse link function $h$:

$$ \hat{y}(w, X)=h(X w) $$
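
As a quick check of this relation, the sketch below assumes a log link, so that $h = \exp$, and reproduces the predictions of a fitted TweedieRegressor by hand. The data and the values of `power` and `alpha` are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
# Positive, skewed synthetic target.
y = rng.gamma(shape=2.0, scale=np.exp(0.5 * X[:, 0]) / 2.0)

model = TweedieRegressor(power=1.5, alpha=0.1, link="log").fit(X, y)

# With a log link the inverse link h is exp: y_hat = exp(X w + intercept).
linear_predictor = X @ model.coef_ + model.intercept_
y_hat_manual = np.exp(linear_predictor)
y_hat_model = model.predict(X)
```

Because $h = \exp$ is strictly positive, the predictions stay in the valid domain of the Poisson, Gamma and compound Poisson Gamma distributions.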

Second, the squared loss function is replaced by the unit deviance $d$ of a distribution in the exponential family (more precisely, a reproductive exponential dispersion model, EDM).

The minimization problem becomes:

$$ \min _{w} \frac{1}{2 n_{\text {samples }}} \sum_{i} d\left(y_{i}, \hat{y}_{i}\right)+\frac{\alpha}{2}\|w\|_{2}^{2} $$

where $\alpha$ is the L2 regularization penalty. When sample weights are provided, the average becomes a weighted average.
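
The objective can be evaluated by hand as a sanity check. The sketch below assumes a log link and the squared L2 norm of the coefficients (the intercept is not penalized), as in scikit-learn's documentation. `penalized_objective` is a hypothetical helper written for this example, not a scikit-learn function, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor
from sklearn.metrics import mean_tweedie_deviance


def penalized_objective(w, intercept, X, y, power, alpha):
    """Half the mean unit deviance plus the L2 penalty on the coefficients."""
    y_hat = np.exp(X @ w + intercept)  # log link assumed
    half_mean_deviance = 0.5 * mean_tweedie_deviance(y, y_hat, power=power)
    return half_mean_deviance + 0.5 * alpha * np.sum(w ** 2)


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = rng.gamma(shape=2.0, scale=np.exp(0.3 * X[:, 0]) / 2.0)

power, alpha = 1.5, 0.5
model = TweedieRegressor(power=power, alpha=alpha, link="log").fit(X, y)

at_fit = penalized_objective(model.coef_, model.intercept_, X, y, power, alpha)
shifted = penalized_objective(model.coef_ + 0.2, model.intercept_, X, y, power, alpha)
print(at_fit, shifted)
```

Moving the coefficients away from the fitted values should increase the objective, which is what the comparison of `at_fit` and `shifted` illustrates.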

The following table lists some specific EDMs and their unit deviance (all of these are instances of the Tweedie family):

| Distribution | Target Domain | Unit Deviance $d(y, \hat{y})$ | Power | Regressors |
|---|---|---|---|---|
| Normal | $y \in(-\infty, \infty)$ | $(y-\hat{y})^{2}$ | 0 | Ridge, ElasticNet |
| Poisson | $y \in[0, \infty)$ | $2\left(y \log \frac{y}{\hat{y}}-y+\hat{y}\right)$ | 1 | PoissonRegressor as alias of TweedieRegressor(power=1, link='log') |
| Gamma | $y \in(0, \infty)$ | $2\left(\log \frac{\hat{y}}{y}+\frac{y}{\hat{y}}-1\right)$ | 2 | GammaRegressor as alias of TweedieRegressor(power=2, link='log') |
| Inverse Gaussian | $y \in(0, \infty)$ | $\frac{(y-\hat{y})^{2}}{y \hat{y}^{2}}$ | 3 | TweedieRegressor(power=3, link='log') |
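
The four unit deviances in the table translate directly into NumPy. `unit_deviance` below is a hypothetical helper for illustration that covers only the powers 0, 1, 2 and 3; for the Poisson case the term $y \log(y / \hat{y})$ is taken to be 0 when $y = 0$.

```python
import numpy as np


def unit_deviance(y, y_hat, power):
    """Unit deviance d(y, y_hat) for the four distributions in the table."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    if power == 0:  # Normal
        return (y - y_hat) ** 2
    if power == 1:  # Poisson, with y * log(y / y_hat) := 0 at y == 0
        safe_y = np.where(y > 0, y, 1.0)
        log_term = np.where(y > 0, y * np.log(safe_y / y_hat), 0.0)
        return 2 * (log_term - y + y_hat)
    if power == 2:  # Gamma
        return 2 * (np.log(y_hat / y) + y / y_hat - 1)
    if power == 3:  # Inverse Gaussian
        return (y - y_hat) ** 2 / (y * y_hat ** 2)
    raise ValueError("this sketch only covers powers 0, 1, 2 and 3")


y = np.array([1.0, 2.0, 4.0])
y_hat = np.array([1.5, 2.0, 3.0])
deviances = {p: unit_deviance(y, y_hat, p) for p in (0, 1, 2, 3)}
```

In every case the deviance is zero where the prediction equals the observation (the second element here) and positive otherwise.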

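The choice of the distribution depends on the problem at hand. Following the scikit-learn user guide referenced above: if the target values are counts (non-negative integers) or relative frequencies (non-negative), a Poisson distribution with a log link is a natural choice. If they are positive and skewed, a Gamma distribution with a log link may fit. If they appear heavier-tailed than a Gamma distribution, an Inverse Gaussian distribution (or an even higher Tweedie power) may be worth trying.

Examples of use cases include risk modelling and insurance policy pricing, where the number of claim events per policyholder per year can be modelled as Poisson, the cost per event as Gamma, and the total cost per policyholder per year as Tweedie (compound Poisson Gamma).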

TweedieRegressor(*,power=0.0,alpha=1.0,fit_intercept=True,link='auto',
max_iter=100,tol=0.0001,warm_start=False,verbose=0,)

power : float, default=0
        The power determines the underlying target distribution according
        to the following table:

        +-------+------------------------+
        | Power | Distribution           |
        +=======+========================+
        | 0     | Normal                 |
        +-------+------------------------+
        | 1     | Poisson                |
        +-------+------------------------+
        | (1,2) | Compound Poisson Gamma |
        +-------+------------------------+
        | 2     | Gamma                  |
        +-------+------------------------+
        | 3     | Inverse Gaussian       |
        +-------+------------------------+

        For 0 < power < 1, no distribution exists.
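
A power strictly between 1 and 2 suits pure premium data because a compound Poisson Gamma variable has a point mass at zero (policies without claims) and is continuous on the positive side. The simulation below illustrates this with NumPy; the claim rate and the Gamma parameters are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_policies = 10_000

# Number of claims per policy: Poisson.
n_claims = rng.poisson(lam=0.1, size=n_policies)

# Total cost per policy: sum of n_claims independent Gamma claim sizes.
# An empty sum (no claims) gives exactly 0.
total_cost = np.array([
    rng.gamma(shape=2.0, scale=500.0, size=k).sum() for k in n_claims
])

share_zero = np.mean(total_cost == 0)
print(f"share of policies with zero cost: {share_zero:.3f}")
```

Most simulated policies have a total cost of exactly zero, which neither a Gamma nor an Inverse Gaussian distribution can represent.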

The score method returns $D^2$, a generalization of the coefficient of determination $R^2$: $R^2$ uses the squared error, whereas $D^2$ uses the deviance. The two are equal for the Normal distribution (power=0). $D^2$ is defined as

$$ \mathcal{D}^{2}=1-\frac{D\left(y_{\text {true}}, y_{\text {pred}}\right)}{D_{\text {null}}} $$

$D_{\text{null}}$ is the null deviance, i.e. the deviance of a model with intercept alone, which corresponds to $y_{\text{pred}}=\bar{y}$.

The mean $\bar{y}$ is weighted by sample_weight. The best possible score is 1.0, and the score can be negative because a model can be arbitrarily worse than the intercept-only model.
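
The definition can be checked against the `score` method of a fitted model. This is a minimal sketch on synthetic data without sample weights; `power=1.5` and `alpha=0.1` are arbitrary illustrative values.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor
from sklearn.metrics import mean_tweedie_deviance

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
y = rng.gamma(shape=2.0, scale=np.exp(0.4 * X[:, 0]) / 2.0)

power = 1.5
model = TweedieRegressor(power=power, alpha=0.1, link="log").fit(X, y)

# Deviance of the fitted model and of the intercept-only model (y_pred = mean of y).
dev_model = mean_tweedie_deviance(y, model.predict(X), power=power)
dev_null = mean_tweedie_deviance(y, np.full_like(y, y.mean()), power=power)

d2_manual = 1.0 - dev_model / dev_null
d2_sklearn = model.score(X, y)
print(d2_manual, d2_sklearn)
```

Using the mean deviance instead of the total deviance makes no difference here, since the factor $1 / n_{\text{samples}}$ cancels in the ratio.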