Description

In this project we use the OpenML dataset of French Motor Vehicle Insurance Claims.

Data Source

The frequency dataset has 12 columns and 678,013 rows.
The severity dataset has 2 columns and 26,639 rows.

Load the libraries

Load the data

Train Test Split

Frequency model : Poisson distribution

PoissonRegressor


Parameters
----------
alpha : float, default=1
    Constant that multiplies the penalty term and thus determines the
    regularization strength. ``alpha = 0`` is equivalent to unpenalized
    GLMs. In this case, the design matrix `X` must have full column rank
    (no collinearities).

fit_intercept : bool, default=True
    Specifies if a constant (a.k.a. bias or intercept) should be
    added to the linear predictor (X @ coef + intercept).

max_iter : int, default=100
    The maximal number of iterations for the solver.

tol : float, default=1e-4
    Stopping criterion. For the lbfgs solver,
    the iteration will stop when ``max{|g_j|, j = 1, ..., d} <= tol``
    where ``g_j`` is the j-th component of the gradient (derivative) of
    the objective function.
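
The parameters above can be tried out on a small example. The following is a minimal sketch, not the notebook's actual fitting code: it assumes scikit-learn and NumPy are installed, and it uses synthetic data (the two features, the `exposure` weights and the "true" coefficients are all made up for illustration) rather than the French motor claims data. Only the name `glm_freq` is taken from this document.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Synthetic stand-in for the frequency data (assumption, not the real dataset).
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))                  # two hypothetical features
exposure = rng.uniform(0.1, 1.0, size=n)     # hypothetical policy exposure in years
true_rate = np.exp(-1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1])
claims = rng.poisson(true_rate * exposure)   # observed claim counts
freq = claims / exposure                     # claims per unit of exposure

# A small alpha keeps the fit close to an unpenalized GLM;
# max_iter and tol control the lbfgs solver as described above.
glm_freq = PoissonRegressor(alpha=1e-4, fit_intercept=True, max_iter=300, tol=1e-4)
glm_freq.fit(X, freq, sample_weight=exposure)

print("intercept:", glm_freq.intercept_)
print("coef:", glm_freq.coef_)
d2 = glm_freq.score(X, freq, sample_weight=exposure)
print("D^2 on the training data:", d2)
```

Modelling the frequency (claims divided by exposure) with exposure as the sample weight is one common way to handle policies observed for different lengths of time. Whether the notebook does the same is not stated here, so treat that choice as an assumption.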

After fitting the Poisson regressor, we can compute the model score with glm_freq.score.

Signature: glm_freq.score(X, y, sample_weight=None)
Docstring:
Compute D^2, the percentage of deviance explained.

D^2 is a generalization of the coefficient of determination R^2.
R^2 uses squared error and D^2 deviance. Note that those two are equal
for ``family='normal'``.

Returns
-------
score : float
    D^2 of self.predict(X) w.r.t. y.
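
To make the definition concrete, D^2 can be computed by hand as one minus the ratio of the model deviance to the deviance of a constant prediction equal to the (weighted) mean of y. The sketch below uses only the standard library and assumes the Poisson deviance; the function names `poisson_deviance` and `d2_score` are hypothetical helpers written for illustration, not part of scikit-learn.

```python
import math

def poisson_deviance(y_true, y_pred, sample_weight=None):
    """Weighted mean Poisson deviance. Predictions must be strictly positive."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total = 0.0
    for y, mu, w in zip(y_true, y_pred, sample_weight):
        # By convention y * log(y / mu) is taken as 0 when y == 0.
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += w * 2.0 * (log_term - y + mu)
    return total / sum(sample_weight)

def d2_score(y_true, y_pred, sample_weight=None):
    """Fraction of deviance explained: 1 - D(y, y_pred) / D(y, mean(y))."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    y_mean = sum(w * y for w, y in zip(sample_weight, y_true)) / sum(sample_weight)
    dev = poisson_deviance(y_true, y_pred, sample_weight)
    dev_null = poisson_deviance(y_true, [y_mean] * len(y_true), sample_weight)
    return 1.0 - dev / dev_null

y = [0, 1, 0, 2, 1]
print(d2_score(y, [0.8] * 5))                    # constant mean prediction
print(d2_score(y, [0.3, 1.0, 0.3, 1.8, 1.0]))    # predictions closer to y
```

A model that only predicts the mean scores 0, a perfect prediction scores 1, and a model that does worse than the mean gets a negative score.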

Model Evaluation