The Allstate Corporation is an American insurance company that is in the United States. The company also has personal lines insurance operations in Canada.
Data Source Kaggle: https://www.kaggle.com/c/allstate-claims-severity/data
Here, in this linear regression machine learning problem, I tried various approaches to predict the target variable "loss" from given continuous features (cont) and categorical features (cat).
Rather than building one huge notebook for data processing, hyperparameter tuning, modelling, and model evaluation, I break up the project into EDA, Cleaning and, Modelling parts.
Here, I did following cleaning: - Box-Cox transform high skewed continuous variables - One hot encoding categorical variables - Missing values imputation.
In this project I used following procedure for modelling the problem: - Choose boxcox transformed continuous features. - Choose One hot encoded categorical features. - Standard Scaling the data. - Log transformation of Target.
Algorithms Used: - Random Forest Regressor - Extra Tree Regressor - Xgboost Regressor - Stacking of these three estimators
| Model | 2-Fold Cross Validation MAE | Time Taken |
|---|---|---|
| Random Forest | 1292.87 | 1min 37s |
| Extra Trees | 1243.44 | 6min 32s |
| Xgboost | 1156.78 | 12min 25s |
| Stacking | 1165.00 | 55.5 s |