Table of Contents

Data Description

Reference: https://www.kaggle.com/c/web-traffic-time-series-forecasting/data

Original data: train_1.csv
-----------------------------
rows = 145,063
columns = 551
first column = Page
date columns = 2015-07-01, 2015-07-02, ..., 2016-12-31 (550 columns)
file size: 284.6 MB


Data for modelling: Prince Musician
-------------------------------------------------------
timeseries  : daily page visits in 2016 for Prince

lag columns : lag1 to lag7
bias        : bias column

For ARIMA   : a single timeseries (one column) is used
For sklearn : linear and ensemble regressors can use many feature columns

Colab

Load the Libraries

Useful Scripts

MAPE - Mean Absolute Percentage Error:

$$ \mathrm{MAPE} = \frac{100 \%}{n} \sum_{i=1}^{n} \frac{\left|y_{i} - \hat{y}_{i}\right|}{\left|y_{i}\right|} $$

SMAPE - Symmetric Mean Absolute Percentage Error:

$$ \begin{aligned} \mathrm{SMAPE} &= \frac{100 \%}{n} \sum_{i=1}^{n} \frac{\left|y_{i} - \hat{y}_{i}\right|}{\left(\left|y_{i}\right| + \left|\hat{y}_{i}\right|\right) / 2} \\ &= \frac{200 \%}{n} \sum_{i=1}^{n} \frac{\left|y_{i} - \hat{y}_{i}\right|}{\left|y_{i}\right| + \left|\hat{y}_{i}\right|} \end{aligned} $$
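
The two metrics above can be sketched in NumPy as follows. The function names `mape` and `smape` are illustrative, and treating a term with both actual and forecast equal to zero as zero error is an assumption, since the formula itself is undefined there.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (undefined if any y_true is 0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / np.abs(y_true))


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error in percent (range 0 to 200)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.abs(y_true) + np.abs(y_pred)
    # Assumption: when actual and forecast are both 0, count the term as 0 error.
    ratio = np.divide(np.abs(y_true - y_pred), denom,
                      out=np.zeros_like(denom), where=denom != 0)
    return 200.0 * np.mean(ratio)
```

SMAPE is bounded and tolerates zero actuals, which is one reason it is often preferred over MAPE for page-visit counts.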

Load the data

Choose Prince Musician data as timeseries
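
A minimal sketch of pulling one page out of the wide table (one row per page, one column per date) and turning it into a date-indexed series. The toy frame stands in for `train_1.csv`; the exact `Page` key used for Prince and the helper name `select_series` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical page key; check the actual value in the Page column.
PRINCE_PAGE = "Prince_(musician)_en.wikipedia.org_all-access_all-agents"

# Toy stand-in for train_1.csv with made-up visit counts.
wide = pd.DataFrame({
    "Page": [PRINCE_PAGE, "Some_other_page_en.wikipedia.org_all-access_all-agents"],
    "2016-01-01": [10.0, 1.0],
    "2016-01-02": [12.0, 2.0],
    "2016-01-03": [11.0, 3.0],
})


def select_series(wide, page, year=None):
    """Return the visits of one page as a Series indexed by date."""
    row = wide.loc[wide["Page"] == page].drop(columns="Page")
    if len(row) != 1:
        raise KeyError(f"expected exactly one row for page {page!r}")
    s = row.iloc[0].astype(float)
    s.index = pd.to_datetime(s.index)
    s.name = "visits"
    if year is not None:
        s = s[s.index.year == year]
    return s


prince = select_series(wide, PRINCE_PAGE, year=2016)
```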

Data Preprocessing

Add lag columns
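
One way to build the `lag1` to `lag7` columns is `Series.shift`, sketched here on a toy series. The column name `visits` and the helper `add_lags` are illustrative; dropping the first rows, where lags are not yet available, is an assumption about how missing values are handled.

```python
import pandas as pd


def add_lags(series, n_lags=7):
    """Return a frame with the series and its lag1..lagN columns."""
    df = series.to_frame(name="visits")
    for k in range(1, n_lags + 1):
        # lag k at date t holds the value observed k days earlier.
        df[f"lag{k}"] = df["visits"].shift(k)
    # The first n_lags rows have incomplete history, so drop them.
    return df.dropna()


toy = pd.Series([float(i) for i in range(10)],
                index=pd.date_range("2016-01-01", periods=10))
lagged = add_lags(toy, n_lags=7)
```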

Add bias term
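
The bias column is a constant column of ones, so a plain least-squares fit can learn an intercept. The sketch below shows this on made-up numbers with NumPy. Note that scikit-learn's `LinearRegression` fits an intercept by default (`fit_intercept=True`), so an explicit bias column is only needed there when that option is turned off.

```python
import numpy as np

# Toy data generated from y = 3 + 2 * x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 + 2.0 * x

# Prepend the bias column of ones to the single feature column.
X_bias = np.column_stack([np.ones(len(x)), x])

# Ordinary least squares: coef[0] is the intercept, coef[1] the slope.
coef, *_ = np.linalg.lstsq(X_bias, y, rcond=None)
```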

Add timeseries features
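
Calendar features can be derived from a `DatetimeIndex`. Which features the notebook actually uses is not stated here, so the four below (`dayofweek`, `month`, `day`, `is_weekend`) are an illustrative assumption, as is the helper name.

```python
import pandas as pd


def add_date_features(df):
    """Return a copy of df with calendar features taken from its DatetimeIndex."""
    out = df.copy()
    idx = out.index
    out["dayofweek"] = idx.dayofweek          # Monday=0 ... Sunday=6
    out["month"] = idx.month
    out["day"] = idx.day
    out["is_weekend"] = (idx.dayofweek >= 5).astype(int)
    return out


toy = pd.DataFrame({"visits": [10.0, 12.0, 11.0]},
                   index=pd.date_range("2016-01-01", periods=3))
features = add_date_features(toy)
```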

Modelling

Train Test split
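
For a timeseries the split is usually chronological, with the test set being the most recent block, so that no future values leak into training. The sketch below assumes a 20% test fraction, which is not taken from the source. `sklearn.model_selection.train_test_split(..., shuffle=False)` gives an equivalent result.

```python
import numpy as np


def chrono_split(X, y, test_size=0.2):
    """Split arrays in time order: earliest rows train, latest rows test."""
    n_test = int(round(len(X) * test_size))
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]


X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)
X_train, X_test, y_train, y_test = chrono_split(X, y, test_size=0.2)
```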

Scaling
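
A common pattern, sketched here with `StandardScaler` on made-up numbers, is to fit the scaler on the training rows only and reuse its statistics for the test rows. The choice of `StandardScaler` is an assumption; the source does not name the scaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

# Learn mean and standard deviation from the training data only.
scaler = StandardScaler().fit(X_train)

X_train_scaled = scaler.transform(X_train)
# The test data is scaled with the training statistics, not its own.
X_test_scaled = scaler.transform(X_test)
```

Fitting on the full data would let test-period statistics influence the training features.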

Linear Regression
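
As a sketch of regression on lag features, the example below uses a synthetic sine wave with a 7-day period rather than the real page visits. A pure sinusoid satisfies an exact linear relation between a value and its two previous values, so `LinearRegression` on `lag1` and `lag2` recovers it almost perfectly; real traffic will not fit this cleanly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic weekly-periodic series (not real data).
t = np.arange(60)
series = np.sin(2 * np.pi * t / 7)

# Features: lag1 and lag2; target: the current value.
X = np.column_stack([series[1:-1], series[:-2]])
y = series[2:]

# Chronological split: first 46 rows train, remaining 12 rows test.
split = 46
model = LinearRegression().fit(X[:split], y[:split])
pred = model.predict(X[split:])

max_err = np.max(np.abs(pred - y[split:]))
```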

Regularized models: LassoCV and RidgeCV
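
`LassoCV` and `RidgeCV` pick the regularization strength by cross-validation. The sketch below passes a `TimeSeriesSplit` so that each validation fold follows its training fold in time. The data is synthetic, and the fold count and alpha grid are assumptions, not values from the source.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import TimeSeriesSplit

# Synthetic data: 7 features, only the first two matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 7))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=120)

# Folds that respect time order (train on the past, validate on the future).
tscv = TimeSeriesSplit(n_splits=5)

lasso = LassoCV(cv=tscv, random_state=0).fit(X, y)
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=tscv).fit(X, y)

# The selected strengths are available as lasso.alpha_ and ridge.alpha_.
```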

Modelling: Ensemble Regressors
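
A minimal sketch comparing two scikit-learn ensemble regressors on synthetic data. Which ensemble learners the notebook actually uses is not stated here, so `RandomForestRegressor` and `GradientBoostingRegressor` are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Synthetic nonlinear target (not real data).
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = X[:, 0] ** 2 + X[:, 1]

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

models = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model and record its R^2 on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
```

Tree-based ensembles do not need feature scaling, but they cannot predict values outside the range of targets seen in training, which matters for a series with a spike or trend.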

Time Taken
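
One simple way to report the run time is to record `time.perf_counter()` at the start of the notebook and format the difference at the end. The helper name `format_elapsed` and the output format are illustrative.

```python
import time


def format_elapsed(seconds):
    """Format a duration in seconds as 'M min S.SS sec'."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)} min {secs:.2f} sec"


start = time.perf_counter()      # place this at the top of the notebook
_ = sum(i * i for i in range(100_000))  # stand-in for the real work
elapsed = time.perf_counter() - start

print("Time taken:", format_elapsed(elapsed))
```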