Data Description

Reference: https://www.kaggle.com/c/web-traffic-time-series-forecasting/data

Original data: train_1.csv
-----------------------------
rows = 145,063
columns = 551
first column = Page
date columns = 2015-07-01, 2015-07-02, ..., 2016-12-31 (550 columns)
file size: 284.6 MB




Data for modelling: 
--------------------------------------------------------------------
Timeseries   : Now You See Me es (Spain, random_state=42)

For ARIMA    : we have only one timeseries (one column)
For sklearn  : for linear regressors and ensemble learners we can have many columns
For fbprophet: we need a dataframe with only the columns ds and y (optionally cap and floor)
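
As a rough illustration of the shapes described above, the sketch below turns one row of the wide table into the two-column ds/y frame that Prophet expects. The tiny in-memory frame stands in for train_1.csv; the helper name `to_prophet_frame`, the page names, and the cap/floor values are assumptions made for illustration, not the notebook's own code.

```python
import pandas as pd

def to_prophet_frame(wide, page, cap=None, floor=None):
    """Return a long frame with columns ds (date) and y (visits) for one page."""
    row = wide.loc[wide["Page"] == page].drop(columns="Page").iloc[0]
    out = pd.DataFrame({
        "ds": pd.to_datetime(row.index),
        "y": row.to_numpy(dtype=float),
    })
    # cap and floor are only needed for logistic (saturating) growth
    if cap is not None:
        out["cap"] = cap
    if floor is not None:
        out["floor"] = floor
    return out

# Toy stand-in for train_1.csv: a Page column followed by one column per date
wide = pd.DataFrame({
    "Page": ["page_a", "page_b"],
    "2015-07-01": [10, 5],
    "2015-07-02": [12, 7],
    "2015-07-03": [11, 6],
})

df = to_prophet_frame(wide, "page_a", cap=20.0, floor=0.0)
print(df)
```

The same single column of values, without the ds column, is what a univariate model such as ARIMA would consume.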

Prophet Description

We use a decomposable time series model with three main model components: trend, seasonality, and holidays. They are combined in the following equation:

$$ y(t)=g(t)+s(t)+h(t)+\epsilon_{t} $$

Using time as a regressor, Prophet fits several linear and nonlinear functions of time as components.

Modeling seasonality as an additive component is the same approach taken by exponential smoothing in the Holt-Winters technique.

We are, in effect, framing the forecasting problem as a curve-fitting exercise rather than looking explicitly at the time-based dependence of each observation within a time series.
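
To make the curve-fitting framing concrete, here is a minimal sketch that fits a linear trend plus one weekly Fourier pair by ordinary least squares, using only time as the regressor. This is not how Prophet itself is implemented (Prophet uses a piecewise trend and a Stan-based fit); the synthetic series and all variable names are assumptions for illustration.

```python
import numpy as np

# Synthetic daily series: linear trend g(t) + weekly seasonality s(t), no noise
t = np.arange(140, dtype=float)
g_true = 50.0 + 0.3 * t
s_true = 5.0 * np.sin(2 * np.pi * t / 7.0)
y = g_true + s_true

# Design matrix built from time alone: intercept, slope, one weekly Fourier pair
X = np.column_stack([
    np.ones_like(t),
    t,
    np.sin(2 * np.pi * t / 7.0),
    np.cos(2 * np.pi * t / 7.0),
])

# Least-squares fit of y(t) = g(t) + s(t)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
trend = X[:, :2] @ coef[:2]      # fitted g(t)
seasonal = X[:, 2:] @ coef[2:]   # fitted s(t)
print(np.round(coef, 3))
```

No lagged values of y appear anywhere in the design matrix, which is the sense in which the time-based dependence between observations is not modelled explicitly.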

Trend parameters:

| Parameter | Description |
|---|---|
| growth | 'linear' or 'logistic' to specify a linear or logistic trend |
| changepoints | List of dates at which to include potential changepoints (automatic if not specified) |
| n_changepoints | If changepoints is not supplied, the number of changepoints to include automatically |
| changepoint_prior_scale | Controls the flexibility of automatic changepoint selection |

Seasonality & Holiday Parameters:

| Parameter | Description |
|---|---|
| yearly_seasonality | Fit yearly seasonality |
| weekly_seasonality | Fit weekly seasonality |
| daily_seasonality | Fit daily seasonality |
| holidays | Dataframe containing holiday names and dates |
| seasonality_prior_scale | Controls the strength of the seasonality model |
| holidays_prior_scale | Controls the strength of the holiday model |
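
The parameters listed above are keyword arguments of the `Prophet` constructor, where the holiday strength argument is spelled `holidays_prior_scale`. The sketch below collects them in a dictionary. The values and the holiday dates are illustrative choices, not settings taken from this notebook, and the import is guarded because the package name depends on the installed version (`prophet` in recent releases, `fbprophet` in older ones).

```python
import pandas as pd

# Illustrative holiday frame: Prophet expects columns named 'holiday' and 'ds'
holidays = pd.DataFrame({
    "holiday": ["new_year", "new_year"],
    "ds": pd.to_datetime(["2016-01-01", "2017-01-01"]),
})

# Example values only; not tuned and not taken from the notebook
params = dict(
    growth="linear",               # or "logistic" (requires a cap column)
    n_changepoints=25,
    changepoint_prior_scale=0.05,  # larger = more flexible trend
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False,
    holidays=holidays,
    seasonality_prior_scale=10.0,  # smaller = more damped seasonality
    holidays_prior_scale=10.0,     # smaller = more damped holiday effects
)

try:
    from prophet import Prophet        # package name in recent releases
except ImportError:
    try:
        from fbprophet import Prophet  # older package name
    except ImportError:
        Prophet = None                 # not installed; params can still be inspected

model = Prophet(**params) if Prophet is not None else None
```

An explicit `changepoints` list can be passed instead of `n_changepoints` when the dates of likely trend changes are known in advance.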

Evaluation Metric

The formula for SMAPE (Symmetric Mean Absolute Percentage Error) is given below:

$$ S M A P E=\frac{100 \%}{n} \sum_{t=1}^{n} \frac{\left|F_{t}-A_{t}\right|}{\left(\left|A_{t}\right|+\left|F_{t}\right|\right) / 2} $$

where $F_t$ is the forecast and $A_t$ is the actual value of the time series at time $t$.

Python implementation:

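import numpy as np

def smape(A, F):
    # A: actual values, F: forecasts; truncate F to the length of A
    A, F = np.asarray(A, dtype=float), np.asarray(F, dtype=float)[:len(A)]
    # eps keeps the denominator nonzero when both A and F are 0
    denom = np.abs(A) + np.abs(F) + np.finfo(float).eps
    return 200.0 / len(A) * np.sum(np.abs(F - A) / denom)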

Despite its name, SMAPE is not actually symmetric. Wikipedia gives the following example:

The SMAPE is not symmetric since over- and under-forecasts are not treated equally. This is illustrated by the following example by applying the SMAPE formula:

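Over-forecasting: At = 100 and Ft = 110 give SMAPE = 4.76%
Under-forecasting: At = 100 and Ft = 90 give SMAPE = 5.26%

Note that these figures come from the variant of SMAPE without the division by 2 in the denominator; with the formula given above they double to 9.52% and 10.53%, and the asymmetry remains.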
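
The asymmetry is easy to check numerically. The sketch below evaluates a single over-forecast and a single under-forecast of the same size under two common SMAPE conventions: with the denominator halved, as in the formula above, and without. The function name `smape_point` and its `halve` flag are made up for this illustration.

```python
def smape_point(actual, forecast, halve=True):
    """SMAPE in percent for a single observation."""
    denom = abs(actual) + abs(forecast)
    if halve:
        denom /= 2.0
    return 100.0 * abs(forecast - actual) / denom

for forecast in (110, 90):
    with_half = smape_point(100, forecast, halve=True)
    without_half = smape_point(100, forecast, halve=False)
    print(f"F={forecast}: {with_half:.2f}% (halved denominator), "
          f"{without_half:.2f}% (unhalved)")
# prints:
# F=110: 9.52% (halved denominator), 4.76% (unhalved)
# F=90: 10.53% (halved denominator), 5.26% (unhalved)
```

Under either convention the under-forecast is penalised more heavily than the over-forecast of the same absolute size.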

Imports

Useful Scripts

Load the data

Data Preprocessing

Modelling: prophet

Create dataframe with two columns: ds and y

Model1: default parameters

Model2: saturation cap and floor

Model3: Seasonality

Plotly Visualizations for prophet