Table of Contents

Data Description

Reference: https://www.kaggle.com/c/web-traffic-time-series-forecasting/data

Original data: train_1.csv
-----------------------------
rows = 145,063
columns = 551
first column = Page
date columns = 2015-07-01, 2015-07-02, ..., 2016-12-31 (550 columns)
file size: 284.6 MB


Data for time series
----------------------------------------------
selected year: 2016 (leap year 366 days)

selected time series:

The most visited page.
df['Page'] == """Special:Search_en.wikipedia.org_desktop_all-agents"""
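The selection above can be sketched as follows. This is a minimal illustration, assuming the CSV has been read into a pandas DataFrame with a `Page` column followed by date-named columns, as described earlier. The helper name `select_page_year` and the tiny demo frame are hypothetical and only stand in for the real file.

```python
import pandas as pd

PAGE = "Special:Search_en.wikipedia.org_desktop_all-agents"

def select_page_year(df, page=PAGE, year=2016):
    """Return one page's daily visits for a given year as a date-indexed Series."""
    row = df.loc[df["Page"] == page]
    if row.empty:
        raise KeyError(f"page not found: {page}")
    # Drop the Page column; the remaining columns are dates stored as strings.
    ts = row.drop(columns="Page").iloc[0]
    ts.index = pd.to_datetime(ts.index)
    # Keep only the observations that fall in the selected year.
    ts = ts[ts.index.year == year]
    return ts.astype(float).rename("visits")

# Tiny synthetic frame standing in for train_1.csv (values are made up).
demo = pd.DataFrame({
    "Page": [PAGE, "Other_page"],
    "2015-12-31": [10, 1],
    "2016-01-01": [11, 2],
    "2016-01-02": [12, 3],
})
ts_demo = select_page_year(demo)
```

With the real file, something like `select_page_year(pd.read_csv("train_1.csv"))` should return one value per day of 2016, that is 366 values, provided the page is present in the data.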

Load the libraries

Useful Functions

Load the data

Timeseries Statistics

Normality Test

Reference: https://machinelearningmastery.com/a-gentle-introduction-to-normality-tests-in-python/

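import statsmodels.api as sm
from scipy import stats

# draw 1000 samples from a normal distribution with mean=5, std=3
x = stats.norm.rvs(loc=5, scale=3, size=1000)

# qq-plot against N(5, 3): the theoretical quantiles are already shifted and
# scaled by loc and scale, so the 45-degree line is the matching reference
sm.qqplot(x, loc=5, scale=3, line='45')

# shapiro-wilk: returns the test statistic and the p-value
shapiro_res = stats.shapiro(x)

# anderson-darling: returns the statistic, critical values and significance levels
anderson_res = stats.anderson(x, dist='norm')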
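The results of the two tests are read differently, which the following sketch tries to make explicit. It assumes a significance level of 0.05 for Shapiro-Wilk (an arbitrary but common choice) and uses a seeded NumPy generator instead of `stats.norm.rvs` so the sample is reproducible. The variable names are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=5, scale=3, size=1000)

# Shapiro-Wilk: the null hypothesis is that the sample comes from a normal
# distribution, so a small p-value is evidence against normality.
stat, p = stats.shapiro(x)
alpha = 0.05  # assumed significance level
shapiro_looks_normal = p > alpha

# Anderson-Darling: there is no single p-value here. The statistic is compared
# with one critical value per significance level (given in percent).
res = stats.anderson(x, dist="norm")
anderson_table = [
    (level, crit, res.statistic < crit)  # True means normality is not rejected
    for level, crit in zip(res.significance_level, res.critical_values)
]
for level, crit, ok in anderson_table:
    print(f"{level:>5}%: critical value {crit:.3f}, looks normal: {ok}")
```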

Stationarity Test (Augmented Dickey Fuller Test)

Ref: https://www.analyticsvidhya.com/blog/2018/09/non-stationary-time-series-python/

------------------------------------------------------------------------------

Stationarity Tests

A unit root indicates that the statistical properties of a series are not constant over time. Constancy of those properties is the condition for a stationary time series.

Suppose we have a time series:

$$ y_{t} = a \, y_{t-1} + \varepsilon_{t} $$

where $y_{t}$ is the value at time instant t and $\varepsilon_{t}$ is the error term. To calculate $y_{t}$ we need the value of $y_{t-1}$, which is:

$$ y_{t-1} = a \, y_{t-2} + \varepsilon_{t-1} $$

If we do that for all observations, the value of $y_{t}$ comes out to be:

$$ y_{t} = a^{n} y_{t-n} + \sum_{i=0}^{n-1} a^{i} \varepsilon_{t-i} $$

If the value of a is 1 (unit) in the above equation, then $y_{t}$ equals $y_{t-n}$ plus the sum of the n errors accumulated since then. Each error adds its own variance to the total, which means that the variance increases with time.

This is known as a unit root in a time series. For a stationary time series, the variance must not be a function of time.
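The growth of the variance can be checked numerically. The sketch below simulates many independent paths of the recursion for a = 0.5 and a = 1, assuming standard normal errors and a zero starting value (both are assumptions made for the illustration), and compares the variance across paths at an early and a late time step.

```python
import numpy as np

def simulate_ar1(a, n_steps, n_paths, seed=0):
    """Simulate y_t = a * y_{t-1} + eps_t, started from zero, with eps_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=(n_paths, n_steps))
    y = np.zeros((n_paths, n_steps))
    y[:, 0] = eps[:, 0]  # first step: the previous value is zero
    for t in range(1, n_steps):
        y[:, t] = a * y[:, t - 1] + eps[:, t]
    return y

for a in (0.5, 1.0):
    y = simulate_ar1(a, n_steps=200, n_paths=5000)
    var_by_time = y.var(axis=0)  # variance across paths at each time step
    print(f"a={a}: variance at step 10 = {var_by_time[9]:.2f}, "
          f"at step 200 = {var_by_time[199]:.2f}")
```

For a = 1 the variance after t steps should be close to t, while for |a| < 1 it should level off near 1 / (1 - a^2).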

Unit root tests check for the presence of a unit root in the series by testing whether a = 1.

The two most popular tests for a unit root are:

------------------------------------------------------------------------------

Types of Stationarity
Let us understand the different types of stationarity and how to interpret the results of the above tests.

It’s always better to apply both tests, so that we can be sure that the series is truly stationary. Let us look at the possible outcomes of applying these stationarity tests.

------------------------------------------------------------------------------

Making a Time Series Stationary

Auto correlation plots

Seasonal Decomposition

Trend Analysis

Seasonality Analysis

Periodicity and Autocorrelation

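The x-axis is the lag h.

The y-axis is the autocorrelation of the time series with a copy of itself shifted by h:

$$ r_{h} = C_{h} / C_{0} $$

$$ C_{h}=\sum_{t=1}^{n-h}(y(t)-\bar{y})(y(t+h)-\bar{y}) / n $$

$$ C_{0}=\sum_{t=1}^{n}(y(t)-\bar{y})^{2} / n $$

Here n is the number of observations and $\bar{y}$ is the sample mean.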
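The autocorrelation at lag h is the ratio of the lag-h autocovariance to the lag-0 autocovariance, and it can be computed directly with NumPy. The sketch below is a minimal version. The sine series with a period of 7 steps is a made-up stand-in for weekly seasonality, not the page-visit data.

```python
import numpy as np

def autocorrelation(y, h):
    """Sample autocorrelation at lag h, computed as C_h / C_0."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    y_bar = y.mean()
    c0 = np.sum((y - y_bar) ** 2) / n
    # Pair each observation with the one h steps later.
    ch = np.sum((y[: n - h] - y_bar) * (y[h:] - y_bar)) / n
    return ch / c0

# Synthetic series with a 7-step period.
t = np.arange(140)
y = np.sin(2 * np.pi * t / 7)
for h in (0, 3, 7, 14):
    print(h, round(autocorrelation(y, h), 3))
```

Peaks at multiples of the period (here 7, 14, ...) are the signature of periodicity in an autocorrelation plot. Because the sum runs over n - h terms but is divided by n, the peaks shrink slowly as the lag grows.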