
Data Description

This dataset contains house sale prices for King County, Washington, which includes Seattle. It covers homes sold between May 2014 and May 2015.

Imports

Load the data
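A minimal loading sketch, assuming the data is a CSV file with a `date` column holding the sale date. The helper name `load_house_data` and the filename in the usage comment are hypothetical. Parsing `date` at load time simplifies the temporal analysis later on.

```python
import pandas as pd


def load_house_data(path_or_buffer, date_format=None):
    """Read the house-sales CSV and parse the sale date column.

    Assumes a `date` column exists. With date_format=None pandas infers the
    format; pass an explicit strftime-style format if inference fails.
    """
    df = pd.read_csv(path_or_buffer)
    df["date"] = pd.to_datetime(df["date"], format=date_format)
    return df


# Usage (hypothetical filename):
# df = load_house_data("kc_house_data.csv")
```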

Correlation heatmap

Histogram of all features

Univariate Analysis

Temporal Variables
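One way to make the sale date and the construction year usable together is to derive the sale year, sale month and the age of the house at sale. This is a sketch under the assumption that `date` is already a datetime column and that a `yr_built` column holds the construction year; the function and new column names are illustrative.

```python
import pandas as pd


def add_temporal_features(df):
    """Return a copy of df with sale_year, sale_month and age_at_sale columns.

    Assumes `date` is a datetime64 column and `yr_built` is an integer year
    (hypothetical column names for this sketch).
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    out["sale_year"] = out["date"].dt.year
    out["sale_month"] = out["date"].dt.month
    # Age in whole years at the time of sale (calendar-year difference)
    out["age_at_sale"] = out["sale_year"] - out["yr_built"]
    return out
```

Since all sales fall between May 2014 and May 2015, `sale_year` alone carries little information; `sale_month` and `age_at_sale` are likely the more useful derived features.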

Discrete Variables

Features whose values can be counted (1, 2, 3, ...) are called discrete variables, for example the number of bedrooms or the number of floors.

A simple way to spot discrete variables is to look for features with an integer dtype and a small number of unique values. Some features have a float dtype but are still discrete.
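The unique-value check can be sketched in pandas as follows. This is a minimal sketch: the helper name `find_discrete_candidates` and the cut-off of 20 distinct values are arbitrary assumptions, not taken from the notebook. Float columns are deliberately kept so that float-typed discrete features are not missed.

```python
import pandas as pd


def find_discrete_candidates(df, max_unique=20):
    """Return {column: dtype} for numeric columns with few distinct values.

    Both integer and float columns are considered, because a float column
    can still be discrete. NaN is not counted as a distinct value.
    """
    numeric = df.select_dtypes(include="number")
    n_unique = numeric.nunique()
    selected = n_unique[n_unique <= max_unique].index
    return {col: str(numeric[col].dtype) for col in selected}
```

Any column reported with a `float64` dtype is worth a second look, since it may be discrete despite its dtype.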

Continuous Variables

Categorical Variables

Variables such as district names, country names and class names are categorical variables. Discrete variables such as the number of bedrooms or floors can sometimes be treated as categorical as well.

When a category of a categorical variable accounts for less than 1% of the rows, a model may overfit to it. One option is to relabel such categories as RARE; another is to drop those rows.
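Both options can be sketched with `value_counts(normalize=True)`. The helper names `rare_labels` and `group_rare_labels` are hypothetical; the 1% threshold follows the text, and shares are computed over non-missing rows.

```python
import pandas as pd


def rare_labels(series, threshold=0.01):
    """Return the categories whose share of non-missing rows is below threshold."""
    freq = series.value_counts(normalize=True)
    return freq[freq < threshold].index


def group_rare_labels(series, threshold=0.01, rare_label="RARE"):
    """Replace rare categories with rare_label; other values (and NaN) are kept."""
    s = series.astype("object")  # object dtype so a new label can be inserted
    return s.where(~s.isin(rare_labels(s, threshold)), rare_label)


# Usage, assuming a DataFrame `df` with a `bedrooms` column:
# df["bedrooms_grouped"] = group_rare_labels(df["bedrooms"])    # relabel
# df = df[~df["bedrooms"].isin(rare_labels(df["bedrooms"]))]    # or drop rows
```

Relabeling keeps every row but merges the rare classes into one; dropping removes them entirely, which is only reasonable when they are few.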

waterfront

Observation

We have not tested the assumptions of the point-biserial correlation.
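The point-biserial correlation between a binary variable such as `waterfront` and a continuous one such as `price` is the Pearson correlation with the binary variable coded 0/1. It can be written as r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where M1 and M0 are the group means, n1 and n0 the group sizes, and s_n the population standard deviation of the continuous variable. The sketch below implements that formula with NumPy (the function name is hypothetical); `scipy.stats.pointbiserialr` computes the same value. Assumptions commonly listed for it, such as approximate normality within each group and similar variances, are not checked here.

```python
import numpy as np


def point_biserial(binary, values):
    """Point-biserial correlation between a 0/1 variable and a continuous one.

    r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n**2), with s_n the population
    (ddof=0) standard deviation of `values`.
    """
    b = np.asarray(binary).astype(bool)
    y = np.asarray(values, dtype=float)
    n1, n0 = b.sum(), (~b).sum()
    n = n1 + n0
    m1, m0 = y[b].mean(), y[~b].mean()
    s_n = y.std()  # NumPy's default ddof=0 gives the population SD
    return (m1 - m0) / s_n * np.sqrt(n1 * n0 / n**2)


# Usage, assuming `df` has a 0/1 `waterfront` column and a `price` column:
# point_biserial(df["waterfront"], df["price"])
```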

basement_bool

renovation_bool
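The `basement_bool` and `renovation_bool` features are not defined in the text. A plausible derivation, offered only as an assumption, flags a basement when `sqft_basement` is greater than 0 and a renovation when `yr_renovated` is greater than 0 (that is, assuming 0 encodes "none"). The function name is hypothetical.

```python
import pandas as pd


def add_boolean_flags(df):
    """Add 0/1 basement_bool and renovation_bool columns to a copy of df.

    Assumes (hypothetically) that sqft_basement == 0 means no basement and
    yr_renovated == 0 means never renovated. Missing values become 0.
    """
    out = df.copy()
    out["basement_bool"] = (out["sqft_basement"] > 0).astype(int)
    out["renovation_bool"] = (out["yr_renovated"] > 0).astype(int)
    return out
```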

Box plots of categorical features

Bivariate Analysis

Observation

Observation

Observation

Multivariate Analysis