Table of Contents

Data Description

This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015.

Task: Estimate the sale price from the given features.

Imports

Load the data

Train test split

Simple Linear Regression

Modelling Simple Linear Regression Using statsmodels

Statistics Questions

Yes, the low p-value associated with the t-statistic for the feature (sqft_living) suggests so.

For a one-unit (one square foot) increase in sqft_living, our model predicts that price will increase by 280.6854.
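
The interpretation of the slope follows from the form of the fitted line, price = b0 + b1 * sqft_living: two predictions one unit apart differ by exactly b1. A minimal sketch in plain Python, assuming made-up toy numbers (not the King County data) and a hypothetical helper named fit_simple_ols:

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope is the ratio of the x-y co-variation to the variation in x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Toy data invented for illustration only (assumption, not from the dataset).
sqft_living = [1000, 1500, 2000, 2500, 3000]
price = [300000, 420000, 610000, 700000, 890000]

intercept, slope = fit_simple_ols(sqft_living, price)


def predict(sqft):
    """Predicted price for a given living area, using the fitted line."""
    return intercept + slope * sqft


# Predictions one square foot apart differ by the slope.
diff = predict(2001) - predict(2000)
print(f"slope = {slope:.4f}, difference = {diff:.4f}")
```

The same reading applies to the coefficient reported by statsmodels: it is the predicted change in price per additional square foot of living area.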

Positive

The predicted price is 518741.86, and the confidence interval is [500426.68, 537057.04].

Model score (coefficient of determination R^2) for training
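
As a reminder of what this score measures, R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of the targets. A minimal sketch in plain Python, assuming a hypothetical helper r_squared and toy values that are not taken from the dataset:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


# Toy targets and predictions, invented for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.5, 8.5]

score = r_squared(y_true, y_pred)
print(f"R^2 = {score:.4f}")
```

A score of 1 means the predictions match the targets exactly, and a score of 0 means the model does no better than always predicting the mean.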

Model Predictions after adding bias term

Residual Plots

Best fit line with confidence interval

Seaborn regplot

Assumptions of Linear Regression

Log transformation using ols formula

Multiple Linear Regression

Polynomial Regression