Table of Contents

Introduction to Project

References:

Some samples:

$\underline{\text{Data specification}}$

${\text{Number of samples}} : n = 569$

${\text{Full dataset}} : D = \{ X, Y \} = \{ (x_i, y_i) \}_{i=1}^{n}$

$\text{Input space : } \dim(X) = ( \underset{\text{case :}}{n \times 10}) \times \underset{\text{a, b, c}}{3}$

$\text{Output space : } \dim(Y) = n$

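Here case a is the mean, case b is the SE (standard error, $\frac{\sigma}{\sqrt{n}}$) and case c is the worst value, defined as the mean of the three largest values. In the standard error formula, $n$ counts the measurements behind a single sample, not the 569 samples above.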
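The three statistics can be sketched in a few lines of Python. This is an illustration only: the function name `summarize`, the input values, and the choice of the sample standard deviation (divisor $n - 1$) for $\sigma$ are assumptions, not something the dataset description above fixes.

```python
import math
import statistics


def summarize(values):
    """Return (mean, standard error, worst) for one feature of one sample.

    Hypothetical helper: `values` holds the raw measurements behind a
    single sample. Needs at least three measurements.
    """
    count = len(values)
    mean = statistics.fmean(values)
    # Assumption: sigma is the sample standard deviation (divisor count - 1).
    se = statistics.stdev(values) / math.sqrt(count)
    # "Worst" is the mean of the three largest measurements.
    worst = statistics.fmean(sorted(values)[-3:])
    return mean, se, worst


# Made-up measurements, for illustration only.
radii = [10.0, 12.0, 14.0, 16.0, 18.0]
mean, se, worst = summarize(radii)
print(mean, se, worst)
```

Each sample in the dataset stores only these three summaries per feature, not the raw measurements.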

$\underline{\text{Dependent variable / Target}} \ \big( \ y_i \in \{B, \, M \} \ \big)$, where B is benign and M is malignant.

$\underline{\text{Independent variable / Feature}} \,\ \big( \ x_i \in \mathbb{R}^{10} \text{ for each of the cases a, b, c} \ \big)$

To sum up, the feature matrix has the form $X = [X^{(a)}_{\text{Mean}}, \, X^{(b)}_{\text{SE}}, \, X^{(c)}_{\text{Worst}}] \in \mathbb{R}^{(n \times 10) \times 3}$:

| Feature | Mean | SE | Worst |
|---|---|---|---|
| 1. Radius | $\vdots$ | $\vdots$ | $\vdots$ |
| $\cdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| 10. Fractal dim. | $\vdots$ | $\vdots$ | $\vdots$ |

This means each sample has 10 features with 3 statistics each, i.e. $10 \times 3 = 30$ numeric values per sample.
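A minimal sketch of this layout, assuming each sample arrives as a flat row of 30 values ordered as 10 means, then 10 standard errors, then 10 worst values. That column order, the helper name `to_feature_grid`, and the placeholder data are assumptions for illustration; check the actual column order of the loaded data before relying on it.

```python
N_FEATURES = 10
STATS = ("mean", "se", "worst")


def to_feature_grid(flat_row):
    """Reshape one flat 30-value row into a 10 x 3 grid.

    Hypothetical helper. Assumes the row is ordered as
    [10 means, 10 standard errors, 10 worst values].
    Returns grid[feature][statistic].
    """
    expected = N_FEATURES * len(STATS)
    if len(flat_row) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat_row)}")
    return [
        [flat_row[s * N_FEATURES + f] for s in range(len(STATS))]
        for f in range(N_FEATURES)
    ]


# Placeholder row: the values 0..29 stand in for real measurements.
flat_row = list(range(30))
grid = to_feature_grid(flat_row)
print(grid[0])  # mean, SE, and worst of the first feature
```

Stacking one such grid per sample gives the $(n \times 10) \times 3$ structure described above.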

Imports

Useful Scripts

Load the data

Data Manipulation

Exploratory Data Analysis

Statistics of 10 features

Density Plots

Correlation

Radar Chart (Spider diagram or Polygon Plot) for minmax normalized Mean features
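Min-max normalization rescales each feature to the range $[0, 1]$ via $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$, which puts features with different units on a common radial axis. The sketch below shows that rescaling for one column; the function name `minmax_normalize`, the handling of constant columns, and the sample values are assumptions, not the notebook's own code.

```python
def minmax_normalize(column):
    """Rescale a list of numbers to [0, 1] (hypothetical helper)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column maps to all zeros
        # to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Made-up values standing in for one Mean feature column.
mean_radius = [10.0, 15.0, 20.0]
scaled = minmax_normalize(mean_radius)
print(scaled)
```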

Observations: