Table of Contents

Description

Question 01

Hints:

  1. Remove subjects with existing diabetes for this analysis.
  2. You can treat incident diabetes as a binary outcome and use logistic regression; or, if you want to try survival analysis, you can use Cox regression with the provided time-to-event information.
  3. Investigate each individual feature and biomarker, and use p-values to justify your findings.
  4. For blood biomarkers, use a transformation that is robust to outliers.
  5. Demonstrate your findings with visualization.
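
The hints above can be sketched roughly as follows. This is only an illustration on synthetic data, not the notebook's actual analysis: the rank-based inverse normal transform is one possible outlier-robust transformation (an assumption, since the hints do not name one), and the helper names `inverse_normal_transform` and `logistic_wald_test` are hypothetical. The logistic regression is fitted with a small Newton-Raphson loop so that the Wald p-value for a single biomarker is visible step by step.

```python
import numpy as np
from scipy import stats


def inverse_normal_transform(values):
    """Rank-based inverse normal transform (hypothetical helper).

    Only the ranks are used, so extreme values cannot dominate the result.
    """
    values = np.asarray(values, dtype=float)
    ranks = stats.rankdata(values)  # ties receive their average rank
    return stats.norm.ppf((ranks - 0.5) / len(values))


def logistic_wald_test(x, y, n_iter=25):
    """Fit y ~ intercept + x by Newton-Raphson (hypothetical helper).

    Returns the slope and its two-sided Wald p-value.
    """
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    # Recompute the information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    hessian = X.T @ (X * (p * (1.0 - p))[:, None])
    std_err = np.sqrt(np.linalg.inv(hessian)[1, 1])
    z = beta[1] / std_err
    return beta[1], 2.0 * stats.norm.sf(abs(z))


# Synthetic stand-in for one blood biomarker (assumption: not the real data).
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
incident = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + latent))))
biomarker = np.exp(latent)
biomarker[:5] *= 100.0  # a few gross outliers, e.g. measurement artifacts

transformed = inverse_normal_transform(biomarker)
slope, p_value = logistic_wald_test(transformed, incident)
print(f"slope = {slope:.2f}, Wald p-value = {p_value:.2e}")
```

In a real analysis the same test would be repeated for every feature and biomarker after removing subjects with existing diabetes, ideally with a multiple-testing correction before interpreting the p-values.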

Question 02

Hints:

  1. Select the relevant blood biomarkers as features for your classifier.
  2. Select and train a ML model to make predictions.
  3. Evaluate your predictive model with ROC AUC.
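
A minimal sketch of these steps, assuming scikit-learn is available. The feature matrix is synthetic and the choice of a standardized logistic regression is an assumption made for illustration; any classifier that outputs probabilities or scores could be evaluated in the same way.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the selected biomarkers (assumption: not the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
logit = -1.0 + 1.5 * X[:, 0] - 1.0 * X[:, 1]  # only two columns carry signal
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Hold out a stratified test set so ROC AUC is measured on unseen subjects.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is fitted on the training set only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# ROC AUC needs scores, not hard labels: use the positive-class probability.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test ROC AUC: {auc:.3f}")
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training, which would otherwise make the reported ROC AUC optimistic.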

Question 03

Hints:

  1. Use the subset of subjects who developed incident diabetes for your unsupervised learning.
  2. You can choose to use all or only relevant biomarkers for clustering.
  3. Select one approach to identify clusters of these subjects.
  4. Identify top blood biomarkers that contributed to the clustering.
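
The clustering hints can be illustrated with the sketch below, again on synthetic data. It assumes k-means as the single clustering approach, the silhouette score for choosing the number of clusters, and a per-biomarker ANOVA F statistic as one simple way to rank contributions; the biomarker names `marker_a` to `marker_d` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import f_classif
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for subjects with incident diabetes (assumption).
rng = np.random.default_rng(0)
names = ["marker_a", "marker_b", "marker_c", "marker_d"]  # hypothetical names
group = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 4))
X[:, 0] += 8.0 * group  # marker_a and marker_b separate the two groups
X[:, 1] += 8.0 * group  # marker_c and marker_d are pure noise
X_scaled = StandardScaler().fit_transform(X)

# Try several cluster counts and keep the one with the best silhouette score.
scores = {}
for k in range(2, 6):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels_k)
best_k = max(scores, key=scores.get)

# Refit with the chosen k and rank biomarkers by how strongly they differ
# between clusters (one-way ANOVA F statistic per feature).
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X_scaled)
f_stats, _ = f_classif(X_scaled, labels)
ranking = [names[i] for i in np.argsort(f_stats)[::-1]]

print("silhouette by k:", {k: round(float(s), 3) for k, s in scores.items()})
print("best k:", best_k)
print("biomarkers ranked by F statistic:", ranking)
```

The F statistics here are descriptive only: because the clusters were derived from the same biomarkers, the associated p-values are not valid significance tests.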

References:

Import the modules

Load the data

Select only biomarkers

Modelling: k-means clustering (no train-test split for unsupervised algorithms)

Find the best number of clusters

Silhouette score plot

Hierarchical clustering

Feature importance of clustering

Using 2 clusters

Using 5 clusters