Table of Contents

Description

Question 01

Hints:

  1. Remove subjects with existing diabetes for this analysis.
  2. You can treat incident diabetes as a binary outcome and use logistic regression; or, if you want to try survival analysis, you can use Cox regression with the provided time-to-event information.
  3. Investigate each feature and biomarker individually, and use p-values to justify your findings.
  4. For blood biomarkers, use a transformation that is robust to outliers.
  5. Demonstrate your findings with visualization.
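
Hint 4 can be illustrated with a minimal sketch of one outlier-robust transformation: centering on the median and dividing by the interquartile range. This is only one possible choice, not necessarily the transformation used in the notebook; the function name `robust_scale` and the sample readings are hypothetical, and only the Python standard library is assumed.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR).

    Assumption: `values` is a list of numbers with no missing entries.
    """
    # Quartiles computed with the inclusive method (data treated as the full population).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the values cannot be scaled this way")
    center = median(values)
    return [(v - center) / iqr for v in values]


# Hypothetical biomarker readings with one extreme outlier (100.0).
readings = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = robust_scale(readings)
print(scaled)
```

Because the median and IQR ignore the magnitude of the extreme reading, the four typical values keep a sensible spread, whereas mean and standard deviation scaling would compress them toward each other.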

Question 02

Hints:

  1. Select the relevant blood biomarkers as features for your classifier.
  2. Select and train an ML model to make predictions.
  3. Evaluate your predictive model with ROC AUC.
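
To make the evaluation metric in hint 3 concrete, here is a minimal sketch of ROC AUC computed from its pairwise definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc`, the labels, and the scores are illustrative assumptions; a real analysis would more likely call a library routine, and this quadratic-time version is meant only for small examples.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Assumption: labels are 0 (no incident diabetes) or 1 (incident diabetes).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    # A correctly ordered pair counts 1, a tie counts 0.5, a misordered pair counts 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities.
labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, predicted)
print(auc)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the metric remains informative when the outcome classes are imbalanced.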

Question 03

Hints:

  1. Use the subset of subjects who developed incident diabetes for your unsupervised learning.
  2. You can choose to use all or only relevant biomarkers for clustering.
  3. Select one approach to identify clusters of these subjects.
  4. Identify top blood biomarkers that contributed to the clustering.
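
The hints above can be sketched, under strong simplifying assumptions, as a small k-means clustering followed by a ranking of features by how far apart the cluster centroids sit on each one. K-means is only one of many possible approaches and is not necessarily the one used in the notebook; the function names, the biomarker names, the data, and the deterministic "first k points" initialization are all hypothetical. The features are assumed to be already scaled to comparable units.

```python
from math import dist


def kmeans(points, k, max_iter=100):
    """Plain k-means with a deterministic start (the first k points as centroids)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign each subject to the nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def rank_features(centroids, names):
    """Rank features by the spread (max minus min) of centroid values."""
    spreads = [max(col) - min(col) for col in zip(*centroids)]
    return sorted(zip(names, spreads), key=lambda pair: pair[1], reverse=True)


# Hypothetical scaled biomarker values for six subjects.
names = ["biomarker_a", "biomarker_b"]
subjects = [
    (0.0, 0.1), (5.0, 0.0), (0.2, 0.0),
    (5.1, 0.2), (-0.1, 0.1), (4.9, 0.1),
]
labels, centroids = kmeans(subjects, k=2)
ranked = rank_features(centroids, names)
print(labels, ranked)
```

Centroid spread is a crude proxy for a feature's contribution to the clustering; alternatives such as per-feature statistical tests between clusters could serve the same purpose.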

Import modules

Load the data

Data Cleaning

Select small data

Remove subjects with existing diabetes for this analysis.

Create binary target

Data Preparation for Modelling

Undersampling the data (imbalanced data)

Train-test split

Missing values

Robust Scaling

Modelling with LightGBM and Vaex