
Parkinson Disease Project - DSR

Subject: Modelling Parkinson Disease Progression and predicting best treatment strategies for PD patients with Machine Learning/Deep Learning

AI & Healthcare topics: diagnosis prediction/accuracy, precision medicine with personalized treatments

0. Project context

  • Number of hours to work on it: ~350-400h
  • Project output:
    • GitHub repo with reproducible code
    • Oral presentation (~10-15 min + questions) in front of a panel of companies and data scientists in Berlin
    • Slide deck
    • Optional: video about the project process, Medium article, scientific paper (if relevant and possible)

1. Abstract / Project objectives:

The main goal of this project is predicting Parkinson Disease (PD) Progression over time for a set of patients from a clinical study.

More precisely, starting from a dataset of Electronic Health Records of patients with 4 different clinical diagnoses (PD-diagnosed patients, 400 de novo PD subjects, i.e. newly diagnosed and unmedicated, SWEDD subjects and healthy control subjects), I will:

  1. Classify the clinical diagnosis of each patient
  2. Predict Parkinson Disease progression over time using:
    • 2a. First, only biological biomarkers.
    • 2b. Then, biological biomarkers plus MRI data (raw brain scans from the PPMI dataset). The idea is to compare the performance of the models with and without MRI data, since MRI scans are expensive for hospitals and the resulting data is not easy to process.
  3. Predict the best treatment strategy for each patient and each future disease state
  4. If enough time: predict patients' time of death (see recent article about Google having done such predictions)
  5. If enough time: analyze temporal patterns of biomarkers using longitudinal stability selection (as done for AD in the Fused Sparse Group Lasso paper)

NB: SWEDD = Scans Without Evidence of Dopamine Deficit: patients who look like they have PD in terms of symptoms, but for whom subsequent functional imaging assessment does not confirm it. Source: http://www.acnr.co.uk/SO10/ACNRSO10_30_SWEDD_article.pdf

2. MAIN DATASET:

http://www.ppmi-info.org/access-data-specimens/download-data/

Data population: 2144 patients (per the SCREEN tables), including 1052 PD people (PD (Gen and non-Gen cohorts), SWEDD, PRODROMA)

Patients' hospital visits timeline (time series period):

  • 15 visits, in addition to the screening, baseline and unscheduled visits.
  • -> V04b: every 3 months / V05b -> V13: every 6 months / V14, V15 and beyond: every 12 months.

Schedule of activities for the clinical study:

(http://www.ppmi-info.org/wp-content/uploads/2018/02/PPMI-AM-13-Protocol_SCHEDULE-OF-ACTIVITIES-1.pdf)

Annotated data dictionary:

https://docs.google.com/spreadsheets/d/1Q-0zAG_oBfuo21s5xzN6Vwck5NT0GyP0K3KxiWpxO58/edit#gid=782043355 (work in progress)

3. DS main challenges for this project:

  • Multi-modal (dataframes + MRI scans), multi-label classification problem on time series, with feature selection depending on the label being predicted (the predominant PD biomarkers depend on the disease state)
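One practical side of this challenge is that patients do not all have the same number of recorded visits, while most sequence models expect fixed-length inputs. The block below is a minimal sketch of padding per-patient visit sequences to a common length and keeping a mask of real versus padded visits. The function name `pad_patient_series`, the patient ids and the feature values are all hypothetical and only serve the illustration.

```python
from typing import Dict, List, Sequence, Tuple


def pad_patient_series(
    series: Dict[str, Sequence[Sequence[float]]],
    n_features: int,
    pad_value: float = 0.0,
) -> Tuple[List[str], List[List[List[float]]], List[List[int]]]:
    """Pad each patient's list of per-visit feature vectors to the same length.

    Returns the sorted patient ids, the padded sequences
    (patients x visits x features) and a mask with 1 for a real visit
    and 0 for a padded one.
    """
    if not series:
        raise ValueError("series is empty")
    max_len = max(len(visits) for visits in series.values())
    patient_ids = sorted(series)
    padded: List[List[List[float]]] = []
    mask: List[List[int]] = []
    for pid in patient_ids:
        visits = series[pid]
        for row in visits:
            if len(row) != n_features:
                raise ValueError(f"{pid}: expected {n_features} features per visit")
        n_missing = max_len - len(visits)
        # Real visits first, then padding rows filled with pad_value.
        padded.append(
            [list(row) for row in visits]
            + [[pad_value] * n_features for _ in range(n_missing)]
        )
        mask.append([1] * len(visits) + [0] * n_missing)
    return patient_ids, padded, mask


# Hypothetical toy data: two features per visit, unequal visit counts.
toy = {
    "patient_a": [[20.0, 1.2], [22.0, 1.1], [25.0, 1.0]],
    "patient_b": [[31.0, 0.9]],
}
ids, X, mask = pad_patient_series(toy, n_features=2)
```

The mask lets a downstream model or loss ignore padded visits, so that padding values are not mistaken for measurements.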

4. Use cases:

  • FOR DOCTORS:
    • Early detection of PD subjects
    • Improving diagnosis accuracy
    • Anticipation of their patients' disease evolution
    • Decision support for choosing treatments for patients
  • FOR PD PATIENTS:
    • For people at risk of getting PD: early detection and monitoring
    • For early-stage/late-stage PD people: better monitoring of their disease by knowing the expected evolution
    • Finding the best treatment that will improve the quality of life of PD people
  • FOR HOSPITALS:
    • Cost reductions through improved patient management & treatment optimization
    • Operations improvement by predicting patients' future visits, medical supplies, etc.

5. DS Process for the project

  1. Data Processing/Cleaning: merge/join of tables (possibly using PyTables)

  2. EDA: time series analysis & plotting

Look at sparsity, irregularity, trends...

  3. Feature Engineering (if classic ML is used)

  4. ML/DL models to try:
  • Baseline: a statistical model. The standard one for a multivariate time series with endogenous variables, no trend and no seasonality is VARMA.
    • Other possibilities: discounted median, persistence model.

Some references: https://machinelearningmastery.com/persistence-time-series-forecasting-with-python/

Reference paper for time series forecasting of EHR: https://arxiv.org/pdf/1706.03446.pdf (to print)

About choosing between HMM & RNNs:

  5. Outputs of the model. For each patient:
  • A vector of disease states for different timestamps (multiple labels for each timestamp, corresponding to the UPDRS scale).

Link to the UPDRS scale: http://www.etas.ee/wp-content/uploads/2013/10/updrs.pdf

  • The associated treatment strategy related to the disease state (multiple labels as well)
  • The estimated time of death
  6. Data Visualisation to present the results: plots comparing the performance of the different models for different metrics
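The persistence model listed among the baseline possibilities can be sketched in a few lines: the forecast for the next visit is simply the last observed value, scored here with a walk-forward mean absolute error. This is only an illustrative sketch. The function names and the example score sequence are hypothetical and do not come from the PPMI data.

```python
from typing import List, Sequence


def persistence_forecast(history: Sequence[float]) -> float:
    """Predict the next value as the last observed value."""
    if not history:
        raise ValueError("history is empty")
    return history[-1]


def walk_forward_mae(series: Sequence[float]) -> float:
    """One-step-ahead persistence forecasts over a series, scored with MAE."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    errors: List[float] = [
        abs(series[t] - persistence_forecast(series[:t]))
        for t in range(1, len(series))
    ]
    return sum(errors) / len(errors)


# Hypothetical total scores of one patient over five consecutive visits.
scores = [20, 22, 25, 25, 30]
baseline_mae = walk_forward_mae(scores)
```

Any more complex model (VARMA, HMM, RNN) should beat this error on the same visits to justify its extra cost.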

6. DS tools to be used (Python libraries...):

| DS Process | Tool/Python lib | Comments |
|---|---|---|
| 0. Project Management | Trello, Slack, GitHub | |
| 1. Data Processing | PyTables? or only pandas | Mostly done |
| 2. EDA | Cufflinks/plotly, Seaborn, SciPy, statsmodels.tsa | Mostly done |
| 3. Feature Engineering | scikit-learn | Manually done for now |
| 4a. Classic ML | scikit-learn | Probably no time for that |
| 4b. DL | Keras with TensorFlow backend, TensorBoard, Google Open Cloud/AWS | Tutorials started |
| 5. Dataviz | TBD, probably Seaborn | |
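For the data processing step with pandas, joining two study tables on patient and visit keys could look like the sketch below. The key columns `PATNO` and `EVENT_ID`, the other column names and all the values are assumptions made for illustration, so they should be checked against the data dictionary before reuse.

```python
import pandas as pd

# Hypothetical clinical scores table: one row per patient and visit.
updrs = pd.DataFrame(
    {
        "PATNO": [3001, 3001, 3002],
        "EVENT_ID": ["BL", "V04", "BL"],
        "updrs_total": [20, 24, 31],
    }
)

# Hypothetical biomarker table with partially different visits.
biomarkers = pd.DataFrame(
    {
        "PATNO": [3001, 3002, 3002],
        "EVENT_ID": ["BL", "BL", "V04"],
        "csf_marker": [1.2, 0.9, 1.1],
    }
)

# Outer join keeps visits present in only one table;
# the _merge column records where each row came from.
merged = updrs.merge(
    biomarkers, on=["PATNO", "EVENT_ID"], how="outer", indicator=True
)

# Share of missing values per column, a first look at sparsity.
missing_share = merged.drop(columns="_merge").isna().mean()
```

An outer join is used here so that the sparsity of each table stays visible after merging. An inner join would silently drop visits that are missing in one of the tables.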

Links:

Python tricks to use

  • Pickles

https://docs.python.org/2/library/pickle.html . https://pythontips.com/2013/08/02/what-is-pickle-in-python/
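As a quick illustration of the pickle trick, the sketch below caches a processed object on disk and reloads it, which avoids redoing a slow merge or cleaning step at each run. The helper names, the file name and the cached content are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


def save_pickle(obj: Any, path: Path) -> None:
    """Serialize obj to path with the highest available pickle protocol."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path: Path) -> Any:
    """Load a pickled object. Only unpickle files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical processed data cached in a temporary directory.
cache_file = Path(tempfile.mkdtemp()) / "merged_tables.pkl"
processed = {"PATNO": [3001, 3002], "updrs_total": [20, 31]}

save_pickle(processed, cache_file)
restored = load_pickle(cache_file)
```

The same pattern works for pandas DataFrames, which also offer `to_pickle` and `pd.read_pickle` as shortcuts.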


8. Useful tutorials to do for the project

Deep Learning - General

Biomedical images processing

9. List of contacts that could help me & summary of discussions (other than mentor(s))

see here: https://github.com/AMDonati/parkinson-disease-project/wiki/useful-contacts-&-discussions-summary

10. Reference Documents

11. Additional datasets that could be used:

See here: https://github.com/AMDonati/parkinson-disease-project/wiki/Other-PD-medical-datasets


Parkinson News to check:

https://parkinsonsnewstoday.com/2018/07/31/ema-endorses-use-of-imaging-radioactive-tracer-to-improve-parkinsons-clinical-trial-recruitment/