TrainingCenter

This repository is part of a larger project that studies age prediction and survival prediction from the NHANES dataset. The code for this project is split across three repositories:

📦NHANES_preprocessing to scrape the NHANES website and preprocess the data.
📦TrainingCenter to train the algorithms on the dataset created in the previous repository.
📦CorrelationCenter to study the outputs of the models trained in the previous repository.

Feel free to start a discussion here if you have any questions.

Installation

To set up your virtual environment:

pip install pip==20.0.2
pip install -e .

Structure to have before launching the jobs

Before launching the jobs, you need to get the datasets and create the folds by executing the following:

make_folds --main_category MAIN_CATEGORY --category CATEGORY --number_folds NUMBER_FOLDS

For this command to work, you need the following folder structure:

┣ 📦NHANES_preprocessing 
┃  ┗ 📂merge
┃    ┗ 📂data
┃       ┣ 📂examination
┃       ┃ ┗ 📜[category].feather
┃       ┣ 📂laboratory
┃       ┃ ┗ 📜[category].feather
┃       ┗ 📂questionnaire
┃         ┗ 📜[category].feather
┗ 📦TrainingCenter
   ┗ 📂[...]
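
The layout above can be checked before calling make_folds. The following is a minimal sketch, not part of this repository: the helper names `expected_feather_path` and `check_layout` are hypothetical, and it assumes that the two repositories sit side by side under a common root folder and that the three main categories shown in the tree are the only ones.

```python
from pathlib import Path

# Main categories taken from the folder tree above (assumed to be exhaustive).
MAIN_CATEGORIES = ("examination", "laboratory", "questionnaire")


def expected_feather_path(root, main_category, category):
    """Return the path where [category].feather is expected under `root`."""
    return (
        Path(root) / "NHANES_preprocessing" / "merge" / "data"
        / main_category / f"{category}.feather"
    )


def check_layout(root, main_category, category):
    """Return a list of problems; an empty list means the layout looks right."""
    if main_category not in MAIN_CATEGORIES:
        return [f"unknown main category: {main_category}"]
    problems = []
    # Hypothetical check: TrainingCenter is assumed to be a sibling folder.
    if not (Path(root) / "TrainingCenter").is_dir():
        problems.append("missing folder: TrainingCenter")
    feather = expected_feather_path(root, main_category, category)
    if not feather.is_file():
        problems.append(f"missing file: {feather}")
    return problems
```

Calling `check_layout(".", MAIN_CATEGORY, CATEGORY)` from the common root returns an empty list when both the TrainingCenter folder and the expected feather file are present.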

Pipelines

There are three pipelines available.

Predictions

To predict biological age or the risk of death, use the dedicated command:

prediction --main_category MAIN_CATEGORY --category CATEGORY --target TARGET --algorithm ALGORITHM --random_state RANDOM_STATE --n_inner_search N_INNER_SEARCH

Basic predictions

To obtain a control (baseline) for the survival predictions, you can train the models on age, sex and ethnicity only by using this command:

basic_prediction --main_category MAIN_CATEGORY --category CATEGORY --target TARGET --algorithm ALGORITHM --random_state RANDOM_STATE --n_inner_search N_INNER_SEARCH

Feature importances

To get the feature importances of the models, you can use:

feature_importances --main_category MAIN_CATEGORY --category CATEGORY --target TARGET --algorithm ALGORITHM --random_state RANDOM_STATE --n_inner_search N_INNER_SEARCH
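
The three commands above take the same flags, so they can be assembled programmatically when many runs are needed. The sketch below is only an illustration: `build_command` and `commands_for_random_states` are hypothetical helpers that do not exist in this repository, and the valid values for each flag are not listed here, so the usage note keeps the placeholders from the commands above.

```python
import shlex

# Command names as they appear in this README.
PIPELINES = ("prediction", "basic_prediction", "feature_importances")


def build_command(pipeline, main_category, category, target, algorithm,
                  random_state, n_inner_search):
    """Build the argument list for one pipeline run (hypothetical helper)."""
    if pipeline not in PIPELINES:
        raise ValueError(f"unknown pipeline: {pipeline}")
    return [
        pipeline,
        "--main_category", str(main_category),
        "--category", str(category),
        "--target", str(target),
        "--algorithm", str(algorithm),
        "--random_state", str(random_state),
        "--n_inner_search", str(n_inner_search),
    ]


def commands_for_random_states(pipeline, random_states, **fixed):
    """Return one shell-quoted command line per random state."""
    return [
        shlex.join(build_command(pipeline, random_state=state, **fixed))
        for state in random_states
    ]
```

For example, `commands_for_random_states("prediction", [1, 2, 3], main_category="MAIN_CATEGORY", category="CATEGORY", target="TARGET", algorithm="ALGORITHM", n_inner_search=5)` returns three command lines that differ only in `--random_state`; the value 5 is an arbitrary placeholder.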

Results

All the results are available in this spreadsheet. The spreadsheet is updated automatically when the computations finish.

Executing the file ./shape_age_range/export_information.py adds the shapes and the age ranges to the spreadsheet for each category and each target.

Launching jobs

The folder fit_running gathers the scripts for launching jobs on a Slurm cluster without having to specify the memory or the time limit yourself.

To run the tests

python -m unittest
