This is the source code for the Master's thesis project "Hospital readmission prediction with long clinical notes", presented in partial fulfilment of the requirements for the degree of MSc in Computer Science at the University of Cape Town.
The project aims to evaluate the effect of using a Transformer-based model with a sparse attention pattern to predict 30-day hospital readmission on a cohort from the MIMIC-III dataset.
Available here is the source code for data processing, model training, and evaluation.
The code was tested with Python 3.8 in a Linux environment, and using a virtual environment is recommended.
- Install the dependencies with:

  ```sh
  pip3 install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu
  ```
If you plan on using GPUs, use the `requirements-hpc.txt` file instead.
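For a CPU-only setup, a minimal sketch using Python's built-in `venv` module (any other virtual-environment tool works just as well) looks like this:

```sh
# Create and activate an isolated environment, then install the CPU dependencies.
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu
```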
To execute data processing, run the following module:

```sh
python -m src.clinical_longformer.data.processing
```
To get the help message, use the `--help` argument:
```
$ python -m src.clinical_longformer.data.processing --help
usage: processing.py [-h] [--n-days {1-30}] [-v] [-vv] mimic_path {ds,all} {-1,512,1024,2048,4096} [out_path]

Data processing

positional arguments:
  mimic_path            MIMIC-III dataset path
  {ds,all}              set notes category (ds - Discharge Summary)
  {-1,512,1024,2048,4096}
                        set note length, -1 means do not chunk text
  out_path              set output path

optional arguments:
  -h, --help            show this help message and exit
  --n-days {1-30}       set number of days (only used if category is set to all)
  -v, --verbose         set loglevel to INFO
  -vv, --very-verbose   set loglevel to DEBUG
```
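As an illustration, the following invocation (with placeholder paths) would process discharge summaries, chunk notes into 4096-token segments, and write the result to `data/processed` with INFO-level logging:

```sh
python -m src.clinical_longformer.data.processing -v \
    /path/to/mimic-iii ds 4096 data/processed
```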
Model training is done with the PyTorch Lightning framework.
There are four executable modules available in `src/clinical_longformer/model`: `dan.py`, `lstm.py`, `bert.py`, and `longformer.py`.
These modules run the PyTorch Lightning Trainer; you can find the available arguments by using the `--help` argument.
More information is available in the docs.
In the case of `longformer.py`, the maximum token length can be specified using the `--max_length` argument.
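For example (a sketch only; any further required arguments, such as data paths, are module-specific and are listed in the help output):

```sh
# List the arguments accepted by the Longformer module
# (the module's own options plus the PyTorch Lightning Trainer flags).
python -m src.clinical_longformer.model.longformer --help

# Sketch of a run with the maximum token length set to 4096.
python -m src.clinical_longformer.model.longformer --max_length 4096
```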
In the `hpc-uct` and `chpc` folders there are examples of how to run the models.
Hyperparameter tuning is done using Weights & Biases.
Look inside `hpc-uct` and `chpc` for examples of how to run the sweeps.
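As general background, a Weights & Biases sweep is registered from a configuration file and executed by one or more agents; the sketch below uses a hypothetical `sweep.yaml` and a placeholder sweep ID, while the actual configurations live in the `hpc-uct` and `chpc` job files:

```sh
# Register the sweep and print its ID (sweep.yaml is a placeholder name).
wandb sweep sweep.yaml

# Launch an agent that pulls hyperparameter combinations and runs trials.
wandb agent <entity>/<project>/<sweep_id>
```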
Pre-training is done using the Hugging Face Transformers library's language-modeling example script.
The script has been cloned into this repository, and job files for running it on `hpc-uct` are available.
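For reference, the masked-language-modeling variant of that example script (`run_mlm.py`) is typically invoked along the following lines; the model name, file paths, and sequence length below are placeholders, and the exact arguments used in this project are in the job files:

```sh
python run_mlm.py \
    --model_name_or_path allenai/longformer-base-4096 \
    --train_file data/notes_train.txt \
    --validation_file data/notes_val.txt \
    --max_seq_length 4096 \
    --do_train --do_eval \
    --output_dir output/pretrained-longformer
```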