This repo hosts the pretrained and finetuned weights, along with the related scripts, for Clinical XLNet.
Dependencies:
torch
argparse
copy
tqdm
matplotlib
numpy
pandas
time
sklearn
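A quick way to confirm the environment is ready is to check that each module listed above can be located. The sketch below is not part of this repo; the helper name `find_missing` is hypothetical. Note that `argparse`, `copy`, and `time` ship with the Python standard library, and `sklearn` is the import name of the package installed as `scikit-learn`.

```python
import importlib.util

# Import names taken from the dependency list above.
REQUIRED_MODULES = [
    "torch", "argparse", "copy", "tqdm", "matplotlib",
    "numpy", "pandas", "time", "sklearn",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Any names reported as missing need to be installed before running train.py.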
To download pretrained Clinical XLNet, use the following links: this model is pretrained only on nursing notes, and this model is pretrained on discharge summaries.
Sample scripts for running prediction are listed below. You can also modify the label to run your own downstream prediction task. These are the finetuned weights for the PMV task, and these are the finetuned weights for the Mortality task.
Starting from the finetuned weights (note the --skip_meta_finetuned flag):
python train.py \
--data_dir DATA_FILE \
--config_path CONFIG \
--model_path MORTALITY/PMV_MODEL_PATH \
--save_meta_finetune_path SAVE_PATH \
--prediction_label Mortality/PMV \
--Batch_Size_Meta 4 \
--Learning_Rate_Meta 1e-5 \
--Training_Epoch_Meta 4 \
--Batch_Size_Finetune 128 \
--Learning_Rate_Finetune 2e-5 \
--Training_Epoch_Finetune 30 \
--saving_notes_embed_batch_size 32 \
--skip_meta_finetuned
Starting from the pretrained weights:
python train.py \
--data_dir DATA_FILE \
--config_path CONFIG \
--model_path PRETRAIN_MODEL_PATH \
--save_meta_finetune_path SAVE_PATH \
--prediction_label Mortality/PMV \
--Batch_Size_Meta 4 \
--Learning_Rate_Meta 1e-5 \
--Training_Epoch_Meta 4 \
--Batch_Size_Finetune 128 \
--Learning_Rate_Finetune 2e-5 \
--Training_Epoch_Finetune 30 \
--saving_notes_embed_batch_size 32
The script reads train.csv, val.csv, and test.csv from the DATA_FILE folder.
It prints the AUROC and AUPRC results.
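For readers unfamiliar with the two reported metrics, the sketch below shows what they measure on a toy example. It is an illustration under stated assumptions, not the code used in train.py: the function names are hypothetical, labels are assumed to be 0/1, and AUPRC is estimated here with average precision, which is one common estimator (the script may compute the area differently).

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ap, tp, prev_recall, i = 0.0, 0, 0.0, 0
    while i < len(ranked):
        # Consume every example that shares the current score threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        recall = tp / n_pos
        precision = tp / j  # j examples are predicted positive at this threshold
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print(f"AUROC: {auroc(y_true, y_score):.4f}")
    print(f"AUPRC (average precision): {average_precision(y_true, y_score):.4f}")
```

AUPRC is often the more informative of the two when the positive class is rare, because it does not reward correctly ranked negatives.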
We use MIMIC-III. Please complete the CITI training program in order to use it. If you use your own notes dataset, further pretraining is recommended.
File system expected:
-data
  -train.csv
  -val.csv
  -test.csv
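Before launching a run, it can help to confirm that the data folder matches the layout above. The following is a minimal sketch and not part of this repo; the helper name `missing_split_files` is hypothetical, and the default folder name `data` is taken from the listing above.

```python
import sys
from pathlib import Path

# File names stated in this README.
EXPECTED_FILES = ("train.csv", "val.csv", "test.csv")


def missing_split_files(data_dir):
    """Return the expected split files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Use the first command-line argument as the data folder, else "data".
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "data"
    missing = missing_split_files(data_dir)
    print("Missing files:", ", ".join(missing) if missing else "none")
```

The same folder path is what would be passed to train.py through --data_dir.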
We provide a notebook tutorial to pretrain your own Clinical XLNet.
We provide a notebook for preprocessing clinical notes and curating the PMV cohort on MIMIC-III. Cohort curation has two parts: an R script generates the general mechanical ventilation cohort, and this notebook generates the specific cohort. See the paper for the detailed cohort curation process.
Please contact charlotta_lindvall@dfci.harvard.edu for help or submit an issue.
Please cite the arXiv paper:
@article{clinicalxlnet,
author = {Kexin Huang and Abhishek Singh and Sitong Chen and Edward Moseley and Chin-ying Deng and Naomi George and Charlotta Lindvall},
title = {Clinical XLNet: Modeling Sequential Clinical Notes and Predicting Prolonged Mechanical Ventilation},
year = {2019},
journal = {arXiv:1912.11975},
}