ClinicalBERT

This repository hosts the pretraining and fine-tuning weights, along with the relevant scripts, for ClinicalBERT, a contextual representation model for clinical notes. The corresponding publication is https://arxiv.org/abs/1904.05342.

New: Clinical XLNet and Pretraining Script

  1. The pretrained Clinical XLNet model is available here.

  2. Detailed step-by-step instructions for pretraining ClinicalBERT and Clinical XLNet from scratch are available here.

  3. The predictive performance results have been updated using the correct pretraining/test splitting method described in the pretraining instructions above. For a comparison of clinical-outcome performance against additional baselines using the correct split for ClinicalBERT/XLNet, please see the Clinical XLNet paper.

Installation and Requirements

pip install pytorch-pretrained-bert
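To sanity-check the installation, the snippet below encodes a short note through the pytorch-pretrained-bert API. This is a minimal sketch using the public bert-base-uncased checkpoint, not the ClinicalBERT weights (those are downloaded later).

import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

# download the stock checkpoint just to verify the package works
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

tokens = tokenizer.tokenize('[CLS] patient admitted with chest pain [SEP]')
ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
with torch.no_grad():
    # pytorch-pretrained-bert returns (all encoder layers, pooled [CLS] output)
    encoded_layers, pooled = model(ids)
print(pooled.shape)  # (1, 768)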

Datasets

We use MIMIC-III. Because MIMIC-III requires completion of the CITI training program before it can be used, we refer users to the link. However, since clinical notes share much common structure, users can test any clinical notes with the ClinicalBERT weights, although further fine-tuning from our checkpoint is recommended.

File system expected:

-data
  -discharge
    -train.csv
    -val.csv
    -test.csv
  -3days
    -train.csv
    -val.csv
    -test.csv
  -2days
    -test.csv

Each data file is expected to have the columns "TEXT", "ID", and "Label" (note chunks, admission ID, and readmission label, respectively).
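
As a concrete illustration, the snippet below writes a toy train.csv in the expected format; the rows are invented for illustration only.

import pandas as pd

# one row per note chunk; chunks from the same admission share an ID
df = pd.DataFrame({
    'ID':    [101, 101, 102],
    'TEXT':  ['first note chunk ...', 'second note chunk ...', 'another note ...'],
    'Label': [1, 1, 0],  # 30-day readmission label for the admission
})
df.to_csv('./data/discharge/train.csv', index=False)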

ClinicalBERT Weights

Use this Google link to download the pretrained ClinicalBERT weights along with the readmission-task fine-tuned model weights.

The following scripts presume a model folder with the following structure:

-model
	-discharge_readmission
		-bert_config.json
		-pytorch_model.bin
	-early_readmission
		-bert_config.json
		-pytorch_model.bin
	-pretraining
		-bert_config.json
		-pytorch_model.bin
		-vocab.txt
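
Because each folder above contains bert_config.json and pytorch_model.bin (and vocab.txt for pretraining), the weights can be loaded directly with pytorch-pretrained-bert's from_pretrained on a local directory. A minimal sketch, assuming the folder layout above:

from pytorch_pretrained_bert import BertTokenizer, BertModel

# from_pretrained accepts a local directory holding
# bert_config.json, pytorch_model.bin, and vocab.txt
tokenizer = BertTokenizer.from_pretrained('./model/pretraining')
model = BertModel.from_pretrained('./model/pretraining')
model.eval()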

Hospital Readmission using ClinicalBERT

Below are the scripts for running 30-day hospital readmission prediction.

Early Notes Prediction

python ./run_readmission.py \
  --task_name readmission \
  --readmission_mode early \
  --do_eval \
  --data_dir ./data/3days(2days)/ \
  --bert_model ./model/early_readmission \
  --max_seq_length 512 \
  --output_dir ./result_early

Discharge Summary Prediction

python ./run_readmission.py \
  --task_name readmission \
  --readmission_mode discharge \
  --do_eval \
  --data_dir ./data/discharge/ \
  --bert_model ./model/discharge_readmission \
  --max_seq_length 512 \
  --output_dir ./result_discharge

Training your own readmission prediction model from the pretrained ClinicalBERT

python ./run_readmission.py \
  --task_name readmission \
  --do_train \
  --do_eval \
  --data_dir ./data/(DATA_FILE) \
  --bert_model ./model/pretraining \
  --max_seq_length 512 \
  --train_batch_size (BATCH_SIZE) \
  --learning_rate 2e-5 \
  --num_train_epochs (EPOCHs) \
  --output_dir ./result_new

It will use the train.csv from the (DATA_FILE) folder.

The results are written to the output_dir folder and consist of:

  1. logits_clinicalbert.csv: logits from ClinicalBERT, for comparison with other models
  2. auprc_clinicalbert.png: precision-recall curve
  3. auroc_clinicalbert.png: ROC curve
  4. eval_results.txt: RP80 (recall at 80% precision), accuracy, and loss; a sketch of RP80 follows below
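
RP80 fixes precision at 80% and reports the recall at that operating point, a metric chosen in the paper to limit alarm fatigue. A hedged sketch of the metric, with labels and probs as placeholders for your own arrays:

from sklearn.metrics import precision_recall_curve

def rp80(labels, probs):
    # recall at the best operating point where precision reaches at least 80%
    precision, recall, _ = precision_recall_curve(labels, probs)
    ok = precision >= 0.80
    return recall[ok].max() if ok.any() else 0.0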

Preprocessing

We provide a script for preprocessing clinical notes and merging them with admission information from MIMIC-III.
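
The core of that merge can be sketched as follows, assuming the raw MIMIC-III NOTEEVENTS.csv and ADMISSIONS.csv files; the provided script remains the reference implementation.

import pandas as pd

notes = pd.read_csv('NOTEEVENTS.csv', usecols=['HADM_ID', 'CATEGORY', 'TEXT'])
admissions = pd.read_csv('ADMISSIONS.csv', usecols=['HADM_ID', 'ADMITTIME', 'DISCHTIME'])

# attach admission timing to each note via the admission ID
merged = notes.merge(admissions, on='HADM_ID')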

Notebooks

  1. Attention: a tutorial notebook for visualizing self-attention.

Gensim Word2Vec and FastText models

Please use this link to download the Word2Vec and FastText models trained on clinical notes.

To use, simply:

import gensim

# load the released vectors (gensim 3.x KeyedVectors API)
word2vec = gensim.models.KeyedVectors.load('word2vec.model')
# stack every in-vocabulary vector into a single weight matrix
weights = word2vec[word2vec.vocab]
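
For example, a nearest-neighbor query (assuming the token appears in the model's vocabulary):

# five most similar terms by cosine similarity
print(word2vec.most_similar('pneumonia', topn=5))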

Citation

Please cite the arXiv paper:

@article{clinicalbert,
author = {Kexin Huang and Jaan Altosaar and Rajesh Ranganath},
title = {ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission},
year = {2019},
journal = {arXiv:1904.05342},
}

Ownership

This project is released by All Bets, LLC, a wholly-owned subsidiary of the One Fact Foundation, a 501(c)(3) nonprofit whose purpose is to change global health care using open source principles.
