The notebooks work with the Pubmed_200k_RCT dataset.
The notebooks are designed to run on Google Colab (it is possible to run them locally by changing the file paths and removing the Google Drive mount).
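For reference, a minimal sketch of such a setup cell, assuming the data lives in the default MyDrive folder (the local fallback path is only an illustration):

# Mount Google Drive when running on Colab; fall back to local paths otherwise.
try:
    from google.colab import drive
    drive.mount('/content/drive')
    base_path = '/content/drive/MyDrive/'  # assumed Drive location
except ImportError:
    base_path = './'  # running locally: point this at your own directory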
You may need to install the following libraries for the notebooks to work:
$ pip install yellowbrick
$ pip install gensim
$ pip install seaborn
$ pip install transformers
$ pip install datasets
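On Google Colab these can be run directly in a notebook cell by prefixing them with an exclamation mark, e.g. !pip install transformers.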
Because of their size, the trained models and preprocessed data can be found here.
The notebook Task1.ipynb requires the preprocessed dataset. Either use the preprocessed data from the folder mentioned above or run the notebook Task1_preprocessing.ipynb. The preprocessed files should be in the path indicated by data_path.
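A quick sanity check for this setup (the path below is a placeholder, not the actual folder name):

import os

data_path = 'path/to/data/'  # placeholder; point this at the folder with the preprocessed files

# Verify the preprocessed files are where the notebook expects them.
assert os.path.isdir(data_path), f"data_path does not exist: {data_path}"
print(sorted(os.listdir(data_path)))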
For the notebook Task2.ipynb, either set is_preprocess_enabled=True to let the notebook do the preprocessing, or set it to False and use the preprocessed data from the folder mentioned above. The preprocessed files should be in the path indicated by data_path.
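The switch is expected to look roughly like this (a sketch; the actual preprocessing and loading steps are defined in the notebook itself):

is_preprocess_enabled = False  # True: preprocess from scratch; False: use the files in data_path

if is_preprocess_enabled:
    ...  # the notebook's preprocessing cells build the files and write them to data_path
else:
    ...  # the preprocessed files are read directly from data_path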
In the first cells of each notebook you can find the following line:
train = False
This indicates that a previously saved model will be loaded instead of trained. To train the model, simply change this line to:
train = True
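In other words, the flag is expected to gate the notebook roughly as follows (a sketch, not the notebooks' exact code):

train = False  # False: load the saved model from model_path; True: train and save it

if train:
    ...  # training cells run, and the resulting model is saved to model_path
else:
    ...  # training is skipped, and the trained model is loaded from model_path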
In the first cells of each notebook you can find two paths:
data_path = path/to/data/
model_path = path/to/saved/models/
The data_path variable should point to the folder that contains the original dataset and the preprocessed files. The model_path variable should point to the folder that contains the model files; the model will be loaded from there (or saved there if training is enabled).
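A plausible Colab configuration, assuming the shared folder was copied to MyDrive (the folder names are assumptions; adjust them to your own layout):

data_path = '/content/drive/MyDrive/Pubmed_200k_RCT/data/'     # original dataset + preprocessed files
model_path = '/content/drive/MyDrive/Pubmed_200k_RCT/models/'  # trained model files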