Megha-Bose / Disease-NER Public

Given a medical diagnosis, this project identifies medical conditions within the text and maps them to standardized medical encodings through named entity linking, using BERT-based models. Task: https://temu.bsc.es/distemist/

Disease-NER

Given a medical diagnosis, this project identifies medical conditions within the text and maps them to standardized medical encodings.

Data

The data directory contains:

- entities.tsv: the disease mentions extracted from the text files.
- text: a directory of text files containing the medical textual data.

The data is taken from the English version of the multilingual resources of the DisTEMIST 2022 task: https://zenodo.org/record/6532684

Pre-processing

The pre-processing stage involves:

- Splitting the medical text in each file into sentences.
- Tokenizing the sentences into words/tokens.
- Computing IOB (inside-outside-beginning) tags for the tokens, for the named entity recognition (NER) task.
Code: Pre-processing.ipynb
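
The three steps above can be illustrated with a minimal sketch. It is not taken from Pre-processing.ipynb: the regex-based sentence splitter and tokenizer, the function names, the `DISEASE` label, and the assumption that each mention is given as a `(start, end)` character span are all hypothetical choices made for illustration.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize_with_offsets(text):
    """Split text into word and punctuation tokens, keeping character offsets."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def iob_tags(text, spans, label="DISEASE"):
    """Tag each token as B-/I-<label> if it lies inside a mention span, else O.

    `spans` is a list of (start, end) character offsets (an assumed format).
    """
    tokens, tags = [], []
    for token, start, end in tokenize_with_offsets(text):
        tag = "O"
        for span_start, span_end in spans:
            if start >= span_start and end <= span_end:
                # The token that opens the span gets B-, the rest get I-.
                tag = ("B-" if start == span_start else "I-") + label
                break
        tokens.append(token)
        tags.append(tag)
    return tokens, tags


if __name__ == "__main__":
    sentence = "Patient has type 2 diabetes and chronic kidney disease."
    mentions = ["type 2 diabetes", "chronic kidney disease"]
    spans = [(sentence.index(m), sentence.index(m) + len(m)) for m in mentions]
    for token, tag in zip(*iob_tags(sentence, spans)):
        print(f"{token}\t{tag}")
```

Keeping character offsets on every token is what makes the span-to-tag alignment possible; a tokenizer that discards offsets would need a separate alignment step.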

NER Task

Two types of models are built:
- Document-level: the entire clinical case / document is given as input.
- Sentence-level: the text is split into sentences, and the sentences are given as input.
The base models used are:
- https://huggingface.co/d4data/biomedical-ner-all
- https://huggingface.co/pucpr/clinicalnerpt-medical
Disease mention identification is framed as a token classification problem.
Code: Entities_NER.ipynb
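
In a token classification setup, the model emits one IOB tag per token, and the tags are then grouped back into disease mentions. The sketch below shows only that grouping step, under the assumption that predictions arrive as parallel lists of tokens and tags; the function name `decode_iob` is hypothetical and the code is not taken from Entities_NER.ipynb.

```python
def decode_iob(tokens, tags):
    """Group tokens tagged B-/I- into contiguous mention strings."""
    mentions, current = [], []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new mention starts; close any mention that was still open.
            if current:
                mentions.append(" ".join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            # An O tag (or a stray I- with no open mention) ends the mention.
            if current:
                mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions


if __name__ == "__main__":
    tokens = ["Patient", "has", "type", "2", "diabetes", "and", "asthma", "."]
    tags = ["O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "O", "B-DISEASE", "O"]
    print(decode_iob(tokens, tags))
```

Treating a stray `I-` tag as `O` is one possible policy; another is to promote it to the start of a new mention.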

Entity Linking Task

The disease mentions are linked to SNOMED CT codes.
The models used are:
- SapBERT: https://huggingface.co/cambridgeltl/SapBERT-from-PubMedBERT-fulltext
- Roberta-Large: https://huggingface.co/raynardj/pmc-med-bio-mlm-roberta-large
- PubMedBERT: https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract
Code: EL.ipynb (SapBERT), EL_roberta.ipynb (Roberta-Large), EL_pubmedbert.ipynb (PubMedBERT)
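
One common way to use encoders like the ones listed above for linking is to embed each mention and each candidate concept name, then pick the concept with the highest cosine similarity. Whether the notebooks do exactly this is an assumption. The sketch below illustrates the idea with a stand-in `embed` function based on character trigrams, so that it runs without downloading a model; in practice the stand-in would be replaced by the encoder's output vectors. The concept codes `CODE_1` to `CODE_3` are placeholders, not real SNOMED CT codes.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Stand-in encoder: character n-gram counts (NOT a BERT model)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def link(mention, concepts):
    """Return (code, name, score) for the concept name most similar to the mention."""
    mention_vec = embed(mention)
    scored = [(code, name, cosine(mention_vec, embed(name)))
              for code, name in concepts.items()]
    return max(scored, key=lambda item: item[2])


# Placeholder codes for illustration only; not real SNOMED CT identifiers.
concepts = {
    "CODE_1": "diabetes mellitus type 2",
    "CODE_2": "chronic kidney disease",
    "CODE_3": "asthma",
}

if __name__ == "__main__":
    for mention in ["type 2 diabetes", "chronic kidney disease"]:
        code, name, score = link(mention, concepts)
        print(f"{mention!r} -> {code} ({name}), score={score:.2f}")
```

With a real encoder, the concept embeddings would typically be computed once and cached, since the concept vocabulary is far larger than the set of mentions in a single document.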
