This is an end-to-end TensorFlow project based on the concepts of deep learning and natural language processing (NLP). Here we replicate the deep learning model behind the 2017 paper PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts. When it was released, the paper introduced a new dataset called PubMed 200k RCT, which consists of ~200,000 labelled Randomized Controlled Trial (RCT) abstracts. The goal of the dataset was to explore the ability of NLP models to classify sentences which appear in sequential order. Along the way, we'll cover:
- Downloading a text dataset (PubMed 200k RCT from GitHub; see the download sketch after this list)
- Writing a preprocessing function to prepare our data for modelling (see the preprocessing sketch after this list)
- Setting up a series of modelling experiments
- Making a baseline (TF-IDF classifier; sketched after this list)
- Building deep models with different combinations of token embeddings, character embeddings, pretrained embeddings and positional embeddings
- Building our first multimodal model (taking multiple types of data inputs)
- Replicating the model architecture from https://arxiv.org/abs/1612.05251 (a hedged sketch of a similar tribrid model follows this list)
- Finding the most wrong predictions (sketched after this list)
- Making predictions on PubMed abstracts from the wild
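
As a preview of the workflow, here are a few hedged code sketches. First, downloading the data: the PubMed RCT dataset is published by the paper's authors at https://github.com/Franck-Dernoncourt/pubmed-rct, so a simple approach is to clone that repository. Using the smaller 20k subset (with numbers replaced by "@") for quicker experiments is an assumption, not a requirement.

```python
# Minimal sketch: clone the dataset repository and inspect the files.
import os
import subprocess

if not os.path.isdir("pubmed-rct"):
    subprocess.run(
        ["git", "clone", "https://github.com/Franck-Dernoncourt/pubmed-rct.git"],
        check=True,
    )

# The 20k subset is handy for quicker experiments before scaling up to 200k.
data_dir = "pubmed-rct/PubMed_20k_RCT_numbers_replaced_with_at_sign/"
print(os.listdir(data_dir))  # expect train.txt, dev.txt and test.txt
```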
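
The raw files store one "###"-prefixed abstract ID per abstract, followed by one "LABEL\tsentence" pair per line, with blank lines separating abstracts. A hedged sketch of a preprocessing function that turns this into a list of dictionaries (including each sentence's position in its abstract, which the positional features later rely on) might look like the following; the function and key names are our own choices, not from the paper.

```python
def preprocess_text_with_line_numbers(filename):
    """Return a list of dicts, one per sentence:
    {"target": label, "text": sentence, "line_number": position, "total_lines": count}."""
    with open(filename, "r") as f:
        input_lines = f.readlines()

    abstract_lines = ""    # collects the lines of the current abstract
    abstract_samples = []  # one dict per sentence

    for line in input_lines:
        if line.startswith("###"):  # new abstract ID -> reset the collector
            abstract_lines = ""
        elif line.isspace():        # blank line -> end of abstract, split it into sentences
            split_lines = abstract_lines.splitlines()
            for line_number, abstract_line in enumerate(split_lines):
                target, text = abstract_line.split("\t")
                abstract_samples.append({
                    "target": target,
                    "text": text.lower(),
                    "line_number": line_number,
                    "total_lines": len(split_lines),  # sentences in this abstract
                })
        else:
            abstract_lines += line

    return abstract_samples
```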
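
For the baseline, a TF-IDF vectorizer feeding a simple classifier gives a quick point of comparison for the deep models. The sketch below uses scikit-learn's Multinomial Naive Bayes; the choice of classifier and the variable names are assumptions.

```python
# Hedged sketch of a TF-IDF baseline using a scikit-learn pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

baseline_model = Pipeline([
    ("tf-idf", TfidfVectorizer()),  # convert sentences to TF-IDF features
    ("clf", MultinomialNB()),       # classify the features
])

# train_sentences/train_labels and val_sentences/val_labels are assumed to come
# from the preprocessing step above.
# baseline_model.fit(train_sentences, train_labels)
# print(baseline_model.score(val_sentences, val_labels))
```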
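
The multimodal experiments build up to a model combining token embeddings, character embeddings and positional information (line number and total lines), in the spirit of the architecture in https://arxiv.org/abs/1612.05251. The Keras sketch below is an approximation, not the paper's exact architecture: layer sizes, one-hot depths and the use of bidirectional LSTMs are assumptions, and the vectorizers would normally be adapted on the training sentences before training.

```python
# Hedged sketch of a "tribrid" model: token + character + positional inputs.
import tensorflow as tf
from tensorflow.keras import layers

# Vectorizers/embeddings (vocab sizes and sequence lengths are assumptions;
# call .adapt() on the training sentences before training for real).
text_vectorizer = layers.TextVectorization(max_tokens=68000, output_sequence_length=55)
token_embed = layers.Embedding(input_dim=68000, output_dim=128, mask_zero=True)
char_vectorizer = layers.TextVectorization(max_tokens=70, output_sequence_length=290,
                                           split="character")
char_embed = layers.Embedding(input_dim=70, output_dim=25, mask_zero=True)

# 1. Token-level branch
token_inputs = layers.Input(shape=(1,), dtype=tf.string, name="token_input")
token_branch = layers.Bidirectional(layers.LSTM(64))(token_embed(text_vectorizer(token_inputs)))

# 2. Character-level branch
char_inputs = layers.Input(shape=(1,), dtype=tf.string, name="char_input")
char_branch = layers.Bidirectional(layers.LSTM(32))(char_embed(char_vectorizer(char_inputs)))

# 3. Positional branches (one-hot encoded line number and total lines; depths are assumptions)
line_number_inputs = layers.Input(shape=(15,), dtype=tf.float32, name="line_number_input")
line_number_branch = layers.Dense(32, activation="relu")(line_number_inputs)

total_lines_inputs = layers.Input(shape=(20,), dtype=tf.float32, name="total_lines_input")
total_lines_branch = layers.Dense(32, activation="relu")(total_lines_inputs)

# 4. Combine token + char features, add positional features, then classify
combined = layers.Concatenate()([token_branch, char_branch])
combined = layers.Dropout(0.5)(layers.Dense(256, activation="relu")(combined))
combined = layers.Concatenate()([line_number_branch, total_lines_branch, combined])
outputs = layers.Dense(5, activation="softmax")(combined)  # 5 sentence classes in PubMed RCT

model = tf.keras.Model(
    inputs=[token_inputs, char_inputs, line_number_inputs, total_lines_inputs],
    outputs=outputs,
)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```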
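
Finally, inspecting the "most wrong" predictions (examples the model misclassified with high confidence) is a useful error-analysis step. A small hedged sketch, assuming we have the validation texts, integer-encoded true labels and the model's prediction probabilities:

```python
# Hedged sketch: surface the misclassified examples the model was most confident about.
import numpy as np
import pandas as pd

def most_wrong_predictions(texts, y_true, pred_probs, top_k=10):
    """Return the top_k misclassified examples sorted by prediction confidence."""
    y_pred = np.argmax(pred_probs, axis=1)        # predicted class indices
    df = pd.DataFrame({
        "text": texts,
        "target": y_true,                         # true class indices
        "pred": y_pred,
        "pred_prob": np.max(pred_probs, axis=1),  # model confidence
    })
    wrong = df[df["target"] != df["pred"]]        # keep only the misclassified rows
    return wrong.sort_values("pred_prob", ascending=False).head(top_k)

# Example usage (names are assumptions):
# most_wrong_predictions(val_sentences, val_labels_encoded, model.predict(val_dataset))
```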