
Text Summarisation using BERT and T5 on the WikiHow Dataset

Transformer models have grown in popularity in recent years, primarily due to the self-attention mechanism in their architecture, which assigns differential weights to the most significant portions of the text. This mechanism also permits parallelization, unlike traditional recurrent approaches, thereby reducing training time. Although transformer-based approaches perform well in general, deciding which transformer model will perform best on a new dataset remains a challenge. In this paper, we present a set of text summarization experiments on the WikiHow dataset. We pit two models against each other: BERT-large (an extractive summarizer) and T5-small (an abstractive summarizer). We run experiments across various text lengths and compare the outputs using ROUGE scores. We then conduct a further experiment to determine which of the two models yields better results in terms of information retrieval.
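As a rough illustration of the abstractive side of this comparison, the sketch below generates a summary with T5-small via Hugging Face `transformers` and scores it against a reference summary with ROUGE. The generation settings and the placeholder article/reference strings are illustrative assumptions, not the exact pipeline used in this project.

```python
# Minimal sketch: abstractive summarization with T5-small, scored with ROUGE.
from transformers import AutoTokenizer, T5ForConditionalGeneration
from rouge_score import rouge_scorer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

article = "..."    # placeholder: a WikiHow article body
reference = "..."  # placeholder: its reference (headline) summary

# T5 expects a task prefix; "summarize: " selects the summarization task.
inputs = tokenizer("summarize: " + article, return_tensors="pt",
                   truncation=True, max_length=512)
output_ids = model.generate(**inputs, max_length=150, num_beams=4,
                            early_stopping=True)
prediction = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# ROUGE-1 and ROUGE-L, the kind of scores used for the comparison above.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
print(scorer.score(reference, prediction))
```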

We use the WikiHow Text Summarization dataset:
Dataset: https://github.com/mahnazkoupaee/WikiHow-Dataset
Paper: https://arxiv.org/abs/1810.09305
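A minimal sketch of loading the dataset for experiments like these, assuming the `wikihowAll.csv` release from the repository above, in which `headline` holds the reference summary and `text` the article body; the column handling is an assumption based on that release, not code from this project.

```python
# Load WikiHow article/summary pairs from the wikihowAll.csv release.
import pandas as pd

df = pd.read_csv("wikihowAll.csv").dropna()
articles = df["text"].tolist()      # full article bodies
summaries = df["headline"].tolist() # reference summaries
print(f"{len(df)} article/summary pairs loaded")
```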

The project collaborators are as follows:

This research is produced as a part of coursework for Statistical Natural Language Processing (COMP0087).
