# Text Summarisation using BERT and T5 on the WikiHow Dataset

Transformer models have grown in popularity in recent years, primarily because the self-attention mechanism in their architecture assigns differential weights to the most significant portions of the text. Self-attention also allows parallelization, unlike traditional sequential methods, thereby reducing training time. Although transformer-based approaches perform well in general, it remains difficult to predict which transformer model will perform better on a new dataset. In this project, we run a set of text summarization experiments on the WikiHow dataset, comparing two models: BERT-large (an extractive summarizer) and T5-small (an abstractive summarizer). We evaluate both models on inputs of various text lengths and compare them using ROUGE scores. We then conduct a further experiment to determine which of the two models yields better results in terms of information retrieval.

We use the WikiHow text summarization dataset:

- Dataset: https://github.com/mahnazkoupaee/WikiHow-Dataset
- Paper: https://arxiv.org/abs/1810.09305
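
As an illustration of the abstractive side of the comparison, the sketch below shows how a single WikiHow article could be summarized with T5-small via Hugging Face `transformers` and scored with the `rouge-score` package. This is not code from this repository: the article and reference strings are placeholders, and an analogous loop with an extractive BERT summarizer would produce the scores for the other side of the comparison.

```python
# Minimal sketch: abstractive summarization with T5-small, scored with ROUGE.
# pip install transformers sentencepiece rouge-score
from transformers import T5ForConditionalGeneration, T5Tokenizer
from rouge_score import rouge_scorer

# Placeholders: in the experiments these would be a WikiHow article body
# and its reference (headline) summary from the dataset.
article = "..."
reference_summary = "..."

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 expects a task prefix ("summarize: ") for summarization.
inputs = tokenizer("summarize: " + article, return_tensors="pt",
                   max_length=512, truncation=True)
summary_ids = model.generate(inputs["input_ids"], num_beams=4,
                             max_length=150, early_stopping=True)
generated_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# ROUGE-1, ROUGE-2 and ROUGE-L F-scores against the reference summary.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
print(scorer.score(reference_summary, generated_summary))
```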

The project collaborators are as follows:

This research was produced as part of the coursework for Statistical Natural Language Processing (COMP0087).