# Text Summarisation using BERT and T5 on the WikiHow Dataset

Transformer models have grown in popularity in recent years, primarily because the self-attention mechanism in their architecture assigns differential weights to the most significant portions of the text. Self-attention also allows parallelization, unlike traditional sequential methods, thereby reducing training time. Although transformer-based approaches perform well in general, it remains difficult to predict which transformer model will perform better on a new dataset. In this project, we run a set of text summarization experiments on the WikiHow dataset, comparing two models: BERT-large (an extractive summarizer) and T5-small (an abstractive summarizer). We evaluate both models on inputs of various text lengths and compare them using ROUGE scores. We then conduct a further experiment to determine which of the two models yields better results in terms of information retrieval.

We use the WikiHow text summarization dataset:

- Dataset: https://github.com/mahnazkoupaee/WikiHow-Dataset
- Paper: https://arxiv.org/abs/1810.09305
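
As an illustration of the abstractive side of the comparison, the sketch below shows how a single WikiHow article could be summarized with T5-small via Hugging Face `transformers` and scored with the `rouge-score` package. This is not code from this repository: the article and reference strings are placeholders, and an analogous loop with an extractive BERT summarizer would produce the scores for the other side of the comparison.

```python
# Minimal sketch: abstractive summarization with T5-small, scored with ROUGE.
# pip install transformers sentencepiece rouge-score
from transformers import T5ForConditionalGeneration, T5Tokenizer
from rouge_score import rouge_scorer

# Placeholders: in the experiments these would be a WikiHow article body
# and its reference (headline) summary from the dataset.
article = "..."
reference_summary = "..."

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 expects a task prefix ("summarize: ") for summarization.
inputs = tokenizer("summarize: " + article, return_tensors="pt",
                   max_length=512, truncation=True)
summary_ids = model.generate(inputs["input_ids"], num_beams=4,
                             max_length=150, early_stopping=True)
generated_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# ROUGE-1, ROUGE-2 and ROUGE-L F-scores against the reference summary.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
print(scorer.score(reference_summary, generated_summary))
```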

The project collaborators are as follows:

This research was produced as part of the coursework for Statistical Natural Language Processing (COMP0087).