This repository contains the implementation and results of our project, "From BERT to Mamba: Evaluating Deep Learning for Efficient QA Systems", which explores the trade-offs between accuracy and computational efficiency in question-answering (QA) systems. Our work compares several deep learning models, including BERT, T5, LSTM, and Mamba, focusing on their performance on the SQuAD 2.0 dataset.
Question Answering (QA) and Machine Reading Comprehension are fundamental NLP tasks that involve interpreting user queries and extracting relevant information from text. Our work compares the following models:
- BERT: A transformer-based model with state-of-the-art performance but high computational costs.
- T5: A versatile text-to-text transformer model.
- LSTM: A traditional RNN-based approach, incorporating attention mechanisms for improved context understanding.
- Mamba: A state-space model (SSM) designed for efficient processing of long sequences.
Through rigorous evaluation, we analyze the trade-offs in accuracy and computational efficiency, focusing on metrics such as Exact Match (EM) scores.
**BERT**
- Architecture: Transformer-based with bidirectional attention.
- Pretraining/Fine-tuning: Fine-tuned on the SQuAD 2.0 dataset (see the sketch after this list).
- Evaluation: Demonstrates high accuracy but requires significant memory.
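A minimal fine-tuning sketch using the Hugging Face `transformers` and `datasets` libraries is shown below; the `bert-base-uncased` checkpoint, the hyperparameters, and the simplified answer-span mapping are illustrative assumptions rather than the project's exact training setup.

```python
# Hedged sketch: fine-tuning bert-base-uncased for extractive QA on SQuAD 2.0
# with Hugging Face transformers/datasets. Hyperparameters and the simplified
# answer-span mapping are illustrative, not the project's exact setup.
from datasets import load_dataset
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          Trainer, TrainingArguments, default_data_collator)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")
squad = load_dataset("squad_v2")

def preprocess(batch):
    enc = tokenizer(batch["question"], batch["context"], truncation="only_second",
                    max_length=384, padding="max_length", return_offsets_mapping=True)
    starts, ends = [], []
    for i, answers in enumerate(batch["answers"]):
        start_tok = end_tok = 0                        # [CLS] marks "no answer"
        if answers["answer_start"]:                    # answerable question
            start_char = answers["answer_start"][0]
            end_char = start_char + len(answers["text"][0])
            seq_ids = enc.sequence_ids(i)
            for idx, (s, e) in enumerate(enc["offset_mapping"][i]):
                if seq_ids[idx] != 1:                  # skip question/special tokens
                    continue
                if s <= start_char < e:
                    start_tok = idx
                if s < end_char <= e:
                    end_tok = idx
        starts.append(start_tok)
        ends.append(end_tok)
    enc.pop("offset_mapping")
    enc["start_positions"], enc["end_positions"] = starts, ends
    return enc

train_ds = squad["train"].map(preprocess, batched=True,
                              remove_columns=squad["train"].column_names)

args = TrainingArguments(output_dir="bert-squad2", learning_rate=3e-5,
                         num_train_epochs=2, per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train_ds,
        data_collator=default_data_collator).train()
```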
**T5**
- Architecture: Text-to-text transformer model.
- Pretraining/Fine-tuning: Fine-tuned for QA tasks with structured input and output; LoRA, INT8 quantization, and QLoRA were used for efficient fine-tuning (see the sketch after this list).
- Evaluation: Balances accuracy and versatility, at the cost of longer training times; the LoRA, quantization, and QLoRA variants are considerably more efficient.
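A minimal sketch of the parameter-efficient setup, assuming the `peft` and `bitsandbytes` libraries and a CUDA-capable machine; the LoRA rank, target modules, and 4-bit settings shown here are illustrative assumptions, not necessarily the configuration used in our runs.

```python
# Hedged sketch: LoRA / QLoRA fine-tuning setup for t5-base with the peft and
# bitsandbytes libraries. Rank, target modules, and 4-bit settings are
# illustrative assumptions.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model

model_name = "t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# QLoRA variant: the frozen base model is loaded in 4-bit NF4 precision.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                                bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name,
                                              quantization_config=bnb_config)

# Plain LoRA would skip the quantization_config above and only add adapters.
lora_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=8, lora_alpha=32,
                         lora_dropout=0.05, target_modules=["q", "v"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # only the low-rank adapters are trainable
```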
**LSTM**
- Architecture: RNN with bidirectional LSTM layers and attention mechanisms (see the sketch after this list).
- Pretraining/Fine-tuning: Trained on a subset of SQuAD 2.0 due to resource constraints.
- Evaluation: Low accuracy compared to the transformer-based models.
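A minimal PyTorch sketch of a bidirectional LSTM encoder with a simple attention re-weighting and a span-prediction head; the layer sizes and attention form are assumptions for illustration, not the exact architecture trained here.

```python
# Hedged sketch: a bidirectional LSTM encoder with a simple attention
# re-weighting and a span-prediction head. Dimensions and the attention form
# are illustrative; the trained model's exact architecture may differ.
import torch
import torch.nn as nn

class BiLSTMQA(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, dropout=0.2,
                            bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)       # per-token attention score
        self.span_head = nn.Linear(2 * hidden_dim, 2)  # start / end logits

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))        # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)   # attention over the sequence
        h = h * weights                                # emphasize informative tokens
        start_logits, end_logits = self.span_head(h).unbind(dim=-1)
        return start_logits, end_logits

model = BiLSTMQA(vocab_size=30000)
start, end = model(torch.randint(1, 30000, (2, 128)))  # dummy batch: 2 sequences of 128 tokens
print(start.shape, end.shape)                          # torch.Size([2, 128]) each
```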
**Mamba**
- Architecture: State-space model (SSM) with linear scaling, optimized for long sequences (see the sketch after this list).
- Pretraining/Fine-tuning: Fine-tuned for QA with synthetic negative examples for unanswerable questions.
- Evaluation: Efficient but limited in semantic reasoning.
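A minimal sketch of how Mamba blocks from the `mamba_ssm` package could back an extractive-QA span head; the depth, widths, and head are illustrative assumptions, and the fused Mamba kernels require a CUDA device.

```python
# Hedged sketch: Mamba state-space blocks (from the mamba_ssm package) behind a
# span-prediction head for extractive QA. Depth, width, and the head are
# illustrative assumptions; the fused Mamba kernels require a CUDA device.
import torch
import torch.nn as nn
from mamba_ssm import Mamba

class MambaQA(nn.Module):
    def __init__(self, vocab_size, d_model=256, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.blocks = nn.ModuleList(
            Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
            for _ in range(n_layers))
        self.norm = nn.LayerNorm(d_model)
        self.span_head = nn.Linear(d_model, 2)           # start / end logits

    def forward(self, token_ids):
        x = self.embed(token_ids)                        # (batch, seq, d_model)
        for block in self.blocks:
            x = x + block(x)                             # residual around each SSM block
        start_logits, end_logits = self.span_head(self.norm(x)).unbind(dim=-1)
        return start_logits, end_logits

model = MambaQA(vocab_size=30000).to("cuda")
ids = torch.randint(1, 30000, (2, 1024), device="cuda")  # long context, linear-time scan
start, end = model(ids)
```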
We use the SQuAD 2.0 dataset, a standard QA benchmark (a loading sketch follows this list). It includes:
- 100,000 answerable questions with associated context passages.
- 50,000 unanswerable questions that test a model's ability to recognize when the context does not contain an answer.
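A short sketch of loading and inspecting the dataset with the Hugging Face `datasets` library; unanswerable questions carry an empty `answers` field.

```python
# Short sketch: loading and inspecting SQuAD 2.0 with the Hugging Face datasets
# library. Unanswerable questions carry an empty `answers` field.
from datasets import load_dataset

squad = load_dataset("squad_v2")
print(squad)                                        # train and validation splits

example = squad["validation"][0]
print(example["question"])
print(example["context"][:200], "...")
print("answerable:", len(example["answers"]["text"]) > 0)
```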
We use the Exact Match (EM) score, a strict metric that measures the percentage of predictions matching a ground-truth answer exactly after standard answer normalization (lowercasing and removal of punctuation, articles, and extra whitespace).
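A minimal sketch of EM in the style of the official SQuAD evaluation script; the helper names are ours, and the handling of unanswerable questions is simplified.

```python
# Minimal sketch of Exact Match in the style of the official SQuAD evaluation
# script: answers are normalized before a strict string comparison. Helper
# names are ours; the handling of unanswerable questions is simplified.
import re
import string

def normalize_answer(s):
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)               # drop articles
    return " ".join(s.split())                          # collapse whitespace

def exact_match(prediction, gold_answers):
    if not gold_answers:                                # unanswerable question
        return int(prediction.strip() == "")
    return int(any(normalize_answer(prediction) == normalize_answer(g)
                   for g in gold_answers))

print(exact_match("The Eiffel Tower", ["Eiffel Tower"]))  # 1 after normalization
```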
**BERT results**

| Configuration | Model Size (Parameters) | Performance |
|---|---|---|
| BERT-Base | ~110M | Accurate but large |
| BERT-Base + Static Model Pruning | ~85M | Efficient fine-tuning |
| BERT-Base + Knowledge Distillation (KD) | ~85M | Efficient with high accuracy |
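For reference, a minimal sketch of a knowledge-distillation objective of the kind the table refers to: a temperature-softened KL term on the teacher/student logits mixed with the usual supervised loss. The temperature and mixing weight are illustrative assumptions.

```python
# Hedged sketch of a knowledge-distillation objective: a temperature-softened
# KL term on teacher/student logits mixed with the supervised loss. The
# temperature and mixing weight are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_loss,
                      temperature=2.0, alpha=0.5):
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    return alpha * kd + (1 - alpha) * hard_loss

# Dummy start-position logits: batch of 4 contexts, 384 tokens each.
student = torch.randn(4, 384)
teacher = torch.randn(4, 384)
hard = F.cross_entropy(student, torch.randint(0, 384, (4,)))
print(distillation_loss(student, teacher, hard))
```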
**T5 results**

| Configuration | Validation Loss | Model Size | Performance |
|---|---|---|---|
| T5-base (FP32) | Baseline | ~850 MB | Accurate but large |
| T5-base + LoRA | Slightly worse | ~900 MB | Efficient fine-tuning |
| T5-base + Quantization (INT8) | Slightly worse | ~212 MB | Efficient fine-tuning |
| T5-base + QLoRA | Slightly worse | ~300 MB | Memory-efficient |
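As one concrete way to obtain an INT8 model of this kind, below is a hedged sketch of post-training dynamic quantization in PyTorch; the project's actual quantization pipeline may differ.

```python
# Hedged sketch: post-training dynamic INT8 quantization of t5-base with
# PyTorch, which stores nn.Linear weights in 8-bit and shrinks the checkpoint
# substantially. The project's actual quantization pipeline may differ.
import os
import torch
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

torch.save(quantized.state_dict(), "t5_base_int8.pt")
print(f"{os.path.getsize('t5_base_int8.pt') / 1e6:.0f} MB on disk")
```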
BERT and T5 outperform other models, while Mamba struggles with semantic reasoning, highlighting the limitations of its architecture for QA tasks.
- Rajpurkar, P., Jia, R., & Liang, P. (2018). Know What You Don’t Know: Unanswerable Questions for SQuAD. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 784–789. [Paper Link]
- Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation. [Paper Link]
- Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805. [Paper Link]
- Raffel, C., Shazeer, N., Roberts, A., et al. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research. [Paper Link]
- Gu, A., & Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752. [Paper Link]
- Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv preprint arXiv:2106.09685. [Paper Link]
- Bhandare, A., Sripathi, V., Karkada, D., Menon, V., Choi, S., Datta, K., & Saletore, V. (2019). Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model. [Paper Link]
- Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv preprint arXiv:2305.14314. [Paper Link]