From BERT to Mamba: Evaluating Deep Learning for Efficient QA Systems

This repository contains the implementation and results of our project, "From BERT to Mamba: Evaluating Deep Learning for Efficient QA Systems", which explores the trade-offs between accuracy and computational efficiency in question-answering (QA) systems. Our work compares several deep learning models, including BERT, T5, LSTM, and Mamba, focusing on their performance on the SQuAD 2.0 dataset.

Table of Contents

  • Introduction
  • Models Evaluated
  • Dataset
  • Evaluation Metrics
  • Results
  • References

Introduction

Question Answering (QA) and Machine Reading Comprehension are fundamental NLP tasks that involve interpreting user queries and extracting relevant information from text. Our work compares the following models:

  1. BERT: A transformer-based model with state-of-the-art performance but high computational costs.
  2. T5: A versatile text-to-text transformer model.
  3. LSTM: A traditional RNN-based approach, incorporating attention mechanisms for improved context understanding.
  4. Mamba: A state-space model (SSM) designed for efficient processing of long sequences.

Through rigorous evaluation, we analyze the trade-offs between accuracy and computational efficiency, focusing on the Exact Match (EM) score.


Models Evaluated

BERT Model

  • Architecture: Transformer-based with bidirectional attention.
  • Pretraining/Fine-tuning: Fine-tuned on the SQuAD 2.0 dataset.
  • Evaluation: Demonstrates high accuracy but requires significant memory (a minimal inference sketch follows this list).
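
For orientation, here is a minimal extractive-QA inference sketch using the Hugging Face transformers API. It is not the repository's training script, and the `bert-base-uncased` checkpoint is a placeholder whose QA head is untrained; a SQuAD 2.0 fine-tuned checkpoint would be substituted in practice.

```python
# Minimal BERT extractive-QA sketch (illustrative; not the repo's exact pipeline).
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Placeholder checkpoint: in practice, use a model fine-tuned on SQuAD 2.0.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

question = "How many unanswerable questions does SQuAD 2.0 add?"
context = "SQuAD 2.0 combines the original questions with over 50,000 unanswerable ones."

inputs = tokenizer(question, context, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start/end token positions and decode the span between them.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
print(tokenizer.decode(inputs["input_ids"][0][start : end + 1]))
```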

T5 Model

  • Architecture: Text-to-text transformer model.
  • Pretraining/Fine-tuning: Fine-tuned for QA tasks with structured input and output; LoRA, INT8 quantization, and QLoRA were used for efficient fine-tuning.
  • Evaluation: Balances accuracy and versatility at the cost of longer training times; the LoRA, quantization, and QLoRA variants are more efficient (a minimal LoRA setup sketch follows this list).
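
As a reference for the parameter-efficient setup mentioned above, the following sketch configures LoRA for T5 with the PEFT library. The rank, alpha, dropout, and target modules are illustrative assumptions, not this project's exact hyperparameters.

```python
# Minimal LoRA setup sketch for T5 with PEFT (hyperparameters are illustrative).
from transformers import AutoTokenizer, T5ForConditionalGeneration
from peft import LoraConfig, TaskType, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                        # low-rank adapter dimension
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q", "v"],  # T5's attention query/value projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable

# QA framed as text-to-text: the model maps this prompt to the answer string.
prompt = "question: How many unanswerable questions does SQuAD 2.0 add? context: ..."
inputs = tokenizer(prompt, return_tensors="pt")
```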

LSTM-Based Model

  • Architecture: RNN with bidirectional LSTM layers and attention mechanisms.
  • Pretraining/Fine-tuning: Trained on a subset of SQuAD 2.0 due to resource constraints.
  • Evaluation: Lower accuracy than the transformer-based models (an illustrative architecture sketch follows this list).
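
The sketch below shows one way such an architecture can be wired up in PyTorch: a bidirectional LSTM encoder with a self-attention layer feeding a start/end span head. Layer sizes and the attention choice are assumptions for illustration, not the repository's exact model.

```python
# Illustrative BiLSTM-with-attention span predictor (not the repo's exact architecture).
import torch
import torch.nn as nn

class BiLSTMSpanPredictor(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads=4, batch_first=True)
        self.span_head = nn.Linear(2 * hidden_dim, 2)  # per-token start/end logits

    def forward(self, token_ids: torch.Tensor):
        x = self.embed(token_ids)            # (batch, seq, embed_dim)
        h, _ = self.lstm(x)                  # (batch, seq, 2 * hidden_dim)
        h, _ = self.attn(h, h, h)            # self-attention over the encoded sequence
        start_logits, end_logits = self.span_head(h).unbind(dim=-1)
        return start_logits, end_logits

model = BiLSTMSpanPredictor(vocab_size=30522)
start, end = model(torch.randint(0, 30522, (2, 64)))  # dummy batch of token ids
```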

Mamba Model

  • Architecture: State-space model (SSM) with linear scaling, optimized for long sequences.
  • Pretraining/Fine-tuning: Fine-tuned for QA with synthetic negative examples for unanswerable questions.
  • Evaluation: Efficient but limited in semantic reasoning (a minimal generative-QA sketch follows this list).
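
For orientation, the snippet below loads a small pretrained Mamba checkpoint via the Hugging Face port (available in recent transformers versions) and runs generative QA on a prompt. The checkpoint name and prompt format are assumptions for illustration; the repository's fine-tuning with synthetic negatives is not reproduced here.

```python
# Illustrative Mamba QA sketch via the Hugging Face port (checkpoint and prompt format assumed).
from transformers import AutoTokenizer, MambaForCausalLM

checkpoint = "state-spaces/mamba-130m-hf"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = MambaForCausalLM.from_pretrained(checkpoint)

prompt = (
    "Context: SQuAD 2.0 adds over 50,000 unanswerable questions.\n"
    "Question: How many unanswerable questions does SQuAD 2.0 add?\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```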

Dataset

We use the SQuAD 2.0 dataset, a benchmark in QA tasks. It includes:

  • 100,000 answerable questions with associated context.
  • 50,000 unanswerable questions to test a model's ability to recognize when the context provides insufficient information (a loading sketch follows this list).
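
The dataset can be pulled directly with the Hugging Face datasets library. The snippet below is a minimal loading sketch; `squad_v2` is the standard Hub identifier, though the repository may load its own copy.

```python
# Load SQuAD 2.0 from the Hugging Face Hub and inspect one example.
from datasets import load_dataset

squad = load_dataset("squad_v2")
print(squad)  # train and validation splits

example = squad["train"][0]
print(example["question"])
print(example["context"][:200])
print(example["answers"])  # unanswerable questions have an empty "text" list
```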

Evaluation Metrics

We use the Exact Match (EM) score, a strict metric that calculates the percentage of predictions that match a ground-truth answer exactly, including punctuation and whitespace.
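
A minimal sketch of the strict EM computation described above (the official SQuAD evaluation additionally normalizes case, articles, and punctuation; this version keeps the strict verbatim comparison stated here):

```python
# Strict Exact Match: a prediction scores 1 only if it matches a reference verbatim.
def exact_match(prediction: str, ground_truths: list[str]) -> bool:
    return any(prediction == gt for gt in ground_truths)

def em_score(predictions: list[str], references: list[list[str]]) -> float:
    matches = [exact_match(p, refs) for p, refs in zip(predictions, references)]
    return 100.0 * sum(matches) / len(matches)

# Example: one exact hit out of two predictions -> 50.0
print(em_score(["in 1998", "Paris"], [["in 1998"], ["the city of Paris"]]))
```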


Results

BERT

| Configuration | Model Size (Parameters) | Performance |
| --- | --- | --- |
| BERT-Base | ~110M | Accurate but large |
| BERT-Base + Static Model Pruning | ~85M | Efficient fine-tuning |
| BERT-Base + Knowledge Distillation (KD) | ~85M | Efficient with high accuracy |

T5

| Configuration | Validation Loss | Model Size | Performance |
| --- | --- | --- | --- |
| T5-base (FP32) | Baseline | ~850 MB | Accurate but large |
| T5-base + LoRA | Slightly worse | ~900 MB | Efficient fine-tuning |
| T5-base + Quantization (INT8) | Slightly worse | ~212 MB | Efficient fine-tuning |
| T5-base + QLoRA | Slightly worse | ~300 MB | Memory-efficient |
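
As context for the INT8 row, post-training dynamic quantization in PyTorch stores Linear-layer weights as 8-bit integers, which accounts for roughly the 4x size reduction shown above (~850 MB to ~212 MB). This is a general sketch, not necessarily the exact quantization path used in this project.

```python
# Dynamic INT8 quantization sketch for T5 (illustrative; the project's exact method may differ).
import os
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Replace Linear layers with int8-weight versions for CPU inference.
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

def size_mb(m: torch.nn.Module) -> float:
    """Rough on-disk size of a model's weights in megabytes."""
    torch.save(m.state_dict(), "tmp.pt")
    mb = os.path.getsize("tmp.pt") / 1e6
    os.remove("tmp.pt")
    return mb

print(f"FP32: {size_mb(model):.0f} MB  ->  INT8: {size_mb(quantized):.0f} MB")
```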

BERT and T5 outperform the other models, while Mamba struggles with semantic reasoning, highlighting the limitations of its architecture for QA tasks.


References

  1. Rajpurkar, P., Jia, R., & Liang, P. (2018). Know What You Don’t Know: Unanswerable Questions for SQuAD. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 784–789. [Paper Link]
  2. Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation. [Paper Link]
  3. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805. [Paper Link]
  4. Raffel, C., Shazeer, N., Roberts, A., et al. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research. [Paper Link]
  5. Gu, A., & Dao, T. (2023). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv preprint arXiv:2312.00752. [Paper Link]
  6. Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. [Paper Link]
  7. Bhandare, A., Sripathi, V., Karkada, D., Menon, V., Choi, S., Datta, K., & Saletore, V. (2019). Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model. [Paper Link]
  8. Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. [Paper Link]
