This repository contains the implementation and results of our project, "From BERT to Mamba: Evaluating Deep Learning for Efficient QA Systems", which explores the trade-offs between accuracy and computational efficiency in question-answering (QA) systems. Our work compares several deep learning models, including BERT, T5, LSTM, and Mamba, focusing on their performance on the SQuAD 2.0 dataset.
Question Answering (QA) and Machine Reading Comprehension are fundamental NLP tasks that involve interpreting user queries and extracting relevant information from text. Our work compares the following models:
- BERT: A transformer-based model with state-of-the-art performance but high computational costs.
- T5: A versatile text-to-text transformer model.
- LSTM: A traditional RNN-based approach, incorporating attention mechanisms for improved context understanding.
- Mamba: A state-space model (SSM) designed for efficient processing of long sequences.
Through rigorous evaluation, we analyze the trade-offs in accuracy and computational efficiency, focusing on metrics such as Exact Match (EM) scores.
**BERT**
- Architecture: Transformer-based with bidirectional attention.
- Pretraining/Fine-tuning: Fine-tuned on the SQuAD 2.0 dataset (see the sketch after this list).
- Evaluation: Demonstrates high accuracy but requires significant memory.
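A minimal fine-tuning sketch using the Hugging Face `transformers` and `datasets` libraries is shown below; the `bert-base-uncased` checkpoint, the hyperparameters, and the simplified answer-span mapping are illustrative assumptions rather than the project's exact training setup.

```python
# Hedged sketch: fine-tuning bert-base-uncased for extractive QA on SQuAD 2.0
# with Hugging Face transformers/datasets. Hyperparameters and the simplified
# answer-span mapping are illustrative, not the project's exact setup.
from datasets import load_dataset
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          Trainer, TrainingArguments, default_data_collator)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")
squad = load_dataset("squad_v2")

def preprocess(batch):
    enc = tokenizer(batch["question"], batch["context"], truncation="only_second",
                    max_length=384, padding="max_length", return_offsets_mapping=True)
    starts, ends = [], []
    for i, answers in enumerate(batch["answers"]):
        start_tok = end_tok = 0                        # [CLS] marks "no answer"
        if answers["answer_start"]:                    # answerable question
            start_char = answers["answer_start"][0]
            end_char = start_char + len(answers["text"][0])
            seq_ids = enc.sequence_ids(i)
            for idx, (s, e) in enumerate(enc["offset_mapping"][i]):
                if seq_ids[idx] != 1:                  # skip question/special tokens
                    continue
                if s <= start_char < e:
                    start_tok = idx
                if s < end_char <= e:
                    end_tok = idx
        starts.append(start_tok)
        ends.append(end_tok)
    enc.pop("offset_mapping")
    enc["start_positions"], enc["end_positions"] = starts, ends
    return enc

train_ds = squad["train"].map(preprocess, batched=True,
                              remove_columns=squad["train"].column_names)

args = TrainingArguments(output_dir="bert-squad2", learning_rate=3e-5,
                         num_train_epochs=2, per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train_ds,
        data_collator=default_data_collator).train()
```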
**T5**
- Architecture: Text-to-text transformer model.
- Pretraining/Fine-tuning: Fine-tuned for QA tasks with structured input and output; LoRA, INT8 quantization, and QLoRA were used for efficient fine-tuning (see the sketch after this list).
- Evaluation: Balances accuracy and versatility, at the cost of longer training times; the LoRA, quantization, and QLoRA variants are considerably more efficient.
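A minimal sketch of the parameter-efficient setup, assuming the `peft` and `bitsandbytes` libraries and a CUDA-capable machine; the LoRA rank, target modules, and 4-bit settings shown here are illustrative assumptions, not necessarily the configuration used in our runs.

```python
# Hedged sketch: LoRA / QLoRA fine-tuning setup for t5-base with the peft and
# bitsandbytes libraries. Rank, target modules, and 4-bit settings are
# illustrative assumptions.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model

model_name = "t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# QLoRA variant: the frozen base model is loaded in 4-bit NF4 precision.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                                bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name,
                                              quantization_config=bnb_config)

# Plain LoRA would skip the quantization_config above and only add adapters.
lora_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=8, lora_alpha=32,
                         lora_dropout=0.05, target_modules=["q", "v"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # only the low-rank adapters are trainable
```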
**LSTM**
- Architecture: RNN with bidirectional LSTM layers and attention mechanisms (see the sketch after this list).
- Pretraining/Fine-tuning: Trained on a subset of SQuAD 2.0 due to resource constraints.
- Evaluation: Low accuracy compared to the transformer-based models.
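A minimal PyTorch sketch of a bidirectional LSTM encoder with a simple attention re-weighting and a span-prediction head; the layer sizes and attention form are assumptions for illustration, not the exact architecture trained here.

```python
# Hedged sketch: a bidirectional LSTM encoder with a simple attention
# re-weighting and a span-prediction head. Dimensions and the attention form
# are illustrative; the trained model's exact architecture may differ.
import torch
import torch.nn as nn

class BiLSTMQA(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, dropout=0.2,
                            bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)       # per-token attention score
        self.span_head = nn.Linear(2 * hidden_dim, 2)  # start / end logits

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))        # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)   # attention over the sequence
        h = h * weights                                # emphasize informative tokens
        start_logits, end_logits = self.span_head(h).unbind(dim=-1)
        return start_logits, end_logits

model = BiLSTMQA(vocab_size=30000)
start, end = model(torch.randint(1, 30000, (2, 128)))  # dummy batch: 2 sequences of 128 tokens
print(start.shape, end.shape)                          # torch.Size([2, 128]) each
```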
**Mamba**
- Architecture: State-space model (SSM) with linear scaling, optimized for long sequences (see the sketch after this list).
- Pretraining/Fine-tuning: Fine-tuned for QA with synthetic negative examples for unanswerable questions.
- Evaluation: Efficient but limited in semantic reasoning.
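A minimal sketch of how Mamba blocks from the `mamba_ssm` package could back an extractive-QA span head; the depth, widths, and head are illustrative assumptions, and the fused Mamba kernels require a CUDA device.

```python
# Hedged sketch: Mamba state-space blocks (from the mamba_ssm package) behind a
# span-prediction head for extractive QA. Depth, width, and the head are
# illustrative assumptions; the fused Mamba kernels require a CUDA device.
import torch
import torch.nn as nn
from mamba_ssm import Mamba

class MambaQA(nn.Module):
    def __init__(self, vocab_size, d_model=256, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.blocks = nn.ModuleList(
            Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
            for _ in range(n_layers))
        self.norm = nn.LayerNorm(d_model)
        self.span_head = nn.Linear(d_model, 2)           # start / end logits

    def forward(self, token_ids):
        x = self.embed(token_ids)                        # (batch, seq, d_model)
        for block in self.blocks:
            x = x + block(x)                             # residual around each SSM block
        start_logits, end_logits = self.span_head(self.norm(x)).unbind(dim=-1)
        return start_logits, end_logits

model = MambaQA(vocab_size=30000).to("cuda")
ids = torch.randint(1, 30000, (2, 1024), device="cuda")  # long context, linear-time scan
start, end = model(ids)
```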
We use the SQuAD 2.0 dataset, a standard QA benchmark (a loading sketch follows this list). It includes:
- 100,000 answerable questions with associated context passages.
- 50,000 unanswerable questions that test a model's ability to recognize when the context does not contain an answer.
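A short sketch of loading and inspecting the dataset with the Hugging Face `datasets` library; unanswerable questions carry an empty `answers` field.

```python
# Short sketch: loading and inspecting SQuAD 2.0 with the Hugging Face datasets
# library. Unanswerable questions carry an empty `answers` field.
from datasets import load_dataset

squad = load_dataset("squad_v2")
print(squad)                                        # train and validation splits

example = squad["validation"][0]
print(example["question"])
print(example["context"][:200], "...")
print("answerable:", len(example["answers"]["text"]) > 0)
```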
We use the Exact Match (EM) score, a strict metric that measures the percentage of predictions matching a ground-truth answer exactly after standard answer normalization (lowercasing and removal of punctuation, articles, and extra whitespace).
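A minimal sketch of EM in the style of the official SQuAD evaluation script; the helper names are ours, and the handling of unanswerable questions is simplified.

```python
# Minimal sketch of Exact Match in the style of the official SQuAD evaluation
# script: answers are normalized before a strict string comparison. Helper
# names are ours; the handling of unanswerable questions is simplified.
import re
import string

def normalize_answer(s):
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)               # drop articles
    return " ".join(s.split())                          # collapse whitespace

def exact_match(prediction, gold_answers):
    if not gold_answers:                                # unanswerable question
        return int(prediction.strip() == "")
    return int(any(normalize_answer(prediction) == normalize_answer(g)
                   for g in gold_answers))

print(exact_match("The Eiffel Tower", ["Eiffel Tower"]))  # 1 after normalization
```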
**BERT results**

| Configuration | Model Size (Parameters) | Performance |
|---|---|---|
| BERT-Base | ~110M | Accurate but large |
| BERT-Base + Static Model Pruning | ~85M | Efficient fine-tuning |
| BERT-Base + Knowledge Distillation (KD) | ~85M | Efficient with high accuracy |
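For reference, a minimal sketch of a knowledge-distillation objective of the kind the table refers to: a temperature-softened KL term on the teacher/student logits mixed with the usual supervised loss. The temperature and mixing weight are illustrative assumptions.

```python
# Hedged sketch of a knowledge-distillation objective: a temperature-softened
# KL term on teacher/student logits mixed with the supervised loss. The
# temperature and mixing weight are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_loss,
                      temperature=2.0, alpha=0.5):
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    return alpha * kd + (1 - alpha) * hard_loss

# Dummy start-position logits: batch of 4 contexts, 384 tokens each.
student = torch.randn(4, 384)
teacher = torch.randn(4, 384)
hard = F.cross_entropy(student, torch.randint(0, 384, (4,)))
print(distillation_loss(student, teacher, hard))
```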
**T5 results**

| Configuration | Validation Loss | Model Size | Performance |
|---|---|---|---|
| T5-base (FP32) | Baseline | ~850 MB | Accurate but large |
| T5-base + LoRA | Slightly worse | ~900 MB | Efficient fine-tuning |
| T5-base + Quantization (INT8) | Slightly worse | ~212 MB | Efficient fine-tuning |
| T5-base + QLoRA | Slightly worse | ~300 MB | Memory-efficient |
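As one concrete way to obtain an INT8 model of this kind, below is a hedged sketch of post-training dynamic quantization in PyTorch; the project's actual quantization pipeline may differ.

```python
# Hedged sketch: post-training dynamic INT8 quantization of t5-base with
# PyTorch, which stores nn.Linear weights in 8-bit and shrinks the checkpoint
# substantially. The project's actual quantization pipeline may differ.
import os
import torch
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

torch.save(quantized.state_dict(), "t5_base_int8.pt")
print(f"{os.path.getsize('t5_base_int8.pt') / 1e6:.0f} MB on disk")
```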
BERT and T5 outperform other models, while Mamba struggles with semantic reasoning, highlighting the limitations of its architecture for QA tasks.
- Rajpurkar, P., Jia, R., & Liang, P. (2018). Know What You Don’t Know: Unanswerable Questions for SQuAD. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 784–789. [Paper Link]
- Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation. [Paper Link]
- Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805. [Paper Link]
- Raffel, C., Shazeer, N., Roberts, A., et al. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research. [Paper Link]
- Gu, A., & Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752. [Paper Link]
- Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv preprint arXiv:2106.09685. [Paper Link]
- Bhandare, A., Sripathi, V., Karkada, D., Menon, V., Choi, S., Datta, K., & Saletore, V. (2019). Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model. [Paper Link]
- Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv preprint arXiv:2305.14314. [Paper Link]