MambaFormer implementation and in-context learning experiments from https://arxiv.org/abs/2402.04248
You can start by cloning our repository and following the steps below.
- Install the dependencies for our code using Conda.

  conda env create -f environment.yml
  conda activate mambaformer
  pip install -r requirements.txt
- Then install mamba-ssm from our repo along with its requirements, such as causal-conv1d. We use a modified version of mamba-ssm v1.1.1.

  pip install causal-conv1d==1.1.3.post1
  cd mamba/
  pip install -e .
  cd ../src
To train your own model, run the following command in the src/ directory.
# For standard 1GPU training
python train.py --config conf/linear_regression.yaml
# For data parallel training (only needed for many-outlier regression)
CUDA_VISIBLE_DEVICES=0,1 python train_parallel.py --config conf/many_outlier_regression.yaml
In our code, each Attention or Mamba block counts as one layer, so Mamba or MambaFormer with n_layer: 24 is roughly equivalent to gpt2 with n_layer: 12 in total parameters (a GPT-2 layer bundles an attention block with an MLP, while we count each Attention or Mamba block individually). We include sample linear_regression.yaml files for each architecture in src/conf.
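For orientation, a MambaFormer config might be organized roughly as sketched below. Only n_layer and the linear_regression.yaml naming come from this README; the model family name, embedding size, and training hyperparameters shown are illustrative placeholders rather than the repo's actual schema, so consult the shipped files in src/conf for the real keys.

# Illustrative sketch only; key names other than n_layer are assumptions.
model:
  family: mambaformer        # hypothetical architecture selector
  n_layer: 24                # each Attention or Mamba block counts as one layer
  n_embd: 256                # illustrative embedding width
training:
  task: linear_regression    # illustrative task name
  batch_size: 64
  learning_rate: 0.0001
  train_steps: 500000
out_dir: ../models/linear_regression   # illustrative output path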
The code is released under the Apache-2.0 License. See LICENSE for full terms.
The generated data is subject to the model owner's policy.
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
@article{park2024mambaformer,
title={Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks},
author={Park, Jongho and Park, Jaeseung and Xiong, Zheyang and Lee, Nayoung and Cho, Jaewoong and Oymak, Samet and Lee, Kangwook and Papailiopoulos, Dimitris},
journal={International Conference on Machine Learning},
year={2024}
}