
LlaMA-7B + LoRA OOM on a 16GB V100 #53

Open · zhenqincn opened this issue Aug 30, 2023 · 2 comments

@zhenqincn

Dear authors, following the configuration in this repo, I set both per_device_train_batch_size and per_device_eval_batch_size to 1, but running lomo_lora_trainer.py to train LlaMA-7B on a single 16GB V100 runs out of memory (OOM).

The full configuration is as follows:

```yaml
# model
model_name_or_path: 'openlm-research/open_llama_7b'
# data
dataset_name: 'wic'
refresh: false
data_tag: 'base'
train_on_inputs: false
data_max_length: 1024
# training
# trainer
peft_type: 'lora'
lora_only: false
hf_learning_rate: 0.0005
hf_weight_decay: 0
hf_lr_scheduler_type: 'linear'
hf_warmup: 0.05
tag: 'lora-qv-r2-lomo'
output_dir: 'outputs'
overwrite_output_dir: true
deepspeed: 'config/ds_config_lora.json'
do_train: true
do_eval: true
evaluation_strategy: 'epoch'
per_device_train_batch_size: 1
per_device_eval_batch_size: 1
learning_rate: 0.005
weight_decay: 0
num_train_epochs: 10
lr_scheduler_type: 'linear'
warmup: 0.05
clip_grad_norm: 1.0
#clip_grad_value: 1.0
#clip_loss_value: 5.0
log_level: 'info'
logging_steps: 1
# please set `resume_from_checkpoint` to load checkpoints. check `merge_llama_with_lora.py` first.
#resume_from_checkpoint: 'outputs/wic_7B_lora-qv-r2-lomo/output_lr0.005_bs16_warmup0.05_clipnorm1.0/checkpoint-0/merge_weights'
# please set `save_strategy` (`no`, `epoch`, `steps`) and `save_total_limit` (the max amount of checkpoints) to save checkpoints.
save_strategy: 'no'
save_total_limit: 0
seed: 42
#bf16: true
remove_unused_columns: false
load_best_model_at_end: false
metric_for_best_model: 'acc'
optim: 'sgd'
group_by_length: false
#report_to: 'wandb'
dataloader_pin_memory: false
gradient_checkpointing: true
predict_with_generate: false
lora_r: 2
```

Incidentally, with the same configuration but without LoRA, training LlaMA-7B with LOMO on a 16GB V100 used 15933 MB of GPU memory, which seems inconsistent with the results reported in the paper. Is something wrong with my configuration?

@zhenqincn zhenqincn changed the title from "LlaMA-7B OOM on a 16GB V100" to "LlaMA-7B + LoRA OOM on a 16GB V100" on Aug 30, 2023
@KaiLv69 (Collaborator) commented Aug 31, 2023

Hi, when I measured the memory usage of LOMO + LoRA I used a 3090, which has 24GB of memory; a single card is enough there. On a V100 you may need two cards.
The memory figures in the paper were measured with torch.cuda.memory_reserved(), which reports slightly less than monitoring tools such as nvidia-smi; that difference is normal.
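For reference, a minimal sketch of how such an allocator-side measurement could be taken (the probe point after a training step is an assumption, not the exact instrumentation used for the paper):

```python
import torch

# Hedged sketch: read GPU memory from PyTorch's caching allocator,
# as described for the paper's numbers, instead of from nvidia-smi.
def report_gpu_memory(tag: str, device: int = 0) -> None:
    reserved = torch.cuda.memory_reserved(device)    # bytes held by the caching allocator
    allocated = torch.cuda.memory_allocated(device)  # bytes occupied by live tensors
    peak = torch.cuda.max_memory_reserved(device)    # peak reserved since the last reset
    print(f"[{tag}] reserved={reserved / 2**20:.0f} MiB, "
          f"allocated={allocated / 2**20:.0f} MiB, "
          f"peak_reserved={peak / 2**20:.0f} MiB")

# Example: call after an optimizer step, e.g.
#   report_gpu_memory("after step 10")
```

nvidia-smi additionally counts the CUDA context (typically several hundred MB) and any non-PyTorch allocations, which is why it reads higher than torch.cuda.memory_reserved().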

@zhenqincn (Author)

Thank you very much for the explanation.
Table 2 of your paper reports that training a 7B model with LOMO on a single 3090 uses 13.61GB of memory. It seems that adding LoRA (r=2) should not immediately OOM on a 16GB card; could you confirm whether this behavior is expected?
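For context, a rough back-of-envelope estimate (assuming LLaMA-7B's 32 decoder layers and hidden size 4096, with LoRA applied to the q and v projections as the lora-qv-r2 tag suggests) shows the LoRA weights themselves should be tiny:

```python
# Back-of-envelope: LoRA parameter count for LLaMA-7B with r=2 on q/v projections.
# Assumed shapes: 32 decoder layers, hidden size 4096 (standard for LLaMA-7B).
layers, hidden, r, matrices_per_layer = 32, 4096, 2, 2  # q_proj and v_proj

# Each adapted matrix gains A (r x hidden) and B (hidden x r).
params_per_matrix = 2 * r * hidden
total_params = layers * matrices_per_layer * params_per_matrix

print(f"LoRA parameters: {total_params:,}")                     # 1,048,576
print(f"fp16 weight size: {total_params * 2 / 2**20:.1f} MiB")  # 2.0 MiB
```

At roughly 2 MiB of fp16 adapter weights, the extra ~2GB+ needed by the LOMO + LoRA path presumably comes from somewhere other than the adapter weights themselves (e.g. optimizer state or engine overhead).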
