Dear authors, following the configuration in this repository, I set both `per_device_train_batch_size` and `per_device_eval_batch_size` to 1, but running `lomo_lora_trainer.py` to train LLaMA-7B on a single 16GB V100 still runs out of memory (OOM). The exact configuration is as follows:
```yaml
# model
model_name_or_path: 'openlm-research/open_llama_7b'
# data
dataset_name: 'wic'
refresh: false
data_tag: 'base'
train_on_inputs: false
data_max_length: 1024
# training
# trainer
peft_type: 'lora'
lora_only: false
hf_learning_rate: 0.0005
hf_weight_decay: 0
hf_lr_scheduler_type: 'linear'
hf_warmup: 0.05
tag: 'lora-qv-r2-lomo'
output_dir: 'outputs'
overwrite_output_dir: true
deepspeed: 'config/ds_config_lora.json'
do_train: true
do_eval: true
evaluation_strategy: 'epoch'
per_device_train_batch_size: 1
per_device_eval_batch_size: 1
learning_rate: 0.005
weight_decay: 0
num_train_epochs: 10
lr_scheduler_type: 'linear'
warmup: 0.05
clip_grad_norm: 1.0
#clip_grad_value: 1.0
#clip_loss_value: 5.0
log_level: 'info'
logging_steps: 1
# please set `resume_from_checkpoint` to load checkpoints. check `merge_llama_with_lora.py` first.
#resume_from_checkpoint: 'outputs/wic_7B_lora-qv-r2-lomo/output_lr0.005_bs16_warmup0.05_clipnorm1.0/checkpoint-0/merge_weights'
# please set `save_strategy` (`no`, `epoch`, `steps`) and `save_total_limit` (the max amount of checkpoints) to save checkpoints.
save_strategy: 'no'
save_total_limit: 0
seed: 42
#bf16: true
remove_unused_columns: false
load_best_model_at_end: false
metric_for_best_model: 'acc'
optim: 'sgd'
group_by_length: false
#report_to: 'wandb'
dataloader_pin_memory: false
gradient_checkpointing: true
predict_with_generate: false
lora_r: 2
```
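(For clarity, the LoRA part of this config — rank 2 on the query/value projections, per the `lora-qv-r2-lomo` tag — corresponds roughly to the following `peft` configuration. This is only a sketch: `lora_alpha` and `lora_dropout` are assumed values, not taken from the YAML, and the repo's trainer builds its own config.)

```python
from peft import LoraConfig, TaskType

# Illustrative sketch of the LoRA setup implied by the YAML above,
# not the repo's actual code.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=2,                                  # lora_r: 2 from the YAML
    target_modules=["q_proj", "v_proj"],  # "qv" per the run tag; HF LLaMA module names
    lora_alpha=16,                        # assumed default, not specified in the YAML
    lora_dropout=0.05,                    # assumed, not specified in the YAML
    bias="none",
)
```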
As an aside, with the same configuration but without LoRA, training LLaMA-7B with LOMO on the 16GB V100 uses 15933 MB of GPU memory, which does not seem to match the results reported in the paper. Is something misconfigured on my end?
Hi, when I measured the memory usage of LOMO + LoRA, I used a 3090 with 24GB of memory, and a single card was sufficient; on V100s you will probably need two. The memory figures in the paper were measured with `torch.cuda.memory_reserved()`, which reads a bit lower than monitoring tools like `nvidia-smi`, so that gap is expected.
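For reference, a minimal way to read the number the way the paper does (the helper function here is just illustrative, not code from this repo):

```python
import torch

def report_gpu_memory(device: int = 0) -> None:
    # memory_reserved() counts only what PyTorch's caching allocator holds;
    # nvidia-smi additionally sees the CUDA context and other libraries'
    # allocations, so its reading is expected to be higher.
    gib = 1024 ** 3
    print(f"allocated:     {torch.cuda.memory_allocated(device) / gib:.2f} GiB")
    print(f"reserved:      {torch.cuda.memory_reserved(device) / gib:.2f} GiB")
    print(f"peak reserved: {torch.cuda.max_memory_reserved(device) / gib:.2f} GiB")
```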
Thank you very much for the explanation. Table 2 of your paper reports that training a 7B model with LOMO on a single 3090 uses 13.61GB of memory. Adding LoRA (r=2) on top of that should not, it seems, immediately OOM on a 16GB card; could you confirm whether this behavior is expected?