ValueError when merging the ChatGLM2-6b model: We need an `offload_dir` to dispatch this model according to this `device_map`, the following submodules need to be offloaded
#50
(venv) PS C:\MyFiles\AI\model\chatglm2> python merge_lora_and_quantize.py --lora_path saved_files/chatGLM_6B_QLoRA_t32 --output_path /tmp/merged_qlora_model_4bit --remote_scripts_dir remote_scripts/chatglm2-6b --qbits 4
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:07<00:00, 1.10s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
Traceback (most recent call last):
File "C:\MyFiles\AI\model\chatglm2\merge_lora_and_quantize.py", line 80, in<module>
main(lora_path=args.lora_path,
File "C:\MyFiles\AI\model\chatglm2\merge_lora_and_quantize.py", line 54, in main
merged_model, lora_config = merge_lora(lora_path, device_map)
File "C:\MyFiles\AI\model\chatglm2\merge_lora_and_quantize.py", line 28, in merge_lora
model = PeftModel.from_pretrained(base_model, lora_path, device_map=device_map)
File "C:\MyFiles\AI\model\chatglm2\venv\lib\site-packages\peft\peft_model.py", line 181, in from_pretrained
model.load_adapter(model_id, adapter_name, **kwargs)
File "C:\MyFiles\AI\model\chatglm2\venv\lib\site-packages\peft\peft_model.py", line 406, in load_adapter
dispatch_model(
File "C:\MyFiles\AI\model\chatglm2\venv\lib\site-packages\accelerate\big_modeling.py", line 374, in dispatch_model
raise ValueError(
ValueError: We need an `offload_dir` to dispatch this model according to this `device_map`, the following submodules need to be offloaded: base_model.model.transformer.encoder.layers.12, base_model.model.transformer.encoder.layers.13, base_model.model.transformer.encoder.layers.14, base_model.model.transformer.encoder.layers.15, base_model.model.transformer.encoder.layers.16, base_model.model.transformer.encoder.layers.17, base_model.model.transformer.encoder.layers.18, base_model.model.transformer.encoder.layers.19, base_model.model.transformer.encoder.layers.20, base_model.model.transformer.encoder.layers.21, base_model.model.transformer.encoder.layers.22, base_model.model.transformer.encoder.layers.23, base_model.model.transformer.encoder.layers.24, base_model.model.transformer.encoder.layers.25, base_model.model.transformer.encoder.layers.26, base_model.model.transformer.encoder.layers.27, base_model.model.transformer.encoder.final_layernorm, base_model.model.transformer.output_layer.
(venv) PS C:\MyFiles\AI\model\chatglm2>
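One way to sidestep this first error is to avoid offloading altogether: load the base model entirely on CPU, attach the LoRA adapter there, and only move to GPU after merging. The following is a minimal sketch, not the script's actual code; it assumes system RAM can hold the fp16 model and that the base weights live at THUDM/chatglm2-6b (substitute the local path that merge_lora_and_quantize.py actually uses):

```python
import torch
from transformers import AutoModel
from peft import PeftModel

base_model_path = "THUDM/chatglm2-6b"            # assumption: replace with the local base model path
lora_path = "saved_files/chatGLM_6B_QLoRA_t32"   # adapter path from the command above

# Keep every layer on CPU so accelerate never needs an offload_dir
# and no parameter ends up on the meta device.
base_model = AutoModel.from_pretrained(
    base_model_path,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map={"": "cpu"},
)

model = PeftModel.from_pretrained(base_model, lora_path, device_map={"": "cpu"})
merged_model = model.merge_and_unload()          # fold the LoRA weights into the base model
```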
(venv) PS C:\MyFiles\AI\model\chatglm2> python merge_lora_and_quantize.py --lora_path saved_files/chatGLM_6B_QLoRA_t32 --output_path /tmp/merged_qlora_model_4bit --remote_scripts_dir remote_scripts/chatglm2-6b --qbits 4
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:07<00:00, 1.07s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
WARNING:root:Some parameters are on the meta device device because they were offloaded to the disk and cpu.
Traceback (most recent call last):
File "C:\MyFiles\AI\model\chatglm2\merge_lora_and_quantize.py", line 80, in <module>
main(lora_path=args.lora_path,
File "C:\MyFiles\AI\model\chatglm2\merge_lora_and_quantize.py", line 56, in main
quantized_model = quantize(merged_model, qbits)
File "C:\MyFiles\AI\model\chatglm2\merge_lora_and_quantize.py", line 35, in quantize
qmodel = model.quantize(qbits).half().cuda()
File "C:\Users\71977\.cache\huggingface\modules\transformers_modules\chatglm2-6b\modeling_chatglm.py", line 1197, in quantize
self.transformer.encoder = quantize(self.transformer.encoder, bits, empty_init=empty_init, device=device,
File "C:\Users\71977\.cache\huggingface\modules\transformers_modules\chatglm2-6b\quantization.py", line 157, in quantize
weight=layer.self_attention.query_key_value.weight.to(torch.cuda.current_device()),
NotImplementedError: Cannot copy out of meta tensor; no data!
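The second failure has the same root cause: with device_map="auto", the layers that did not fit into GPU memory are left as meta tensors (their data sits in the CPU/disk offload), so the quantize() in modeling_chatglm.py has no weight data to copy to CUDA. A small diagnostic sketch, reusing the merged_model variable from the script, can confirm this before quantizing:

```python
# Parameters still on the meta device have no materialized data;
# .quantize()/.half().cuda() will raise on them.
meta_params = [name for name, p in merged_model.named_parameters()
               if p.device.type == "meta"]
if meta_params:
    print(f"{len(meta_params)} parameters are offloaded (meta), e.g. {meta_params[:3]}")
    # Reload the base model fully on CPU (see the sketch above) so the
    # merged model is materialized before calling quantize().
```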
Initial model:
GPU memory:
Some dependency versions:
I tried:
1. Switching dependency versions, but the result did not change.
2. Adding an offload_dir parameter in merge_lora_and_quantize.py (roughly as sketched below), which produced the second error shown above.
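A hypothetical reconstruction of that change is shown below; the keyword names (offload_folder here) are assumptions and may differ between transformers/peft versions, and model_path/device_map stand for the variables already used inside the script's merge_lora function:

```python
# Hypothetical sketch of the offload change in merge_lora_and_quantize.py.
# `offload_folder` is the keyword accepted by transformers' from_pretrained;
# the matching option for PeftModel.from_pretrained may be named differently
# depending on the installed peft version.
base_model = AutoModel.from_pretrained(
    model_path,                     # existing variable in the script
    trust_remote_code=True,
    device_map=device_map,          # existing variable in the script
    offload_folder="offload",       # directory for weights that spill to disk
)
model = PeftModel.from_pretrained(
    base_model,
    lora_path,
    device_map=device_map,
    offload_folder="offload",
)
```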