Model fine-tuning: some model parameters end up on the CPU #21
Multi-GPU training currently has some issues; for now, try running on a single GPU.
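The single-device suggestion above can be sketched as follows. This is a minimal, hypothetical illustration with a toy linear layer standing in for ChatGLM2-6B: the point is simply that the module and its inputs must live on the same device. When loading the real model with transformers, passing `device_map={"": 0}` is one way to ask accelerate to place every layer on GPU 0 rather than offloading some to CPU (an assumption about the fix, not something confirmed in this thread).

```python
import torch
import torch.nn as nn

# Pick a single device for everything; falls back to CPU when no GPU exists.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Toy stand-in for the fine-tuned model. For the real run, one would load
# ChatGLM2-6B via transformers and keep all of its layers on this device.
model = nn.Linear(8, 8).to(device)
x = torch.randn(2, 8, device=device)

y = model(x)  # succeeds because weights and input share one device
assert y.device == x.device
```

The original RuntimeError fires exactly when this invariant is broken, i.e. when `F.linear` receives an input on `cuda:0` but a weight that was offloaded to `cpu`.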
---- Original message ----
Hi, what is going on here? I ran the train_qlora.py script directly and got this error:
```
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm2_6b/modeling_chatglm.py", line 588, in forward
hidden_states, kv_cache = layer(
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/jovyan/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm2_6b/modeling_chatglm.py", line 510, in forward
attention_output, kv_cache = self.self_attention(
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/jovyan/.local/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/jovyan/.cache/huggingface/modules/transformers_modules/chatglm2_6b/modeling_chatglm.py", line 342, in forward
mixed_x_layer = self.query_key_value(hidden_states)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/peft/tuners/lora.py", line 456, in forward
after_A = self.lora_A(self.lora_dropout(x))
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
```
OK, thanks! I'll keep a close eye on your updates.
@kunzeng-ch Sorry, I misread your log just now. I thought you were running on two GPUs and the error mentioned cuda:0 and cuda:1, but it actually reports cuda:0 and cpu. Could you share your hardware environment: GPU model, CPU model, operating system, etc.?
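Besides the hardware details, a quick way to diagnose this error is to list which parameters ended up on the CPU after loading. This is a hedged sketch using a toy module as a stand-in; on a real run, one would call `param_devices` (a hypothetical helper, not part of this repo) on the loaded ChatGLM2-6B model and look for `"cpu"` entries:

```python
import torch.nn as nn

def param_devices(model: nn.Module) -> dict:
    """Map each parameter name to the device it currently lives on."""
    return {name: str(p.device) for name, p in model.named_parameters()}

# Toy stand-in for the fine-tuned model; accelerate's auto device mapping
# can offload some layers to CPU when GPU memory is tight, which is what
# produces the cuda:0-vs-cpu mismatch in the traceback above.
model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))
offloaded = [name for name, dev in param_devices(model).items() if dev == "cpu"]
print(offloaded)  # parameter names still on the CPU
```

If any LoRA or base-model weights show up in `offloaded` while inputs are on `cuda:0`, that matches the RuntimeError reported here.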