requirement.txt problem #5

Open
Elon-Lau opened this issue Jul 25, 2024 · 3 comments

Comments

@Elon-Lau

Hello, Dr. Yang! I encountered the following error when using the configuration (requirement.txt) you provided. What could be the cause?
[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/RiC-main/sft/sft.py", line 83, in <module>
[rank0]:     model = AutoModelForCausalLM.from_pretrained(
[rank0]:   File "/data/anaconda3/envs/ric/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
[rank0]:     return model_class.from_pretrained(
[rank0]:   File "/data/anaconda3/envs/ric/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3916, in from_pretrained
[rank0]:     ) = cls._load_pretrained_model(
[rank0]:   File "/data/anaconda3/envs/ric/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4390, in _load_pretrained_model
[rank0]:     new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
[rank0]:   File "/data/anaconda3/envs/ric/lib/python3.10/site-packages/transformers/modeling_utils.py", line 945, in _load_state_dict_into_meta_model
[rank0]:     value = type(value)(value.data.to("cpu"), **value.__dict__)
[rank0]:   File "/data/anaconda3/envs/ric/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 491, in __new__
[rank0]:     obj = torch.Tensor._make_subclass(cls, data, requires_grad)
[rank0]: RuntimeError: Only Tensors of floating point and complex dtype can require gradients
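For context, the RuntimeError itself is a generic PyTorch constraint: integer tensors cannot require gradients, and the 8-bit parameters rebuilt by bitsandbytes during loading trip exactly that check. A minimal illustration of the constraint (not the RiC code, just the underlying rule):

import torch

# Integer tensors cannot carry gradients; the load above fails because the
# quantized int8 parameter is re-created with requires_grad still set to True.
t = torch.zeros(2, dtype=torch.int8)
try:
    t.requires_grad_(True)
except RuntimeError as e:
    print(e)  # Only Tensors of floating point and complex dtype can require gradients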

@YangRui2015
Owner

Hi, I have verified that I can run the configuration file successfully. Could you please provide more details on how you are executing the sft.py file and the package versions of accelerate, bitsandbytes, transformers, and peft?

Based on a related issue (bitsandbytes-foundation/bitsandbytes#1232), it seems that the problem might be due to an outdated version of bitsandbytes. You may need to update it.
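A quick way to report those versions from inside the environment (a small sketch using the standard-library importlib.metadata; it assumes the packages are installed under their usual PyPI names):

from importlib.metadata import version

# Print the installed versions of the packages asked about above.
for pkg in ("accelerate", "bitsandbytes", "transformers", "peft"):
    print(pkg, version(pkg))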

@Elon-Lau
Author

Elon-Lau commented Jul 26, 2024

Hi, thank you for your answer! I ran into the same problem when running both RiC and SFT; here are my commands.

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch main.py --train_dataset_path './datasets/train_harmhelp.hf' --exp_type 'assistant' --reward_names 'harmless,helpful' --training_steps 20000 --num_online_iterations 0 --wandb_name 'ric_assistant_harmlesshelpful_offline20000' --batch_size 2 --load_in_8bit True

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch sft.py --base_model_name 'meta-llama/Llama-2-7b-hf' --exp_type 'summary'

The versions of accelerate, bitsandbytes, transformers, peft, trl, torch, and CUDA are 0.32.1, 0.43.2, 4.40.0, 0.11.1, 0.9.4, 2.3.1, and 12.0, respectively. In addition, I'm confused about --wandb_name {name_of_the_experiment}. Should it be in the format helpful_assistant and reddit_summary?

@YangRui2015
Owner

I cannot reproduce your issue with the following configuration:

transformers             4.40.0
trl                      0.9.4
peft                     0.11.1
accelerate               0.32.1
bitsandbytes             0.43.1
deepspeed                0.14.4
torch                    2.3.1
CUDA                     cuda_12.1

I can run this code successfully. Please first check whether you can run the sft example from trl: https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py
[Screenshot (2024-07-27 02:30:16) of the successful run]
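Before the full trl example, a bare 8-bit load of the base model is a reasonable smoke test (a sketch only; the model id is taken from the sft command above, and BitsAndBytesConfig is the transformers-side way to request load_in_8bit):

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

name = "meta-llama/Llama-2-7b-hf"  # same base model as the sft command above
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
# If this load fails with the same RuntimeError, the problem is in the
# environment (most likely bitsandbytes), not in the RiC scripts.
out = model.generate(**tok("Hello", return_tensors="pt").to(model.device), max_new_tokens=5)
print(tok.decode(out[0]))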
