qlora #100

RanchiZhao · 2023-07-28T04:00:32Z

This PR mainly involves the following aspects:

QLoRA overall logic:
- First, quantize the model's parameter file.
- Then set the int4 field in the model's config to enable QLoRA fine-tuning.
- Everything else is the same as in basic task fine-tuning.
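
As a rough illustration of the second step, the snippet below flips the int4 switch in a config loaded from JSON. Only the `int4` field name comes from this PR; the other keys, their values, and the helper `enable_qlora` are hypothetical placeholders, and the real config files may be laid out differently.

```python
import json

# Hypothetical config contents; only the `int4` field is taken from this PR.
config_text = '{"dim_model": 4096, "num_layers": 48}'


def enable_qlora(text: str) -> str:
    """Return the config JSON with the int4 switch turned on."""
    config = json.loads(text)
    config["int4"] = True  # bool switch that selects the QLoRA code path
    return json.dumps(config, indent=2)


print(enable_qlora(config_text))
```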

Modifications to the model structure:
- Add a bool field int4 to the model config files under src/config; it acts as the switch that controls whether QLoRA is used. The other relevant structures (Attention/SelfAttentionBlock/FFNBlock/TransformerBlock/DenseGatedACT/FeedForward/Encoder/CPMBee) are adjusted accordingly so that the appropriate model is loaded based on the int4 field.
- In src/cpm_live/layers/feedforward.py, add class Linear4bit as the linear layer for the QLoRA method, class Params4bit as the weight type for Linear4bit, and class DistributedParameter4Int8 to meet encapsulation needs.
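
For readers unfamiliar with 4-bit weights, here is a minimal sketch of the general idea: each weight becomes a 4-bit code, and two codes are packed into one byte, which is presumably why uint8 tensors show up in the notes further down. This is plain absmax quantization written for illustration. It is not the NF4 scheme from the QLoRA paper and not the Linear4bit/Params4bit code in this PR, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_4bit(weights: List[float]) -> Tuple[bytes, float]:
    """Absmax-quantize a non-empty list of floats and pack two codes per byte."""
    absmax = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = absmax / 7.0                          # signed codes fall in [-7, 7]
    codes = [round(w / scale) + 8 for w in weights]  # shift to unsigned 1..15
    if len(codes) % 2:
        codes.append(8)                           # pad with the code for 0.0
    packed = bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))
    return packed, scale


def dequantize_4bit(packed: bytes, scale: float, n: int) -> List[float]:
    """Unpack the 4-bit codes and map them back to approximate floats."""
    codes: List[int] = []
    for byte in packed:
        codes.extend((byte >> 4, byte & 0x0F))    # high nibble, then low nibble
    return [(c - 8) * scale for c in codes[:n]]


weights = [0.7, -0.35, 0.1, 0.0, -0.7]
packed, scale = quantize_4bit(weights)
restored = dequantize_4bit(packed, scale, len(weights))
print(len(packed), "bytes for", len(weights), "weights")
print(restored)
```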

Added scripts, sample code, and README:
- src/quantize_state_dict.py compresses the initial weights. QLoRA loads the compressed state dict as the model weights.
- src/finetune_cpm_bee_qlora.py is the fine-tuning sample code.
- src/scripts/finetune_cpm_bee_qlora.sh is the fine-tuning sample script.
- tutorials/basic_task_finetune/README_qlora.md is the QLoRA fine-tuning tutorial.

Other considerations:
- The inspect code in src/finetune_cpm_bee_qlora.py has been commented out, because std and var are not supported for uint8 tensors.
- A corresponding fix is needed in BMTrain: BMTrain.blocklayer has a bug where requires_grad cannot be passed in for uint8 parameters.

qlora

a51dd68

RanchiZhao mentioned this pull request Sep 1, 2023

Could you provide a complete, working project? RanchiZhao/bmtrain_qlora#1

Open
