v2.0.1
New major features:
- Support for LoRA for the following model architectures - llama3, llama3.1, granite (GPTBigCode and LlamaForCausalLM), mistral, mixtral, and allam
- Support for QLoRA for the following model architectures - llama3, granite (GPTBigCode and LlamaForCausalLM), mistral, mixtral
- Addition of a post-processing function to format tuned adapters as required by vLLM for inference. Refer to the README on how to run it as a script. When tuning on the image, post-processing can be enabled using the flag `lora_post_process_for_vllm`. See the build README for details on how to set this flag.
- Enablement of new flags for throughput improvements: `padding_free` to process multiple examples without adding padding tokens, `multipack` for multi-GPU training to balance the number of tokens processed on each device, and `fast_kernels` for optimized tuning with fused operations and Triton kernels. See the README for details on how to set these flags and their use cases; an illustrative launch sketch follows this list.
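As a rough illustration of how these flags might be passed at launch time, the sketch below builds a tuning command. Only the flag names `padding_free`, `multipack`, and `fast_kernels` come from this release; the entry point, model, dataset path, and the exact value syntax of each flag are assumptions, so treat the README as the authoritative reference.

```python
# Hedged sketch: launching tuning with the new throughput flags.
# Assumptions: the tuning.sft_trainer entry point, the model/dataset paths,
# and the value syntax of each flag. Only the flag names come from the notes.
import subprocess

cmd = [
    "python", "-m", "tuning.sft_trainer",
    "--model_name_or_path", "my-base-model",   # hypothetical model id or path
    "--training_data_path", "train.jsonl",     # hypothetical dataset file
    "--output_dir", "out/",
    "--padding_free", "huggingface",           # pack examples without padding tokens (value assumed)
    "--multipack", "16",                       # balance tokens per device on multi-GPU (value assumed)
    "--fast_kernels", "True", "True", "True",  # fused ops and Triton kernels (value syntax assumed)
]
subprocess.run(cmd, check=True)
```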
Dependency upgrades:
- Upgraded `transformers` to version 4.44.2, needed for tuning of all models.
- Upgraded `accelerate` to version 0.33, needed for tuning of all models. Version 0.34.0 has a bug for FSDP.
API / interface changes:
- The `train()` API now returns a tuple of the trainer instance and additional metadata as a dict.
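For callers of the Python API, this means unpacking a two-element tuple where there used to be a single return value. The sketch below assumes the import paths and `*Arguments` dataclass names based on the project layout; only the `(trainer, metadata_dict)` return shape is stated by this release.

```python
# Hedged sketch of consuming the new train() return value.
# Assumptions: import paths, dataclass names, and argument fields; the release
# only states that train() now returns (trainer_instance, metadata_dict).
from tuning import sft_trainer
from tuning.config import configs

model_args = configs.ModelArguments(model_name_or_path="my-base-model")  # hypothetical model
data_args = configs.DataArguments(training_data_path="train.jsonl")      # hypothetical dataset
train_args = configs.TrainingArguments(output_dir="out/", num_train_epochs=1)

trainer, metadata = sft_trainer.train(model_args, data_args, train_args)
print(type(trainer))  # the trainer instance used for the run
print(metadata)       # dict with additional metadata about the run
```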
Additional features and fixes:
- Support for resuming tuning from an existing checkpoint. Refer to the README on how to use it as a flag. The flag `resume_training` defaults to `True`; an illustrative sketch follows this list.
- Addition of a default pad token in the tokenizer when the `EOS` and `PAD` tokens are equal, to improve training quality.
- JSON compatibility for input datasets. See the docs for details on data formats.
- Fix to not resize the embedding layer by default; the embedding layer can still be resized as needed using the flag `embedding_size_multiple_of`.
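As a rough illustration of the resume behaviour, the sketch below re-runs tuning against an `output_dir` that already holds checkpoints. Only the `resume_training` flag name and its default of `True` come from these notes; the entry point, the other arguments, and the boolean value syntax are assumptions.

```python
# Hedged sketch: re-running tuning into an output_dir with existing checkpoints.
# resume_training defaults to True (resume from the latest checkpoint); pass
# False to start fresh. Entry point and value syntax are assumptions.
import subprocess

cmd = [
    "python", "-m", "tuning.sft_trainer",
    "--model_name_or_path", "my-base-model",  # hypothetical model id or path
    "--training_data_path", "train.jsonl",    # hypothetical dataset file
    "--output_dir", "out/",                   # holds checkpoints from an earlier run
    "--resume_training", "False",             # set False to ignore existing checkpoints
]
subprocess.run(cmd, check=True)
```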
Full List of what's Changed
- fix: do not resize embedding layer by default by @kmehant in #310
- fix: logger is unbound error by @HarikrishnanBalagopal in #308
- feat: Enable JSON dataset compatibility by @willmj in #297
- doc: How to tune LoRA lm_head by @aluu317 in #305
- docs: Add findings from exploration into model tuning performance degradation by @willmj in #315
- fix: warnings about casing when building the Docker image by @HarikrishnanBalagopal in #318
- fix: need to pass skip_prepare_dataset for pretokenized dataset due to breaking change in HF SFTTrainer by @HarikrishnanBalagopal in #326
- feat: install fms-acceleration to enable qlora by @anhuong in #284
- feat: Migrating the trainer controller to python logger by @seshapad in #309
- fix: remove fire ported from Hari's PR #303 by @HarikrishnanBalagopal in #324
- dep: cap transformers version due to FSDP bug by @anhuong in #335
- deps: Add protobuf to support aLLaM models by @willmj in #336
- fix: add enable_aim build args in all stages needed by @anhuong in #337
- fix: remove lm_head post processing by @Abhishek-TAMU in #333
- doc: Add qLoRA README by @aluu317 in #322
- feat: Add deps to evaluate qLora tuned model by @aluu317 in #312
- feat: Add support for smoothly resuming training from a saved checkpoint by @Abhishek-TAMU in #300
- ci: add a github workflow to label pull requests based on their title by @HarikrishnanBalagopal in #298
- fix: Addition of default pad token in tokenizer when EOS and PAD token are equal by @Abhishek-TAMU in #343
- feat: Add DataClass Arguments to Activate Padding-Free and MultiPack Plugin and FastKernels by @achew010 in #280
- fix: cap transformers at v4.44 by @anhuong in #349
- fix: utilities to post process checkpoint for LoRA by @Ssukriti in #338
- feat: Add post processing logic to accelerate launch by @willmj in #351
- build: install additional fms-acceleration plugins by @anhuong in #350
- fix: unable to find output_dir in multi-GPU during resume_from_checkpoint check by @Abhishek-TAMU in #352
- fix: check for wte.weight along with embed_tokens.weight by @willmj in #356
- release: merge set of changes for v2.0.0 by @Abhishek-TAMU in #357
- build(deps): unset hardcoded trl version to get latest updates by @anhuong in #358
New Contributors
Full Changelog: v1.2.2...v2.0.0