Releases: foundation-model-stack/fms-hf-tuning
v0.4.0-rc.2
Summary of Changes
- Support for LoRA tuning of llama3 and granite (GPTBigCode) models (see the sketch below)
- Various dependency version adjustments
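LoRA target modules differ between these two architecture families: llama-style models expose separate attention projections such as q_proj and v_proj, while GPTBigCode-based granite models use a fused c_attn projection. Below is a minimal sketch using the Hugging Face peft library directly, not this repo's CLI; the model ID and hyperparameters are illustrative assumptions.

```python
# Illustrative LoRA configs for the two architecture families (not this repo's CLI).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# llama-style models expose separate attention projections.
llama_lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# GPTBigCode-based granite models use a fused attention projection.
granite_lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],
    task_type="CAUSAL_LM",
)

# Placeholder checkpoint: substitute an actual GPTBigCode-based granite model.
model = AutoModelForCausalLM.from_pretrained("path/or/hub-id-of-granite-model")
model = get_peft_model(model, granite_lora)
model.print_trainable_parameters()
```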
What's Changed
- remove merge model for lora tuned adapters by @anhuong in #197
- Add test coverage by @tedhtchang in #171
- Install Acceleration Framework into Training Script by @fabianlim in #157
- deps: limit dependency ranges by @anhuong in #54
- Delete dependabot.yml by @tedhtchang in #207
- add dependabot.yml by @tedhtchang in #208
- Fix additional callbacks by @VassilisVassiliadis in #199
- Update trl by @alex-jw-brooks in #213
- deps: cap transformers at 4.40.2 by @anhuong in #218
Full Changelog: v0.3.0...v0.4.0-rc.2
v0.4.0-rc.1
What's Changed
- remove merge model for lora tuned adapters by @anhuong in #197
- Add test coverage by @tedhtchang in #171
- Install Acceleration Framework into Training Script by @fabianlim in #157
- deps: limit dependency ranges by @anhuong in #54
- Delete dependabot.yml by @tedhtchang in #207
- add dependabot.yml by @tedhtchang in #208
- Fix additional callbacks by @VassilisVassiliadis in #199
- Update trl by @alex-jw-brooks in #213
Full Changelog: v0.3.0...v0.4.0-rc.1
v0.3.0
Summary of Changes
- Switch to a multistage Dockerfile, which greatly reduces the size of the image
- Refactor image scripts to remove launch_training and call sft_trainer directly (see the exit-code sketch below).
  - Note that this changes the error codes returned from sft_trainer to user error code 1 and internal error code 203.
  - In addition, this affects logging, as parameter-parsing logging is moved into sft_trainer, where it is harder to view.
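Callers that previously inspected launch_training's exit status now see sft_trainer's codes directly. The sketch below shows how an outer process might interpret them; the exit-code values come from the notes above, while the module invocation path and the wrapper itself are assumptions for illustration.

```python
# Illustrative wrapper mapping sft_trainer exit codes to a readable status.
import subprocess
import sys

USER_ERROR = 1        # bad user input / configuration
INTERNAL_ERROR = 203  # unexpected internal failure

# Assumed invocation path; forward any CLI arguments to the trainer.
result = subprocess.run([sys.executable, "-m", "tuning.sft_trainer", *sys.argv[1:]])

if result.returncode == 0:
    print("training completed")
elif result.returncode == USER_ERROR:
    print("training failed: user error (check arguments and data)")
elif result.returncode == INTERNAL_ERROR:
    print("training failed: internal error (see logs)")
else:
    print(f"training exited with unexpected code {result.returncode}")
sys.exit(result.returncode)
```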
What's Changed
- Switch to multistage dockerfile by @tharapalanivel in #154
- refactor: remove launch_training and call sft_trainer directly by @anhuong in #164
- docs: consolidate configs, add kfto config by @anhuong in #170
- fix: bloom model can't run with flash-attn by @anhuong in #173
- Update README.md for Lora modules by @Ssukriti in #174
Full Changelog: v0.2.0...v0.3.0
v0.2.0
Summary of Changes
- Adds a new data_formatter_template field to format data while training from a JSON with custom fields, eliminating the need to preprocess and format data to alpaca style. Find details in the README (an illustrative example follows this list).
- Update the evaluation_strategy flag to eval_strategy
- Add evaluation data format scripts to use as reference
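To illustrate the idea, a formatter template maps fields from each JSON record into a single training text. The sketch below is a stand-in for demonstration only; the {{field}} placeholder syntax and the render helper are assumptions, not the library's actual formatter.

```python
# Illustrative rendering of a custom-field JSON record into alpaca-style text.
import re

record = {"question": "What is FSDP?", "answer": "Fully Sharded Data Parallel."}

# Hypothetical template using {{key}} placeholders that reference JSON fields.
template = "### Input:\n{{question}}\n\n### Response:\n{{answer}}"

def render(template: str, record: dict) -> str:
    # Replace each {{key}} placeholder with the matching value from the record.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(record[m.group(1)]), template)

print(render(template, record))
```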
What's Changed
- fix: check if output dir exists by @anhuong in #160
- tests for fixing full fine tuning by @Ssukriti in #162
- Evaluation Data Format Scripts by @alex-jw-brooks in #115
- Refactor tests explicit params by @Ssukriti in #163
- update eval_strategy flag used in transformers by @anhuong in #168
- remove unused python39 from dockerfile by @jbusche in #167
- Add formatting function alpaca by @Ssukriti in #161
Pip package: pip install fms-hf-tuning==0.2.0
Full Changelog: v0.1.0...v0.2.0
v0.2.0-rc.1
Add formatting function alpaca by @Ssukriti in #161:
- utility functions to format datasets using template
- add tests and formatter as arg
- update tests to use template to avoid warnings
- update README and tests
- fix: formatter
- Update README.md
- fix imports
- fix pylint
- fix tests
- address review comments: function names
- formatting fix
- update error message
- restrict JSON fields templates
v0.1.0 - First release
Summary of Changes
- Supported and validated tuning technique: full fine-tuning using single-GPU and multi-GPU
- Multi-GPU training using the Hugging Face accelerate library, focused on FSDP
- Experimental tuning techniques:
- Single GPU Prompt tuning
- Single GPU LoRA tuning
- Scripts to allow local inference and evaluation of tuned models (a minimal inference sketch follows below)
- Build scripts for containerization of the library
- Initial trainer controller framework for controlling the trainer loop using user-defined rules and metrics
Pip package: pip install fms-hf-tuning==0.1.0
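For local inference of a LoRA-tuned checkpoint, the adapter is applied on top of the base model it was tuned from. This is a minimal sketch using the peft API directly; the paths, prompt format, and generation settings are illustrative, not this repo's scripts.

```python
# Illustrative local inference with a LoRA adapter; paths are placeholders.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_dir = "./output/checkpoint-final"  # placeholder: directory produced by tuning

# AutoPeftModelForCausalLM reads the adapter config and loads the base model it references.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_dir, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)  # assumes tokenizer saved alongside

prompt = "### Input:\nSummarize FSDP in one sentence.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```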
What's Changed
- Init by @raghukiran1224 in #1
- allows disable flash attn and torch dtype param by @Ssukriti in #2
- First refactor train by @Ssukriti in #3
- fix : the way args are passed by @Ssukriti in #10
- fix full param tuning by @lchu-ibm in #14
- fix import of aim_loader by @anhuong in #13
- fix: set model max length to either passed in or tokenizer value by @anhuong in #17
- fix: do not set model max length when loading model by @anhuong in #21
- add EOS token to dataset by @Ssukriti in #15
- Local inference by @alex-jw-brooks in #27
- feat: add validation dataset to train by @anhuong in #26
- feat: support str in target_modules for LoraConfig by @VassilisVassiliadis in #39
- Add formatting tools by @hickeyma in #31
- Enable code formatting by @hickeyma in #40
- Enable daily dependabot updates by @hickeyma in #41
- Add file logger callback & export train loss json file by @alex-jw-brooks in #22
- Merge models by @alex-jw-brooks in #32
- Local inference merged models by @alex-jw-brooks in #43
- feat: track validation loss in logs file by @anhuong in #51
- Add linting capability by @hickeyma in #52
- Add PR/Issue templates by @tedhtchang in #65
- Add sample unit tests by @tedhtchang in #61
- Initial commit for trainer image by @tharapalanivel in #69
- Adding copyright notices by @tharapalanivel in #77
- Enable pylint in the github workflow by @tedhtchang in #63
- Bump aim from 3.17.5 to 3.18.1 by @dependabot in #42
- Add Contributing file by @jbusche in #58
- docs: lora and getting modules list by @anhuong in #46
- Allow SFT_TRAINER_CONFIG_JSON_ENV_VAR to be encoded json string by @kellyaa in #82
- Document lint by @tedhtchang in #84
- Let Huggingface Properly Initialize Arguments, and Fix FSDP-LORA Checkpoint-Saves and Resumption by @fabianlim in #53
- Unit tests by @tharapalanivel in #83
- Update CONTRIBUTING.md by @Ssukriti in #86
- Update input args to max_seq_length and training_data_path by @anhuong in #94
- feat: move to accelerate launch for distributed training by @kmehant in #92
- Update README.md by @Ssukriti in #95
- Modify copyright notice by @tharapalanivel in #96
- Switches dependencies from txt file to toml file by @jbusche in #68
- fix: use attn_implementation="flash_attention_2" by @kmehant in #101
- fix: not passing PEFT argument should default to full parameter finetuning by @kmehant in #100
- feat: update launch training with accelerate for multi-gpu by @anhuong in #98
- Setting default values in training job config by @tharapalanivel in #104
- add refactored build utils into docker image by @anhuong in #108
- feat: combine train and eval loss into one file by @anhuong in #109
- docs: add note on ephemeral storage by @anhuong in #106
- Move accelerate launch args parsing by @tharapalanivel in #107
- Docs improvements by @Ssukriti in #111
- feat: add env var SET_NUM_PROCESSES_TO_NUM_GPUS by @anhuong in #110
- feat: Trainer controller framework by @seshapad in #45
- Copying logs file by @tharapalanivel in #113
- Fix copying over logs by @tharapalanivel in #114
- Add eval script by @alex-jw-brooks in #102
- Lint tests by @tharapalanivel in #112
- Move sklearn to optional, install optionals for linting by @alex-jw-brooks in #117
- Build Wheel Action by @jbusche in #105
- rstrip eos in evaluation by @alex-jw-brooks in #121
- Fix eos token suffix removal by @alex-jw-brooks in #125
- Make use of instruction field optional by @alex-jw-brooks in #123
- Deprecating the requirements.txt for dependencies management by @tedhtchang in #116
- Add unit tests for various edge cases by @alex-jw-brooks in #97
- fix typo in build gha by @jbusche in #138
- Install whl in Dockerfile by @tedhtchang in #126
- feat: add flash attn to inference and eval scripts by @anhuong in #132
- OS update in dockerfile by @jbusche in #127
- fix: ignore the build output and auto-generated files by @HarikrishnanBalagopal in #140
- Propose ADR for Training Acceleration by @fabianlim in #119
- feat: new format for the controller metrics and operations by @HarikrishnanBalagopal in #130
- adr: Format change to the trainer controller configuration by @seshapad in #128
- Generic tracker API and implementation of Aimstack tracker by @dushyantbehl in #89
- fix: Allow makefile to run test independent of fmt/lint by @dushyantbehl in #145
- feat: Trainer state as a trainer controller metric by @seshapad in #150
- Bump aim from 3.18.1 to 3.19.0 by @dependabot in #93
- fix: launch_training.py arguments with new tracker api by @dushyantbehl in #153
- feat: Exposed the evaluation metrics for rules within trainer controller by @seshapad in #146
- Comment out aim in dockerfile by @jbusche in #155
- fix: replace eval with a safer alternative by @HarikrishnanBalagopal in #147
- doc...
v0.1.0-rc.1
What's Changed
- fix: replace eval with a safer alternative by @HarikrishnanBalagopal in #147
- docs: ADR for moving from eval to simpleeval for evaluating trainer controller rules by @HarikrishnanBalagopal in #151
- Add exception catching / writing to termination log by @kellyaa in #149
- fix: merging of model for multi-gpu by @anhuong in #158
- add .complete file to output dir when done by @kellyaa in #159
Full Changelog: v0.0.2rc2...v0.1.0-rc.1
v0.0.2rc.2
What's Changed
- fix typo in build gha by @jbusche in #138
- Install whl in Dockerfile by @tedhtchang in #126
- feat: add flash attn to inference and eval scripts by @anhuong in #132
- OS update in dockerfile by @jbusche in #127
- fix: ignore the build output and auto-generated files by @HarikrishnanBalagopal in #140
- Propose ADR for Training Acceleration by @fabianlim in #119
- feat: new format for the controller metrics and operations by @HarikrishnanBalagopal in #130
- adr: Format change to the trainer controller configuration by @seshapad in #128
- Generic tracker API and implementation of Aimstack tracker by @dushyantbehl in #89
- fix: Allow makefile to run test independent of fmt/lint by @dushyantbehl in #145
- feat: Trainer state as a trainer controller metric by @seshapad in #150
- Bump aim from 3.18.1 to 3.19.0 by @dependabot in #93
- fix: launch_training.py arguments with new tracker api by @dushyantbehl in #153
- feat: Exposed the evaluation metrics for rules within trainer controller by @seshapad in #146
- Comment out aim in dockerfile by @jbusche in #155
New Contributors
- @HarikrishnanBalagopal made their first contribution in #140
- @dushyantbehl made their first contribution in #89
Full Changelog: v0.0.2rc1...v0.0.2rc2
v0.0.2rc1
What's Changed
- Init by @raghukiran1224 in #1
- allows disable flash attn and torch dtype param by @Ssukriti in #2
- First refactor train by @Ssukriti in #3
- fix : the way args are passed by @Ssukriti in #10
- fix full param tuning by @lchu-ibm in #14
- fix import of aim_loader by @anhuong in #13
- fix: set model max length to either passed in or tokenizer value by @anhuong in #17
- fix: do not set model max length when loading model by @anhuong in #21
- add EOS token to dataset by @Ssukriti in #15
- Local inference by @alex-jw-brooks in #27
- feat: add validation dataset to train by @anhuong in #26
- feat: support str in target_modules for LoraConfig by @VassilisVassiliadis in #39
- Add formatting tools by @hickeyma in #31
- Enable code formatting by @hickeyma in #40
- Enable daily dependabot updates by @hickeyma in #41
- Add file logger callback & export train loss json file by @alex-jw-brooks in #22
- Merge models by @alex-jw-brooks in #32
- Local inference merged models by @alex-jw-brooks in #43
- feat: track validation loss in logs file by @anhuong in #51
- Add linting capability by @hickeyma in #52
- Add PR/Issue templates by @tedhtchang in #65
- Add sample unit tests by @tedhtchang in #61
- Initial commit for trainer image by @tharapalanivel in #69
- Adding copyright notices by @tharapalanivel in #77
- Enable pylint in the github workflow by @tedhtchang in #63
- Bump aim from 3.17.5 to 3.18.1 by @dependabot in #42
- Add Contributing file by @jbusche in #58
- docs: lora and getting modules list by @anhuong in #46
- Allow SFT_TRAINER_CONFIG_JSON_ENV_VAR to be encoded json string by @kellyaa in #82
- Document lint by @tedhtchang in #84
- Let Huggingface Properly Initialize Arguments, and Fix FSDP-LORA Checkpoint-Saves and Resumption by @fabianlim in #53
- Unit tests by @tharapalanivel in #83
- Update CONTRIBUTING.md by @Ssukriti in #86
- Update input args to max_seq_length and training_data_path by @anhuong in #94
- feat: move to accelerate launch for distributed training by @kmehant in #92
- Update README.md by @Ssukriti in #95
- Modify copyright notice by @tharapalanivel in #96
- Switches dependencies from txt file to toml file by @jbusche in #68
- fix: use attn_implementation="flash_attention_2" by @kmehant in #101
- fix: not passing PEFT argument should default to full parameter finetuning by @kmehant in #100
- feat: update launch training with accelerate for multi-gpu by @anhuong in #98
- Setting default values in training job config by @tharapalanivel in #104
- add refactored build utils into docker image by @anhuong in #108
- feat: combine train and eval loss into one file by @anhuong in #109
- docs: add note on ephemeral storage by @anhuong in #106
- Move accelerate launch args parsing by @tharapalanivel in #107
- Docs improvements by @Ssukriti in #111
- feat: add env var SET_NUM_PROCESSES_TO_NUM_GPUS by @anhuong in #110
- feat: Trainer controller framework by @seshapad in #45
- Copying logs file by @tharapalanivel in #113
- Fix copying over logs by @tharapalanivel in #114
- Add eval script by @alex-jw-brooks in #102
- Lint tests by @tharapalanivel in #112
- Move sklearn to optional, install optionals for linting by @alex-jw-brooks in #117
- Build Wheel Action by @jbusche in #105
- rstrip eos in evaluation by @alex-jw-brooks in #121
- Fix eos token suffix removal by @alex-jw-brooks in #125
- Make use of instruction field optional by @alex-jw-brooks in #123
- Deprecating the requirements.txt for dependencies management by @tedhtchang in #116
- Add unit tests for various edge cases by @alex-jw-brooks in #97
New Contributors
- @raghukiran1224 made their first contribution in #1
- @Ssukriti made their first contribution in #2
- @lchu-ibm made their first contribution in #14
- @anhuong made their first contribution in #13
- @alex-jw-brooks made their first contribution in #27
- @VassilisVassiliadis made their first contribution in #39
- @hickeyma made their first contribution in #31
- @tedhtchang made their first contribution in #65
- @tharapalanivel made their first contribution in #69
- @dependabot made their first contribution in #42
- @jbusche made their first contribution in #58
- @kellyaa made their first contribution in #82
- @fabianlim made their first contribution in #53
- @kmehant made their first contribution in #92
- @seshapad made their first contribution in #45
Full Changelog: https://github.com/foundation-model-stack/fms-hf-tuning/commits/v.0.0.2rc1