Releases: huggingface/optimum-neuron
v0.0.25: SFT Trainer, Llama 3.1-3.2, ControlNet, AWS Neuron SDK 2.20
What's Changed
- Use AWS Neuron SDK 2.20 (#696) by @dacorvo
- Bump `optimum` to 1.22 (#686) by @JingyaHuang
- Bump `transformers` to 4.43.2 (#665) by @dacorvo
Inference
- Add support for multiple ControlNet (#691) by @JingyaHuang
- Add ControlNet support for SDXL (#675) by @JingyaHuang
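The two ControlNet items above extend the Neuron diffusion pipelines. A minimal sketch of what multi-ControlNet inference could look like is below; the class name `NeuronStableDiffusionControlNetPipeline` and the `controlnet_ids` argument follow the optimum-neuron diffusion API, but treat the exact signature and shapes as assumptions and check the documentation.

```python
# Hedged sketch: export and run Stable Diffusion with one (or several) ControlNets.
# `controlnet_ids` is assumed to accept a list for the multi-ControlNet case.
from diffusers.utils import load_image
from optimum.neuron import NeuronStableDiffusionControlNetPipeline

# A pre-computed canny conditioning image from the diffusers test assets.
canny_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png"
)

pipe = NeuronStableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet_ids=["lllyasviel/sd-controlnet-canny"],  # pass more ids for multi-ControlNet
    export=True,
    batch_size=1, height=512, width=512,  # static shapes are fixed at compile time
)
image = pipe("a colorful bird, high quality", image=canny_image).images[0]
image.save("bird.png")
```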
Training
- Support SFTTrainer (#682) by @michaelbenayoun (see the sketch below)
- LoRA finetuning tutorial (#671) by @michaelbenayoun
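For the new SFT support, here is a minimal sketch under the assumption that `NeuronSFTTrainer` and `NeuronSFTConfig` mirror `trl`'s `SFTTrainer` API; the model, dataset, and parallelism settings are illustrative, and the LoRA finetuning tutorial above is the canonical reference.

```python
# Hedged sketch: supervised fine-tuning on Trainium with the new SFT support.
from datasets import load_dataset
from transformers import AutoTokenizer
from optimum.neuron import NeuronSFTConfig, NeuronSFTTrainer  # names assumed from the training API

model_id = "meta-llama/Meta-Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def format_dolly(sample):
    # Collapse instruction/context/response into a single prompt string.
    context = f"\n{sample['context']}" if sample["context"] else ""
    return f"### Instruction\n{sample['instruction']}{context}\n### Answer\n{sample['response']}"

args = NeuronSFTConfig(
    output_dir="llama_sft",
    max_seq_length=2048,
    per_device_train_batch_size=1,
    tensor_parallel_size=8,  # shard the model across Neuron cores (assumed argument)
    bf16=True,
)
trainer = NeuronSFTTrainer(
    model=model_id,
    args=args,
    train_dataset=dataset,
    formatting_func=format_dolly,
    tokenizer=tokenizer,
)
trainer.train()
```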
Full Changelog: v0.0.24...v0.0.25
v0.0.24: PEFT training support, ControlNet, InstructPix2Pix, Audio models, TGI benchmarks
What's Changed
Training
- Initial PEFT support by @michaelbenayoun in #612
- PEFT + TP support by @michaelbenayoun in #620
- Fix MPMD detected error during training with TP by @michaelbenayoun in #648
Inference
- Add Stable Diffusion ControlNet support by @JingyaHuang in #622
- Add InstructPix2Pix pipeline support by @asntr in #625
- Add ViT export support and image classification by @JingyaHuang in #616 (see the sketch below)
- Add wav2vec2 support - export and audio tasks modeling by @JingyaHuang in #645
- Add more audio models: ast, hubert, unispeech, unispeech-sat, wavlm by @JingyaHuang in #651
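As a sketch of the new encoder support (the ViT item above), the snippet below exports an image classifier and runs it. `NeuronModelForImageClassification` is assumed to follow the same `from_pretrained(..., export=True)` pattern as the library's other task classes.

```python
# Hedged sketch: export google/vit-base-patch16-224 to Neuron and classify an image.
from io import BytesIO

import requests
from PIL import Image
from transformers import AutoImageProcessor
from optimum.neuron import NeuronModelForImageClassification  # class name assumed

model_id = "google/vit-base-patch16-224"
model = NeuronModelForImageClassification.from_pretrained(
    model_id,
    export=True,
    batch_size=1,  # static input shape, fixed at compile time
)
processor = AutoImageProcessor.from_pretrained(model_id)

# The usual COCO cats test image from the transformers docs.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(BytesIO(requests.get(url).content))

inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[int(logits.argmax(-1))])
```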
TGI
- Extending TGI benchmarking and documentation by @jimburtoft in #621
- Add support for TGI truncate parameter by @dacorvo in #647
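The `truncate` parameter (#647) maps to the standard TGI request parameter that keeps only the last N input tokens. A hedged example against a locally running neuronx TGI endpoint:

```python
# Hedged example: ask TGI to truncate an over-long prompt instead of rejecting it.
import requests

very_long_prompt = "tell me a story " * 2000  # deliberately longer than the model's context

response = requests.post(
    "http://localhost:8080/generate",  # address of a running neuronx TGI container (assumed)
    json={
        "inputs": very_long_prompt,
        "parameters": {"truncate": 1024, "max_new_tokens": 64},
    },
)
print(response.json()["generated_text"])
```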
Other changes
- enable unequal height and width by @yahavb in #592
- Skip invalid gen config by @dacorvo in #618
- Deprecate resume_download by @Wauplin in #586
- Remove a line non-intentionally merged by @JingyaHuang in #628
- Add secrets scanning workflow by @mfuntowicz in #631
- fix bad link to distributed-training how-to guide in optimum-neuron docs by @aws-amj in #627
- Do not copy local checkpoint by @dacorvo in #630
- Make `neuron_cc_optlevel` `None` by default by @michaelbenayoun in #632
- Remove print by @michaelbenayoun in #633
- Set bf16 to true when needed by @michaelbenayoun in #635
- Fix gradient checkpointing with PEFT by @michaelbenayoun in #634
- Refactor decoder tests by @dacorvo in #641
- CI cache builder by @dacorvo in #642
- Restore optimized attention score for sd15 & fix the generated images quality issue by @JingyaHuang in #646
- Add and remove some mark steps by @michaelbenayoun in #644
- Fix consolidation for TP by @michaelbenayoun in #649
- Fix spelling in error message by @jimburtoft in #656
- Update docs by @michaelbenayoun in #588
- Fixes NxDPPModel for Neuron SDK 2.19 by @michaelbenayoun in #663
- Various fixes for training by @michaelbenayoun in #654
- migrate ci by @XciD in #662
- ci: fix inference cache pipeline by @dacorvo in #667
- Fix broken link by @pagezyhf in #669
- Bump TGI version and fix bugs by @dacorvo in #666
New Contributors
- @mfuntowicz made their first contribution in #631
- @aws-amj made their first contribution in #627
- @asntr made their first contribution in #625
- @XciD made their first contribution in #662
Full Changelog: v0.0.23...v0.0.24
v0.0.23: Bump transformers and optimum version
What's Changed
- Bump required packages versions: `transformers==4.41.1`, `accelerate==0.29.2`, `optimum==1.20.*`
Inference
- Fix diffusion caching by @oOraph in #594
- Fix inference latency issue when weights/neff are separated by @JingyaHuang in #584
- Enable caching for inlined models by @JingyaHuang in #604
- Patch attention score far off issue for sd 1.5 by @JingyaHuang in #611
TGI
- Fix excessive CPU memory consumption on TGI startup by @dacorvo in #595
- Avoid clearing all pending requests on early user cancellations by @dacorvo in #609
- Include tokenizer during export and simplify deployment by @dacorvo in #610
Training
- Performance improvements and neuron_parallel_compile and gradient checkpointing fixes by @michaelbenayoun in #602
Full Changelog: v0.0.22...v0.0.23
v0.0.22: Mixtral support, pipeline for sentence transformers, compatibility with Compel
What's Changed
Training
- Integrate new API for saving and loading with `neuronx_distributed` by @michaelbenayoun in #560
Inference
- Add support for Mixtral by @dacorvo in #569
- Improve Llama models performance by @dacorvo in #587
- Make Stable Diffusion pipelines compatible with compel by @JingyaHuang and @neo in #581 (with tests inspired by snippets from @Suprhimp)
- Add `SentenceTransformers` support to `pipeline` for `feature-extraction` by @philschmid in #583 (see the sketch below)
- Allow download subfolder for caching models with subfolder by @JingyaHuang in #566
- Do not split decoder checkpoint files by @dacorvo in #567
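A minimal sketch of the `pipeline` addition for `feature-extraction` referenced above; it assumes the Neuron `pipeline` helper can export on the fly with static shapes, which should be verified against the docs.

```python
# Hedged sketch: sentence embeddings through the Neuron pipeline API.
from optimum.neuron import pipeline  # Neuron-aware pipeline helper (assumed import path)

extractor = pipeline(
    "feature-extraction",
    model="sentence-transformers/all-MiniLM-L6-v2",
    export=True,        # compile on first use (assumed to be supported here)
    batch_size=1,
    sequence_length=128,
)
embedding = extractor("A sentence to embed")
print(len(embedding[0][0]))  # hidden size of the token embeddings
```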
TGI
- Set up TGI environment values with the ones used to build the model by @oOraph in #529
- TGI benchmark with llmperf by @dacorvo in #564
- Improve tgi env wrapper for neuron by @oOraph in #589
Caveat
Models traced with `inline_weights_to_neff=False` currently have higher than expected latency during inference, because the weights are not automatically moved to the Neuron devices. This will be fixed in #584; until then, please avoid setting `inline_weights_to_neff=False` in this release.
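In practice this means leaving the flag at its default when tracing in this release. A hedged sketch with an encoder export:

```python
# Hedged sketch: keep inline_weights_to_neff at its default (True) in v0.0.22.
from optimum.neuron import NeuronModelForSequenceClassification

model = NeuronModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    export=True,
    inline_weights_to_neff=True,  # default; False hits the latency issue until #584 lands
    batch_size=1,
    sequence_length=128,
)
```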
Other changes
- Improve installation guide by @JingyaHuang in #559
- upgrade optimum and then install optimum-neuron by @shub-kris in #533
- Cleanup obsolete code by @michaelbenayoun in #555
- Extend TGI integration tests by @dacorvo in #561
- Modify benchmarks by @dacorvo in #563
- Bump PyTorch to 2.1 by @JingyaHuang in #502
- fix(decoder): specify libraryname to suppress warning by @dacorvo in #570
- Fix missing `\` in quickstart inference guide by @yahavb in #574
- Use AWS 2.18.0 AMI as base by @dacorvo in #572
- Update TGI router version to 2.0.1 by @dacorvo in #577
- Add guide for LoRA adapters by @JingyaHuang in #582
- eos_token_id can be a list in configs by @dacorvo in #580
- Ease the tests when there is no hf token by @JingyaHuang in #585
- Change inline weights to Neff default value to True by @JingyaHuang in #590
Full Changelog: v0.0.21...v0.0.22
v0.0.21: Expand caching support for inference, GQA training support, TGI improved performance
What's Changed
Training
- Add GQA optimization for Tensor Parallel training to support the case `tp_size > num_key_value_heads` by @michaelbenayoun in #498 (see the arithmetic sketch below)
- Mixed-precision training with either `torch_xla` or `torch.autocast` by @michaelbenayoun in #523
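The GQA case above is easiest to see with numbers: when `tp_size` exceeds `num_key_value_heads`, each KV head has to be replicated so that every tensor-parallel rank owns a shard. An illustrative computation (the head counts are examples, not defaults):

```python
# Illustrative arithmetic behind the GQA optimization for tensor parallelism.
num_key_value_heads = 8   # e.g. a Llama-2-70B-style GQA model
tp_size = 32              # tensor-parallel degree

# tp_size must be a multiple of num_key_value_heads for even replication.
assert tp_size % num_key_value_heads == 0
replication = tp_size // num_key_value_heads
print(f"each of the {num_key_value_heads} KV heads is replicated {replication}x "
      f"so that all {tp_size} ranks hold a KV shard")
```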
Inference
- Add caching support for traced TorchScript models (eg. encoders, stable diffusion models) by @JingyaHuang in #510
- Support phi model on feature-extraction, text-classification, token-classification tasks by @JingyaHuang in #509
Caveat
AWS Neuron SDK 2.18 doesn't support the compilation of SDXL's UNet with weights / neff separation, so `inline_weights_to_neff=True` is forced through:
- Disable weights / neff separation of SDXL's UNET for neuron sdk 2.18 by @JingyaHuang in #554
Other changes
- Fix/ami authorized keys by @shub-kris in #517
- Skip weight load during parallel compile by @michaelbenayoun in #524
- fixing format in getting-started.ipynb by @jimburtoft in #526
- Removing colab links in notebooks.mdx by @jimburtoft in #525
- ADD stale bot by @philschmid in #530
- Bump optimum version by @JingyaHuang in #534
- Fix style by @JingyaHuang in #538
- Fix GQA permutation computation and sequential weight initialization / loading when doing TP by @michaelbenayoun in #531
- Add setup runtime step for K8S by @glegendre01 in #541
- Disable logging during precompilation by @michaelbenayoun in #539
- Do not use deprecated list_files_info by @Wauplin in #536
- Adding link to existing Fine-tuning example in Notebooks by @jimburtoft in #527
- Add missing notebooks to doc by @JingyaHuang in #543
- fix: bug in get_available_cores within container by @oOraph in #546
- Init on the `xla` device by @michaelbenayoun in #521
- Adding CodeLlama-7B inference and compilation example notebook by @jimburtoft in #549
- Add tools for auto filling traced models cache by @JingyaHuang in #537
- Remove print that should not be there by @michaelbenayoun in #552
- Use AWS Neuron sdk 2.18 by @dacorvo in #547
- Cache utils related cleanup by @michaelbenayoun in #553
New Contributors
- @glegendre01 made their first contribution in #541
- @Wauplin made their first contribution in #536
Full Changelog: v0.0.20...v0.0.21
v0.0.20: Multi-node training, SD Lora, sentence transformers clip, TGI improvements
What's Changed
Training
- Multi-node training support by @michaelbenayoun (#440)
TGI
- optimize continuous batching and improve export (#506)
Inference
- Add LoRA support to stable diffusion by @JingyaHuang (#483) (see the sketch below)
- Support sentence transformers clip by @JingyaHuang (#495)
- Inference compile cache script by @philschmid and @dacorvo (#496, #504)
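For the LoRA item above, adapters are typically fused into the model at export time, since compiled Neuron artifacts carry static weights. A hedged sketch; the `lora_*` argument names and the adapter repository are assumptions to verify against the LoRA guide:

```python
# Hedged sketch: export Stable Diffusion with a LoRA adapter fused in.
from optimum.neuron import NeuronStableDiffusionPipeline

pipe = NeuronStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    export=True,
    lora_model_ids=["pcuenq/pokemon-lora"],          # illustrative adapter repo
    lora_weight_names=["pytorch_lora_weights.bin"],  # adapter weight file (assumed)
    lora_scales=[0.9],
    batch_size=1, height=512, width=512,
)
image = pipe("a cute dragon pokemon, high quality").images[0]
image.save("pokemon.png")
```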
Doc
- Update Inference supported models list by @JingyaHuang (#501)
Bug fixes
- Inference cache: omit irrelevant config parameters in lookup by @dacorvo (#494)
- Optimize disk usage when fetching model checkpoints by @dacorvo (#505)
Full Changelog: v0.0.19...v0.0.20
v0.0.19: AWS Neuron SDK 2.17.0, training cache system, TGI improved batching
What's Changed
Training
- Integrate new cache system for training by @michaelbenayoun in #472
TGI
- Support higher batch sizes using transformers-neuronx continuous batching by @dacorvo in #488
- Lift max-concurrent-request limitation using TGI 1.4.1 by @dacorvo in #488
AMI
- Add packer support for building AWS AMI by @shub-kris in #441
- [AMI] Updates base ami to new id by @philschmid in #482
Major bugfixes
- Fix sdxl inpaint pipeline for diffusers 0.26.* by @JingyaHuang in #458
- TGI: update to controller version 1.4.0 & bug fixes by @dacorvo in #470
- Fix optimum-cli export for inf1 by @JingyaHuang in #474
Other changes
- Add TGI tests and CI workflow by @dacorvo in #355
- Bump to optimum 1.17 - Adapt to optimum exporter refactoring by @JingyaHuang in #414
- [Training] Support for Transformers 4.37 by @michaelbenayoun in #459
- Add contribution guide for Neuron exporter by @JingyaHuang in #461
- Fix path, update versions by @shub-kris in #462
- Add issue and PR templates & build optimum env cli for Neuron by @JingyaHuang in #463
- Fix trigger for actions by @philschmid in #468
- TGI: bump rust version by @dacorvo in #477
- [documentation] Add Container overview page. by @philschmid in #481
- Bump to Neuron sdk 2.17.0 by @JingyaHuang in #487
New Contributors
- @shub-kris made their first contribution in #441
Full Changelog: v0.0.18...v0.0.19
v0.0.18: AWS Neuron SDK 2.16.1, NeuronX TGI improvements, PP for Training
What's Changed
AWS SDK
- Use AWS Neuron SDK 2.16.1 (#449)
Inference
- Preliminary support for neff/weights decoupling by @JingyaHuang (#402)
- Allow exporting decoder models using optimum-cli by @dacorvo (#422)
- Add Neuron X cache registry by @dacorvo (#442)
- Add StoppingCriteria to generate() of NeuronModelForCausalLM by @dacorvo (#454)
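For the `StoppingCriteria` item above, a minimal sketch using the standard `transformers` stopping-criteria interface with a Neuron decoder; the model and shapes are illustrative.

```python
# Hedged sketch: custom stop condition with NeuronModelForCausalLM.generate().
import torch
from transformers import AutoTokenizer, StoppingCriteria, StoppingCriteriaList
from optimum.neuron import NeuronModelForCausalLM

class StopOnToken(StoppingCriteria):
    """Stop generation as soon as a given token id is produced."""
    def __init__(self, stop_token_id: int):
        self.stop_token_id = stop_token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        return input_ids[0, -1].item() == self.stop_token_id

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = NeuronModelForCausalLM.from_pretrained(
    "gpt2", export=True, batch_size=1, sequence_length=128,  # illustrative static shapes
)
inputs = tokenizer("One two three", return_tensors="pt")
outputs = model.generate(
    **inputs,
    stopping_criteria=StoppingCriteriaList([StopOnToken(tokenizer.eos_token_id)]),
)
print(tokenizer.decode(outputs[0]))
```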
Training
- Initial support for pipeline parallelism by @michaelbenayoun (#279)
Tutorials and doc improvement
- Various fixes by @jimburtoft @michaelbenayoun @JingyaHuang (#428 #429 #432)
- Improve Stable Diffusion Notebooks by @JingyaHuang (#431)
- Add Sentence Transformers Guide and Notebook by @philschmid (#434)
- Add benchmark section by @dacorvo (#435)
Major bugfixes
- TGI: correctly identify special tokens during generation by @dacorvo (#438)
- TGI: do not include the input_text in generated text by @dacorvo (#454)
Other changes
- API change to be compatible to Optimum by @JingyaHuang (#421)
New Contributors
- @jimburtoft made their first contribution in #432
Full Changelog: v0.0.17...v0.0.18
v0.0.17: AWS Neuron SDK 2.16, Mistral, sentence transformers, inference cache
What's Changed
AWS SDK
- Use AWS Neuron SDK 2.16 (#398)
- Use official serialization API for `transformers_neuronx` models instead of the beta one by @aws-yishanm (#387, #393)
Inference
- Improve the support of sentence transformers by @JingyaHuang (#408)
- Add Neuronx compile cache Hub proxy and use it for LLM decoder models by @dacorvo (#410)
- Add support for Mistral models by @dacorvo (#411) (see the sketch below)
- Do not upload Neuron LLM weights when they can be fetched from the hub by @dacorvo (#413)
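A hedged sketch of the new Mistral support referenced above; the shape, core, and dtype arguments follow the usual decoder export options but are illustrative.

```python
# Hedged sketch: compile Mistral-7B for Neuron and generate.
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForCausalLM

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
model = NeuronModelForCausalLM.from_pretrained(
    model_id,
    export=True,
    batch_size=1,
    sequence_length=2048,
    num_cores=2,            # Neuron cores used for the compiled model (assumed option)
    auto_cast_type="bf16",  # dtype used for compilation (assumed option)
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("What is AWS Trainium?", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```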
Training
- Add general support for generation on TRN with NxD by @aws-tianquaw (#370)
Tutorials and doc improvement
- Add llama 2 fine tuning tutorial by @philschmid (#390)
Major bugfixes
- Skip pushing if the user does not have write access to the cache repo by @michaelbenayoun (#405)
Other changes
- Bump Hugging Face library versions by @JingyaHuang (#403)
New Contributors
- @aws-tianquaw made their first contribution in #370
- @aws-yishanm made their first contribution in #387
Full Changelog: v0.0.16...v0.0.17
v0.0.16: T5 export and inference, general training fixes
What's Changed
Training
A few fixes related to precompilation and checkpointing. These fixes enable training LLMs on AWS Trainium instances without friction.
- Skip model saving during precompilation and provide option to skip cache push (#365)
- Fix checkpoint saving and consolidation for TP (#378)
- A `torch_xla`-compatible version of `safetensors.torch.save_file` is now used in the `NeuronTrainer` (#329)