v0.0.23: Bump transformers and optimum version
What's Changed
- Bump required package versions: transformers==4.41.1, accelerate==0.29.2, optimum==1.20.*
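The pinned versions above can be captured in a requirements fragment; a minimal sketch, using only the pins listed in this release:

```
# Dependency pins required by optimum-neuron v0.0.23
transformers==4.41.1
accelerate==0.29.2
optimum==1.20.*
```

Installing with `pip install -r` on such a fragment keeps the environment aligned with the versions this release was validated against.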
Inference
- Fix diffusion caching by @oOraph in #594
- Fix inference latency issue when weights/neff are separated by @JingyaHuang in #584
- Enable caching for inlined models by @JingyaHuang in #604
- Patch far-off attention scores for SD 1.5 by @JingyaHuang in #611
TGI
- Fix excessive CPU memory consumption on TGI startup by @dacorvo in #595
- Avoid clearing all pending requests on early user cancellations by @dacorvo in #609
- Include tokenizer during export and simplify deployment by @dacorvo in #610
Training
- Performance improvements, plus fixes for neuron_parallel_compile and gradient checkpointing, by @michaelbenayoun in #602
Full Changelog: v0.0.22...v0.0.23