v0.0.23: Bump transformers and optimum version
What's Changed
- Bump required package versions: transformers==4.41.1, accelerate==0.29.2, optimum==1.20.*
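The pinned versions above can be captured in a requirements fragment; a minimal sketch, using only the pins listed in this release:

```
# Dependency pins required by optimum-neuron v0.0.23
transformers==4.41.1
accelerate==0.29.2
optimum==1.20.*
```

Installing with `pip install -r` on such a fragment keeps the environment aligned with the versions this release was validated against.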
Inference
- Fix diffusion caching by @oOraph in #594
- Fix inference latency issue when weights/neff are separated by @JingyaHuang in #584
- Enable caching for inlined models by @JingyaHuang in #604
- Patch far-off attention scores for SD 1.5 by @JingyaHuang in #611
TGI
- Fix excessive CPU memory consumption on TGI startup by @dacorvo in #595
- Avoid clearing all pending requests on early user cancellations by @dacorvo in #609
- Include tokenizer during export and simplify deployment by @dacorvo in #610
Training
- Performance improvements, plus fixes for neuron_parallel_compile and gradient checkpointing, by @michaelbenayoun in #602
Full Changelog: v0.0.22...v0.0.23