v0.0.21: Expand caching support for inference, GQA training support, TGI improved performance
What's Changed
Training
- Add GQA optimization for Tensor Parallel training to support the case `tp_size > num_key_value_heads` by @michaelbenayoun in #498
- Mixed-precision training with both `torch_xla` or `torch.autocast` by @michaelbenayoun in #523
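To illustrate why the `tp_size > num_key_value_heads` case needs special handling: there are fewer key/value heads than tensor-parallel ranks, so each KV head has to be shared by several ranks. The following is a minimal sketch of one possible placement, not the code from #498. The function name `kv_head_assignment` is hypothetical, and it assumes `tp_size` is a multiple of `num_key_value_heads` and that each KV head is replicated on consecutive ranks.

```python
from typing import List


def kv_head_assignment(tp_size: int, num_key_value_heads: int) -> List[int]:
    """Return, for every tensor-parallel rank, the index of the KV head it holds.

    Hypothetical helper for illustration only. Assumes each KV head is
    replicated on tp_size // num_key_value_heads consecutive ranks.
    """
    if tp_size <= num_key_value_heads:
        raise ValueError("this sketch only covers tp_size > num_key_value_heads")
    if tp_size % num_key_value_heads != 0:
        raise ValueError("tp_size must be a multiple of num_key_value_heads")

    # Number of ranks that share a copy of the same KV head.
    replicas = tp_size // num_key_value_heads
    return [rank // replicas for rank in range(tp_size)]


if __name__ == "__main__":
    # 8 ranks, 2 KV heads: ranks 0-3 hold head 0, ranks 4-7 hold head 1.
    print(kv_head_assignment(tp_size=8, num_key_value_heads=2))
```

With contiguous sharding of the query heads, this placement keeps every rank's queries next to the KV head of their group. The actual implementation may order heads differently.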
Inference
- Add caching support for traced TorchScript models (e.g. encoders, stable diffusion models) by @JingyaHuang in #510
- Support phi model on feature-extraction, text-classification, token-classification tasks by @JingyaHuang in #509
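As a rough illustration of what caching compiled artifacts involves, the sketch below keys a cache on a hash of the compilation configuration so that an identical configuration skips recompilation. It is a generic, in-memory example under stated assumptions: `cache_key`, `CompileCache`, and the configuration fields are hypothetical and do not reflect how the library computes its keys or stores artifacts.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def cache_key(compile_config: Dict[str, Any]) -> str:
    """Derive a stable key from a JSON-serializable compilation config."""
    # Sorting keys makes the hash independent of dict insertion order.
    canonical = json.dumps(compile_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class CompileCache:
    """Hypothetical in-memory cache mapping config hashes to artifacts."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get_or_compile(
        self, config: Dict[str, Any], compile_fn: Callable[[Dict[str, Any]], Any]
    ) -> Any:
        key = cache_key(config)
        if key not in self._store:
            # Cache miss: run the (expensive) compilation once and keep the result.
            self._store[key] = compile_fn(config)
        return self._store[key]
```

A caller would pass the same configuration dictionary on every run; only the first call invokes `compile_fn`, and any change to a field such as an input shape produces a new key.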
TGI
Caveat
AWS Neuron SDK 2.18 doesn't support compiling SDXL's UNet with weights / neff separation, so `inline_weights_to_neff=True` is forced through:
- Disable weights / neff separation of SDXL's UNET for neuron sdk 2.18 by @JingyaHuang in #554
Other changes
- Fix/ami authorized keys by @shub-kris in #517
- Skip weight load during parallel compile by @michaelbenayoun in #524
- fixing format in getting-started.ipynb by @jimburtoft in #526
- Removing colab links in notebooks.mdx by @jimburtoft in #525
- ADD stale bot by @philschmid in #530
- Bump optimum version by @JingyaHuang in #534
- Fix style by @JingyaHuang in #538
- Fix GQA permutation computation and sequential weight initialization / loading when doing TP by @michaelbenayoun in #531
- Add setup runtime step for K8S by @glegendre01 in #541
- Disable logging during precompilation by @michaelbenayoun in #539
- Do not use deprecated list_files_info by @Wauplin in #536
- Adding link to existing Fine-tuning example in Notebooks by @jimburtoft in #527
- Add missing notebooks to doc by @JingyaHuang in #543
- fix: bug in get_available_cores within container by @oOraph in #546
- Init on the `xla` device by @michaelbenayoun in #521
- Adding CodeLlama-7B inference and compilation example notebook by @jimburtoft in #549
- Add tools for auto filling traced models cache by @JingyaHuang in #537
- Remove print that should not be there by @michaelbenayoun in #552
- Use AWS Neuron sdk 2.18 by @dacorvo in #547
- Cache utils related cleanup by @michaelbenayoun in #553
New Contributors
- @glegendre01 made their first contribution in #541
- @Wauplin made their first contribution in #536
Full Changelog: v0.0.20...v0.0.21