
Releases: huggingface/optimum-habana

v1.14.1: Patch release

29 Oct 17:13

Full Changelog: v1.14.0...v1.14.1

v1.14.0: Transformers v4.45, SynapseAI v1.18, Qwen2-MoE, text-to-video generation

22 Oct 16:11

Transformers v4.45

SynapseAI v1.18

Qwen2-MoE

  • Added Qwen2-MoE model, optimizing its performance on Gaudi #1316 @gyou2021

Text-to-video generation

Depth-to-image generation

Model optimizations

Intel Neural Compressor

  • Enable INC for Llava models and switch softmax to torch.nn.functional.softmax, which is the module supported by INC #1325 @tthakkal
  • Load INC GPTQ checkpoint & rename params #1364 @HolyFalafel
  • Fix INC weight-loading compile error caused by the Transformers 4.45 upgrade #1421 @jiminha

Vera/LN-tuning

Other

v1.13.2: Patch release

06 Sep 20:17

Llava(-next) improvements

This patch release adds multi-card support for Llava(-next) and lets users turn recomputation for flash attention on or off.

  • Llava: added a flash_attention_recompute argument to enable/disable recomputation #1278 @tthakkal
  • Add the DeepSpeed injection_policy for Mistral #1309 @yuanwu2017
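The recompute toggle above is a generation-time argument. A minimal sketch of how it might be passed, assuming the generation-kwargs style used by the library's example scripts (only flash_attention_recompute comes from #1278; use_flash_attention and the other keys and values are illustrative assumptions):

```python
# Illustrative generation kwargs for a Llava(-next) model on Gaudi.
# flash_attention_recompute is the new argument from #1278; everything
# else here is an assumption for the sketch.
generate_kwargs = {
    "max_new_tokens": 128,
    "use_flash_attention": True,         # assumed companion flag
    "flash_attention_recompute": False,  # keep activations instead of recomputing
}

# On Gaudi hardware these would be forwarded to generate(), e.g.:
# outputs = model.generate(**inputs, **generate_kwargs)
```

Disabling recomputation trades higher memory use for lower latency; enabling it does the reverse.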

Full Changelog: v1.13.1...v1.13.2

v1.13.1: Patch release

25 Aug 13:34

Fixed memory regressions

  • Remove _expand_inputs_for_generation for greedy search (#1266) @libinta
  • Fix memory regression for modeling llama (#1271) @libinta

FSDP

FSDP checkpoint saving is fixed.

Known limitations

  • ESMFold does not work on Gaudi1; this will be fixed in a future release

Full Changelog: v1.13.0...v1.13.1

v1.13.0: Stable Diffusion 3, Sentence Transformers, SAM, DETR, Kubernetes example

16 Aug 14:25

SynapseAI 1.17

  • Upgrade SynapseAI version to 1.17.0 #1217

Transformers 4.43

Diffusers 0.29

  • Upgrade optimum-habana diffusers dependency from 0.26.3 to 0.29.2 #1150 @dsocek

Stable Diffusion 3

Training with Sentence Transformers

Model optimizations

SAM, FastViT, VideoMAE, OpenCLIP, DETR, Table Transformer, DeciLM

Stable Diffusion inpainting, unconditional image generation

  • Add Stable Diffusion inpainting support #869 @yuanwu2017
  • Enable unconditional image generation on Gaudi2 [Diffusers/Tasks] #859 @cfgfung

Text feature extraction example

Tensor parallelism

  • Tensor-parallel distributed strategy without DeepSpeed #1121 @kalyanjk
  • Disable torch.compile for all_reduce when parallel_strategy is set to "tp" #1174 @kalyanjk

Kubernetes cluster example

  • Add a Helm chart, Dockerfile, and instructions for running examples on a Kubernetes cluster #1099 @dmsuehir
  • Fix the PyTorch version in the Kubernetes docker-compose to match the image #1246 @dmsuehir

FP8 training

Other

Known limitations

  • For Llama, some large batch sizes that used to work now lead to out-of-memory errors

v1.12.1: Patch release

11 Jul 13:51

Fix first-token latency measurement

Fix for Mixtral

Other

  • Fix for selective seq length test with batch size 1 #1110 @libinta

Full Changelog: v1.12.0...v1.12.1

v1.12: Qwen2, Gemma, SVD, Dreambooth, speculative sampling

22 Jun 18:28

SynapseAI v1.16

Transformers 4.40

Speculative Sampling

Model optimizations

Stable Video Diffusion

PEFT

TRL

Object Segmentation Example

  • Add an example of object segmentation (ClipSeg) #801 @cfgfung

Dreambooth

  • Diffusers DreamBooth full/LoRA/LoKr/LoHa/OFT fine-tuning, and DreamBooth XL LoRA fine-tuning #881 @sywangyi

Others

v1.11.1: Patch release

20 Apr 05:28

Llama3 has been validated on Gaudi

Fix issue with pytest

The latest SynapseAI Docker images come with Pytest v8 preinstalled, which is incompatible with the Transformers library and causes errors in a few non-test cases. As a temporary workaround, Pytest is pinned to a compatible version and made a hard dependency.
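For reference, such a pin might look like the following requirement specifier (the exact bound shipped in the release may differ; this fragment is illustrative):

```
pytest < 8.0  # exclude the incompatible Pytest v8 series
```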

Other

Full Changelog: v1.11.0...v1.11.1

v1.11: SDXL fine-tuning, Whisper, Phi, ControlNet

04 Apr 14:55

SynapseAI v1.15

The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.15.0.

SDXL fine-tuning

Whisper

Phi

ControlNet

Transformers v4.38

The codebase is fully validated for Transformers v4.38.

Model optimizations

Image-to-text and VQA examples

  • Add image-to-text and visual question answering examples #738 @sywangyi

torch.compile

Bug fixes

Others

Known issue

v1.10.4: Patch release

23 Feb 03:26

Fix Llama memory issue with DeepSpeed ZeRO-3

  • Fix Llama initialization #712

Full Changelog: v1.10.2...v1.10.4