Releases: huggingface/optimum-habana
v1.14.1: Patch release
- Enable DeepSpeed for image-to-text example #1455 @schoi-habana
- Fix bug when loading 4bit checkpoint quantized in INC #1447 @xin3he
- Fixes 'Tokenizer does not have padding token' introduced by #1444 for Llama3.1 #1457 @MohitIntel
Full Changelog: v1.14.0...v1.14.1
v1.14.0: Transformers v4.45, SynapseAI v1.18, Qwen2-MoE, text-to-video generation
Transformers v4.45
SynapseAI v1.18
Qwen2-MoE
Text-to-video generation
- Enabling Text to Video Diffusion Model Generation #1109 @pi314ever
- Porting Stable Video Diffusion ControlNet to HPU #1037 @wenbinc-Bin
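A minimal sketch of the new text-to-video path (#1109). `GaudiTextToVideoSDPipeline` and its kwargs below follow the usual optimum-habana Diffusers conventions (`use_habana`, `use_hpu_graphs`, `gaudi_config`), but treat the class name and defaults as assumptions rather than a verified API:

```python
# Hedged sketch of text-to-video generation on Gaudi (per #1109).
# Assumption: GaudiTextToVideoSDPipeline mirrors the other Gaudi Diffusers pipelines.
import torch
from optimum.habana.diffusers import GaudiTextToVideoSDPipeline

pipe = GaudiTextToVideoSDPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",  # illustrative model choice
    torch_dtype=torch.bfloat16,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
frames = pipe(prompt="An astronaut riding a horse", num_inference_steps=25).frames
```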
Depth-to-image generation
- Depth to Image Generation #1175 @pi314ever
Model optimizations
- Enable FusedSDPA for Mpt #1101 @Jianhong-Zhang
- Mixtral fp8 #1269 @imangohari1
- Prevent Graph break in Llama when using flash attention #1301 @pramodkumar-habanalabs
- Boost SDXL speed with initialized schedule step reset #1284 @dsocek
- Improve MPT fp8 #1256 @atakaha
- Add Whisper static generation #1275 @Spycsh
- Gemma: enabled HPU Graphs and Flash Attention #1173 @dsmertin
- Recommend jemalloc for gpt-neox-20b 8x #1350 @hsubramony
- Optimized inference of GPT-Neo model on HPU #1319 @XinyuYe-Intel
- Fix graph breaks for BART in torch.compile mode. #1379 @astachowiczhabana
- Gpt_bigcode: added internal_bucketing support #1218 @mgonchar
- refine bucket_internal for mpt #1194 @Jing1Ling
- Qwen finetuning bucketing #1130 @ssarkar2
- Enable FusedSDPA fp8 in Llama FT #1388 @pbielak
- Added gemma specific fp8 quantization file #1445 @yeonsily
Intel Neural Compressor
- Enable INC for Llava models and switch softmax to torch.nn.functional.softmax, which INC supports #1325 @tthakkal
- Load INC GPTQ checkpoint & rename params #1364 @HolyFalafel
- Fix INC load-weights compile error due to Transformers 4.45 upgrade #1421 @jiminha
Vera/LN-tuning
Other
- Add callable workflow to post comments when code quality check failed #1263 @regisss
- Fix failed code quality check comment workflow #1264 @regisss
- Accelerate Diffusers CI #1265 @regisss
- Add profiler to SD3 #1267 @atakaha
- Fix profiling step with device finish execution for text-generation #1283 @libinta
- Update FusedSDPA calling method as Gaudi documentation #1285 @yeonsily
- Switch failed code quality check comment to workflow_run #1297 @regisss
- Potential fix for the failed code quality check comment workflow #1299 @regisss
- Fix text-generation example lm_eval evaluation #1308 @changwangss
- Add section to README about Transformers development branch #1307 @regisss
- Fix eager mode in run_generation by removing graph logs #1231 @Vasud-ha
- Fix bug when running google/paligemma-3b-mix-224 #1279 @kaixuanliu
- Use native checkpointing under compile mode #1313 @xinyu-intel
- fixed fused_qkv object AttributeError due to 'LlamaConfig' #1203 @rkumar2patel
- Image to Image Generation Enabling #1196 @pi314ever
- Diffusers timing #1277 @imangohari1
- Fix eos issue in finetune/generation #1253 @sywangyi
- Update CI, tests and examples #1315 @regisss
- Fix Sentence Transformer HPU graphs for training with PEFT model #1320 @nngokhale
- Fix ZeroDivisionError in constrained beam search with static shapes #1317 @skavulya
- Update esmfold model not to use param_buffer_assignment #1324 @jiminha
- Falcon inference crash fix for falcon-40b model #1161 @yeonsily
- Add --use_kv_cache to image-to-text pipeline #1292 @KimBioInfoStudio
- Trl upgrade #1245 @sywangyi
- Fix uint4 url typo. #1340 @kding1
- Use eager attention for wav2vec2 #1333 @skaulintel
- Add _reorder_cache back to Llama for HPU #1233 @jiminha
- SDXL CI script throughput #1296 @imangohari1
- Add image so that transformers tests can run #1338 @skaulintel
- Fixes the no attribute error with the falcon multicard test #1344 @mounikamandava
- Add profiler to sdxl mlperf pipeline #1339 @Jianhong-Zhang
- Fix decoder only generation #948 @tjs-intel
- Upgrade gradient checkpointing #1347 @yafshar
- Run_generation example: fixed graph compilation statistics reporting #1352 @mgonchar
- Fix DeepSpeed crash with Sentence Transformer Trainer #1328 @nngokhale
- fea(ci): reduced slow test_diffusers timing. minor fixes #1330 @imangohari1
- Flash attn args for GaudiGemmaForCausalLM #1356 @kkoryun
- Transformer models generation supports user-provided input embeddings #1276 @zongwave
- Fixed the expected values for the img2img slice #1332 @imangohari1
- Gpt_big_code: make flash attention impl quantization friendly #1282 @mgonchar
- Fix OOM when inference with llama-3.1-70b #1302 @harborn
- Fix the conditional #1362 @yafshar
- Revert "use native checkpointing under compile mode" #1365 @xinyu-intel
- Remove repetitive pip install commands #1367 @MohitIntel
- Minor UX enhancement #1373 @MohitIntel
- Fix bug when running image-to-text example #1371 @kaixuanliu
- Gpt_bigcode: fixed wrong indentation #1376 @mgonchar
- Support for transformers without self.model to torch.compile #1380 @astachowiczhabana
- Only pass the use_kv_cache True to generator #1366 @yafshar
- Clean up the code and remove unnecessary class #1382 @yafshar
- Add the diffusers examples of inference Tech #1244 @yuanwu2017
- Enhance transformers test suite in Optimum-habana-4.43.4 (auto PR 07654de) #1387 @rkumar2patel
- Enhance transformers test suite in Optimum-habana-4.43.4 (auto PR 8926a4b) #1386 @rkumar2patel
- Add README.md for Sentence transformer examples with HPU device #1355 @ZhengHongming888
- Change Falcon/GPT-Neox rotary embedding function to use seq_len #1368 @yeonsily
- Enhance Optimum-habana as per transformers-4.43.4 #1381 @rkumar2patel
- CI fix - Install stable-diffusion reqs #1389 @vidyasiv
- Fix error caused by uninitialized attn_weights #1391 @hsubramony
- Replace flash attention flag #1393 @skaulintel
- Fix DeepSpeed CI on Gaudi2 #1395 @regisss
- Truncate the cached max seq len #1394 @astachowiczhabana
- Fix gpt-neox training accuracy issue. #1397 @yeonsily
- Simplify HQT config files #1219 @Tiefen-boop
- unify_measurements.py script support to unify PCQ 70B 8x #1322 @Yantom1
- Add misc. training args #1346 @SanityRemnants
- Add quantization config for low bs case #1377 @ulivne
- Remove HQT from OHF #1257 @Yantom1
- Valid sequence length for sdpa #1183 @ssarkar2
- Multiple fixes (dynamo graph break, qwen-moe, multicard) #1410 @ssarkar2
- Change the image path for transformers tests back to the correct location #1401 @skaulintel
- Fix Gaudi2 regression tests #1403 @regisss
- Reverting some of transformer pytest funcs/values #1399 @imangohari1
- Fix StarCoder2 inference #1405 @regisss
- Change the order for test_diffusers #1406 @hsubramony
- Fix llama model text generation error #1402 @zongwave
- Datasets downgrade version to 2.21.0 #1413 @hsubramony
- Update ci sentence_transformer.sh #1424 @ZhengHongming888
- Update language-modeling README.md, add trust_remote_code for flan-t5-xl #1422 @hsubramony
- Update unify_measurements.py support info #1425 @shepark
- Fix GPT_neox incorrect output with batch query #1358 @Jianhong-Zhang
- Fix text-to-image example #1429 @regisss
- Add flag to run inference with partial dataset #1420 @pramodkumar-habanalabs
- Add peft generation example #1427 @sywangyi
- Added missing allocate_kv_cache() call in CausalLM class #1431 @yeonsily
- Fix merge error and update text-to-speech readme #1436 @hsubramony
- Fix OOM error for code llama #1437 @jiminha
- Fix error on 4bit checkpoint load with run_lm_eval on TF4.45.2 #1439 @jiminha
- GPT2 torch.compile fix #1434 @dsmertin
- Update text-gen README.md to add auto-gptq fork install steps #1442 @hsubramony
- Fix scoped linear all-reduce for starcoder model #1432 @skavulya
- Fixed recursion error in SentenceTransformer #1428 @yafshar
- Fix Llama 3.1 generation #1444 @regisss
- Remove cache folder from image data folder #1446 @shepark
v1.13.2: Patch release
Llava(-next) improvements
This patch release adds multi-card support for Llava(-next) and lets users turn recomputing for flash attention on or off (see the sketch after the list below).
- Llava: Added flash_attention_recompute arg to provide an option to enable/disable recompute #1278 @tthakkal
- Add the deepspeed injection_policy of mistral #1309 @yuanwu2017
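A hedged sketch of the new switch, assuming `flash_attention_recompute` is exposed as a `generate()` kwarg on the Gaudi-optimized Llava models as in the image-to-text example; the exact plumbing in #1278 may differ:

```python
import torch
from PIL import Image
import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from transformers import AutoProcessor, LlavaForConditionalGeneration
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # patch Transformers with the Gaudi-optimized models

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("hpu")

image = Image.new("RGB", (336, 336))  # placeholder image for illustration
inputs = processor(
    images=image, text="USER: <image>\nDescribe the image. ASSISTANT:", return_tensors="pt"
).to("hpu", torch.bfloat16)  # BatchFeature.to casts only floating-point tensors

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    use_flash_attention=True,        # Habana fused SDPA path
    flash_attention_recompute=True,  # assumption: the on/off switch added in #1278
)
print(processor.batch_decode(outputs, skip_special_tokens=True))
```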
Full Changelog: v1.13.1...v1.13.2
v1.13.1: Patch release
Fixed memory regressions
- Remove _expand_inputs_for_generation for greedy search #1266 @libinta
- Fix memory regression for modeling llama #1271 @libinta
FSDP
FSDP checkpoint saving is fixed.
Known limitations
- ESMFold does not work on Gaudi1; this will be fixed in a future version
Full Changelog: v1.13.0...v1.13.1
v1.13.0: Stable Diffusion 3, Sentence Transformers, SAM, DETR, Kubernetes example
SynapseAI 1.17
- Upgrade SynapseAI version to 1.17.0 #1217
Transformers 4.43
Diffusers 0.29
Stable Diffusion 3
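A minimal SD3 inference sketch, assuming the `GaudiStableDiffusion3Pipeline` class follows the same conventions as the other Gaudi Diffusers pipelines:

```python
import torch
from optimum.habana.diffusers import GaudiStableDiffusion3Pipeline

pipe = GaudiStableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.bfloat16,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
image = pipe(prompt="A photo of a red panda", num_inference_steps=28).images[0]
image.save("red_panda.png")
```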
Training with Sentence Transformers
- Enable Sentence Transformer Trainer with Gaudi #1111 @ZhengHongming888
Model optimizations
- Fix starcoder2 accuracy issue and optimize performance with fused rope #1095 @mandy-li
- Enable FusedRoPE using float32 for gpt-neox model #1104 @yeonsily
- Mamba initial enablement. #1122 @libinta
- Adding fused qkv support along with config #1102 @bhargaveede
- Enhance Qwen2 with fastsoftmax and bf16 RoPE and cache optimization #1087 @Zhiwei35
- Enable fp8 inference for Llava-Next and add Fused_SDPA #1120 @tthakkal
- Support bucket_internal for MPT #1137 @pk1d3v
- Enable Flash Attention (Fused SDPA) for Starcoder #1114 @abhilash1910
- gpt_bigcode: added FusedSDPA kernel #1138 @mgonchar
- Enable torch.compile for Granite20B #1185 @dvarshney-habana
- Refine use cache for mpt model #1158 @Jing1Ling
- GPT-J support reuse_cache #1094 @atakaha
- Use fast softmax only on prefill #1159 @jaygala223
- Starcoder2 : KVCache and flash attention (FusedSDPA) enablement #1149 @abhatkal
- Gpt bigcode fused sdpa #1260 @yeonsily
SAM, FastViT, VideoMAE, OpenCLIP, DETR, Table Transformer, DeciLM
- Add an example of Segment Anything Model [Inference] #814 @cfgfung
- Add an example of FastViT model (Inference) #826 @cfgfung
- VideoMAE Model Enabling and Examples #922 @pi314ever
- OpenCLIP sample for visual question answering #977 @vidyasiv
- Enabled DETR (Object Detection) model #1046 @cfgfung
- Table transformer enabling #978 @pi314ever
- deciLM support #1133 @sywangyi
Stable Diffusion inpainting, unconditional image generation
- Add the Stable diffusion inpaint support #869 @yuanwu2017
- Enable Unconditional Image Generation on Gaudi 2 [Diffuser/Tasks] #859 @cfgfung
Text feature extraction example
- Feature extraction enabling #994 @pi314ever
Tensor parallelism
- Tensor parallel distributed strategy without using deepspeed #1121 @kalyanjk
- Disable torch.compile for all_reduce when parallel_strategy is set to "tp" #1174 @kalyanjk
Kubernetes cluster example
- Adds a helm chart, dockerfile, and instructions for running examples using a Kubernetes cluster #1099 @dmsuehir
- Fix PyTorch version in the Kubernetes docker-compose to match image #1246 @dmsuehir
FP8 training
- TE FP8 integration #1096 @SanjuCSudhakaran
Other
- Updates run_lora_clm.py with enhanced dataset support #955 @dmsuehir
- Fix prefix tuning finetune issue and update test #975 @sywangyi
- Fix throughput calculation in image-to-text example #1070 @regisss
- SDXL training: fixed CI, changed gated dataset, fixes for non-square datasets #1038 @imangohari1
- Updating batch_size of Albert-XXL in README #1063 @vineethanandh
- Fix the error of running run_pipeline.py of text_generation example #1055 @yuanwu2017
- Add a test for llama finetuning with FP8 precision #1106 @SanjuCSudhakaran
- Beam-search fix #1113 @ssarkar2
- Add chat format support dataset in SFT #1066 @libinta
- Fix nan loss of gemma and crash if dataset_concatenation is not set #1088 @sywangyi
- torch.compile: keep input mutation in graph to avoid unnecessary memcpy #1069 @sushildubey171
- Updated langchain text-generation pipeline to work with latest release 0.2.5 #1084 @rbrugaro
- Add the MC example #891 @yuanwu2017
- Fix recompiles if limit_hpu_graph is False #1129 @ssarkar2
- Update examples batchsize in README #1123 @shepark
- Fix OOM error in SDXL Fine-Tuning validation stage #1134 @dsocek
- Added an example code to demonstrate how to use deterministic image generation #878 @cfgfung
- SD image variation/InstructPix2Pix/StableDiffusionXLImg2ImgPipeline pipeline #988 @sywangyi
- Add ci test for trl rewarding and ppo, fix backward failure in ppo caused by rmsfusion #1020 @sywangyi
- Llama adapter #983 @sywangyi
- torch.flip issue is fixed in SynapseAI 1.16, so remove the WA #1092 @sywangyi
- Fix test CausalLanguageModelingLORAExampleTester KeyError #1139 @dmsuehir
- fix(ci): new runs-on #1136 @XciD
- Add trust_remote_code for loading datasets in the audio classification example #1074 @regisss
- Generation example: print number of warmup iterations #1145 @mgonchar
- CI updates: text-gen to receive ranks/bs, updated bs/metric for baselines #1140 @imangohari1
- Support for custom files for run_lora_clm.py #1039 @vidyasiv
- Change the device_id for FSDP plugin #1086 @ckvermaAI
- Set KV Cache update as static method #1160 @ulivne
- Fix CPU tensor issue #1157 @mkumargarg
- Adding missing __init__.py to mistral and mixtral test package #1188 @rkumar2patel
- Add example of multitask_prompt/poly tuning #915 @sywangyi
- Fix data-type mismatch for mlperf_inference accuracy test #1146 @kalyanjk
- Fix spawn MP context, limit cpu and download data #1131 @polisettyvarma
- T5 multi card #1222 @yafshar
- Add trust_remote_code for t5 poly-tuning test #1220 @yafshar
- Resolve "empty tensor optional" error with hpu_graphs + kv cache for StarCoder #1181 @vidyasiv
- Fix VIT, add wav2vec comment #1223 @ssarkar2
- Roberta tests were running on CPU #1229 @ssarkar2
- Fix bert/roberta contrastive search tests #1226 @skavulya
- Remove the default env variable to trust remote code by default #1225 @yafshar
- Improve style check workflow #1230 @regisss
- Added scheduler selection for SDXL fine-tuning #867 @kplau1128
- Clear help msg for ignore_eos to avoid misunderstanding @sywangyi
- Support loading hugging face checkpoint #1165 @ulivne
- Change triggering event for code style check #1238 @regisss
- gptj: fix missing token_idx #1234 @envsp
- fix(nltk): fixed the version to working one #1247 @imangohari1
- Updating to avoid hardcoding tests in CI framework #1221 @vidyasiv
- Fix FSDP graph error due to Transformers 4.43 update #1251 @jiminha
- Fix SD README commands #1250 @imangohari1
- Fix spelling errors #1252 @changwangss
- Set HLS_MODULE_ID only if it wasn't set previously #1254 @astachowiczhabana
- Fix overflow of steps in SDXL for default diffusers scheduler @dsocek
- fix(test_diffusers): automated the checking for tests without upstream HF #1232 @imangohari1
- fix(nltk): Revert 1247. Updated the version. added the punkt_tab download #1258 @imangohari1
- Set input_embeds before it gets used #1261 @tthakkal
- Update README and more changes, rebase to main #1259 @shepark
Known limitations
- For Llama, some big batch sizes lead to out-of-memory errors whereas they used to work
v1.12.1: Patch release
Fix first-token latency measurement
Fix for Mixtral
- Mixtral typo fix #1107 @schoi-habana
Other
Full Changelog: v1.12.0...v1.12.1
v1.12: Qwen2, Gemma, SVD, Dreambooth, speculative sampling
SynapseAI v1.16
Transformers 4.40
Speculative Sampling
- Speculative sampling on Gaudi using Optimum-Habana #973 @nraste
- Fix assisted decoding generation error #1080 @libinta
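Speculative sampling drafts tokens with a small assistant model and verifies them with the target model, so outputs match regular decoding while running fewer target-model steps. A hedged sketch using the standard Transformers `assistant_model` argument (the bundled run_generation example may expose this through its own flags instead):

```python
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # patch Transformers with the Gaudi-optimized models

target_id, draft_id = "meta-llama/Llama-2-7b-hf", "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(target_id)
model = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.bfloat16).to("hpu")
assistant = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.bfloat16).to("hpu")

inputs = tokenizer("The theory of relativity states that", return_tensors="pt").to("hpu")
# Draft tokens come from the assistant and are accepted/rejected by the target model.
outputs = model.generate(**inputs, assistant_model=assistant, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```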
Model optimizations
- Add --bucket_size support for gpt_bigcode #802 @jiminha
- Optimize StableLM model inference #805 @XinyuYe-Intel
- Enable google/gemma-7b. #747 @lkk12014402
- Enable llava static generation. #767 @lkk12014402
- Fix perf drop in flan-t5 summarization #908 @MohitIntel
- Enable Qwen2 model #774 @XinyuYe-Intel
- Extend bucket_internal to SAMPLE generation mode #819 @xt574chen
- SpeechT5 static consistent dropout #824 @Spycsh
- Optimize inference of Persimmon model #822 @XinyuYe-Intel
- Enable OWL-ViT graph mode on Gaudi platform #783 @cfgfung
- Support mixtral kvcache reuse and remove kv_cache_fp8 #898 @jychen21
- Add fp8 related changes to mistral for text-generation #918 @skaulintel
- Optimization for phi series models: support fp8 kv cache and reuse kv cache #902 @yuwenzho
- Support Mistral 32K input token #931 @jiminha
- Support mixtral long sequence 32k with bs 4 #903 @jychen21
- Adapt Mixtral long sequence handling for Mistral #985 @jiminha
- Fix performance issue in mistral #1030 @jiminha
- Optimized inference of Starcoder2 model #829 @XinyuYe-Intel
- Add support for IBM Granite #1045 @regisss
- Enable fp8 inference for Llava-hf 7B and 13B in 1.16 release #951 @Luca-Calabria
- FusedRoPE input in bf16 #1026 @ssarkar2
- Enhance Qwen2 model with FSDPA and bucket #1033 @Zhiwei35
- Optimize seamless-m4t/vits model for text-to-speech generation #825 @sywangyi
- cache_optimization #1028 @ssarkar2
- Ensure KV cache is not returned as output tensor during decode phase for Falcon #993 @schoi-habana
- Fast softmax #972 @wszczurekhabana
- Falcon optimization #974 @libinta
- Quantization for FSDPA #976 @dudilester
- Falcon update park #1052 @ssarkar2
- Add the Llava_next support #1041 @yuanwu2017
- Improve torch compile performance #1082 @libinta
Stable Video Diffusion
PEFT
- Add ia3 and adalora support #809 @sywangyi
- Enable prompt tuning/prefix tuning/p tuning clm and example #758 @sywangyi
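These methods plug in through the standard peft API; a minimal prompt-tuning sketch (the model choice and init text are illustrative, and on Gaudi the training loop would go through GaudiTrainer as in the bundled language-modeling examples):

```python
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Classify if the sentence is positive or negative:",
    num_virtual_tokens=8,
    tokenizer_name_or_path="bigscience/bloom-560m",
)
model = get_peft_model(model, peft_config)  # only the virtual prompt tokens are trainable
model.print_trainable_parameters()
```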
TRL
Object Segmentation Example
Dreambooth
Others
- Text generation pipeline: Extended functionality to align with run_generation script #782 @mgonchar
- Enable clip mediapipe and update G2 baseline #856 @MohitIntel
- Add ci test for SFT and DPO #857 @sywangyi
- Fix SFT, DPO CI on Gaudi1 #893 @regisss
- Add SDXL in README #894 @regisss
- Fix falcon 180b oom issue if peft > 0.6.2 #895 @sywangyi
- Enabled additional models in CI #879 @MohitIntel
- Add static shape support for vision_encoder_decoder generation if decoder supports static shape #834 @sywangyi
- Add HabanaProfile to Stable Diffusion and XL #828 @atakaha
- Pytest accuracy updates for Falcon, T5, GPT2 #916 @Luca-Calabria
- Update text-generation readme with torch.compile info. #884 @libinta
- Update Wav2Vec2ModelTest::test_initialization #919 @malkomes
- Add linear and dynamic RoPE to Mistral and Mixtral #892 @regisss
- Fix for wav2vec2 test cases #923 @lqnguyen
- Add nograd() to prevent backward backend #897 @astachowiczhabana
- Assisted decoding not implemented #910 @tjs-intel
- Disable wav2vec2 symbolic tracing test #904 @tjs-intel
- Add support for symbolic tracing of GPT2 models #913 @tjs-intel
- Utils: return a more reasonable error when attempting to load a non-PyTorch model #921 @mgonchar
- Pytest accuracy updates for Bridgetower, Swin, Vit #927 @Luca-Calabria
- Text generation: added langchain pipeline script #887 @mgonchar
- Fix for AST models #914 @vidyasiv
- Fix AttributeError for wav2vec test #929 @Jianhong-Zhang
- Fix ValueError for test_summarization #939 @Jianhong-Zhang
- Grad norm tensor fix #938 @yeonsily
- Add information to the audio-classification examples README about --ddp_find_unused_parameters parameter #941 @Alberto-Villarreal
- Add leaderboard link #947 @echarlaix
- Fix formatting of arg parse help strings in the PEFT example #944 @dmsuehir
- Use new Habana llama and falcon model configs #940 @skaulintel
- Update based on legal requirements. #900 @libinta
- Update test generation config to raise ValueError #949 @malkomes
- Add --trust_remote_code for text generation examples #870 @yangulei
- Added Llama-2 fp8 text-generation test cases #934 @yeonsily
- Upgrade SD output image verification with CLIP score #920 @MohitIntel
- Llama Guard for text classification example #871 @dsmertin
- Update README logo #950 @regisss
- Add Gaudi CI for Sentence Transformers #928 @regisss
- Get iteration times through generate() #899 @hsubramony
- Update speech recognition seq2seq example #953 @regisss
- Fix wrong all_gather for Mixtral finetuning #965 @ccrhx4
- Add intel-mila ProtST example #860 @sywangyi
- Small CI refacto #968 @regisss
- Llama-70B on one card: infer device map with max memory limitation #963 @Yantom1
- Map list to tensors #926 @ssarkar2
- Fix fsdp lora torch compile issue #971 @sywangyi
- Fix for the simulate_dyn_prompt flag assertion #984 @alekseyfa
- Initial enablement with FP8 Training (port from OHF #91) #936 @libinta
- Warn user when using --disk_offload without hqt #964 @Yantom1
- Assign grad_norm for logging only if it's a single element tensor #992 @yeonsily
- Update examples #998 @regisss
- Fix warmup for diffusers when batch size < throughput_warmup_steps #960 @dsocek
- Add torch.compile instructions for Roberta-Large #981 @MohitIntel
- Fix gpt_neox, stablelm inference regression caused by RoPE dtype #999 @mandy-li
- fea(examples): Updated the READMEs with requirements.txt installation #1000 @imangohari1
- Initial commit for fp8 CI #995 @yeonsily
- Fixed 'MixtralConfig' object has no attribute 'rope_scaling' #1009 @aslanxie
- Use the length of timesteps as the inference step number #986 @yuanwu2017
- Fix the bug of output_type=np or latent. #996 @yuanwu2017
- Fix wav2vec test load adapter #937 @malkomes
- Mark scale as const and remove --fp8 flag usage #962 @Yantom1
- Add per step time collection to other methods #1004 @ssarkar2
- Fix first token time #1019 @ssarkar2
- Fix text-generation example #1025 @regisss
- Updates test_beam_search to transformers_4.40 #1017 @malkomes
- Fix eos problem #1034 @sywangyi
- fp8 textgen ci structure update #1029 @jiminha
- Fix a return value issue caused by PR 973 #1040 @yafshar
- Add no_checks for sub dataset in lvwerra/stack-exchange-paired since it does not contain test split #1003 @sywangyi
- Readme Update for FSDP #980 @hlahkar
- Add unifier script and disk offload flag usages to README. #1023 @libinta
- Add mixtral for meta device load due to mixtral-8x22b model size #909 @libinta
- Update unifier script #1010 @Yantom1
- Update text-generation CI configuration for falcon and Mixtral #1044 @yeonsily
- Update multi-node README to check ssh connection issue #1048 @yeonsily
- Infra upgrade workflows #480 @glegendre01
- Update test_text_generation_example.py #1051 @ssarkar2
- BERT training migrated to torch.compile #990 @ANSHUMAN87
- Update test_examples.py #1053 @ssarkar2
- Update modeling_llama.py: deepspeed fix for codellama #1054 @ssarkar2
- No shapes in profilings by default #1050 @astachowiczhabana
- Change the way to unset environment variable for gpt-neox ci #1060 @yeonsily
- Update README for Albert torch.compile mode #1061 @MohitIntel
- Fix lm_evaluation_harness to specific commit (#240) #1064 @astachowiczhabana
- Fix text-generation example README.md #1081 @shepark
v1.11.1: Patch release
Llama3 has been validated on Gaudi
Fix issue with pytest
The latest SynapseAI Docker images come with Pytest v8 preinstalled, which is incompatible with the Transformers library and leads to errors in a few non-test cases. As a temporary workaround, Pytest is pinned to an earlier version and made a hard dependency.
Other
- Fp8 merge fix #863 @libinta
- Fixed "reuse_cache" Bug #888 @Danielohayon
- Remove deprecated AOT_HPU_TRAINING_BACKEND #877 @astachowiczhabana
- Add mark step and inplace residual add in llama model code #833 @puneeshkhanna
- Enable Flash Attention in recompute and causal modes #862 @wszczurekhabana
- Add mark_step for llama inference #875 @libinta
Full Changelog: v1.11.0...v1.11.1
v1.11: SDXL fine-tuning, Whisper, Phi, ControlNet
SynapseAI v1.15
The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.15.0.
SDXL fine-tuning
Whisper
- Support speech recognition with whisper models and seq2seq #704 @emascarenhas
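The example centers on run_speech_recognition_seq2seq.py for fine-tuning; for plain inference, a hedged sketch of Whisper on HPU (standard Transformers API, with the Gaudi wiring via adapt_transformers_to_gaudi):

```python
import numpy as np
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from transformers import AutoProcessor, WhisperForConditionalGeneration
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # patch Transformers with the Gaudi-optimized models

processor = AutoProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-small", torch_dtype=torch.bfloat16
).to("hpu")

audio = np.zeros(16000, dtype=np.float32)  # one second of 16 kHz silence, for illustration
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
predicted_ids = model.generate(input_features.to("hpu", dtype=torch.bfloat16))
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```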
Phi
- Enable phi series models #732 @lkk12014402
ControlNet
Transformers v4.38
The codebase is fully validated for Transformers v4.38.
Model optimizations
- Add optimization for blip text model generation #653 @sywangyi
- Enable internal kv bucket in llama #720 @xt574chen
- Enable Mixtral-8x7B #739 @jychen-habana
- Update Mixtral-8x7B fp8 hqt example #756 @jychen-habana
- Further fixes for performance with internal bucketing #781 @puneeshkhanna
- speecht5 optimization #722 @sywangyi
- move img_mask@get_attn_mask() to hpu #795 @hsubramony
- Mistral optimizations #804 @ssarkar2
Image-to-text and VQA examples
torch.compile
- Enable torch_compile mode for distributed #659 @kalyanjk
- Fix graph breaks in torch compile mode #806 @hlahkar
- Fix torch.compile for text generation #811 @regisss
- Add Llama7b FSDP test for torch.compile mode #818 @pankd
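A hedged sketch of the torch.compile path. The backend name "hpu_backend" is an assumption based on Habana's Dynamo backend, and the bundled examples usually enable compilation through script flags (e.g. --torch_compile) rather than calling torch.compile directly:

```python
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # patch Transformers with the Gaudi-optimized models

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.bfloat16).to("hpu")
compiled_model = torch.compile(model, backend="hpu_backend")  # assumption: backend name

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("hpu")
with torch.no_grad():
    logits = compiled_model(**inputs).logits  # first call triggers graph compilation
```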
Bug fixes
- Fix beam search crash and incorrect output in decoder-only and encoder-decoder models #627 @sywangyi
- Fix translation models #710 @vidyasiv
- Fix throughput calculation for diffusion models #715 @skavulya
- Fix crash in llama mode in llava image-to-text generation #755 @sywangyi
- Fix backward error in DDP when running reward model finetune in RLHF #507 @sywangyi
- Fix get_dtype and convert_into_dtypes #769 @regisss
- Override sdpa option in Gaudi #771 @jiminha
- Fix Llama-70B-FSDP model loading issue #752 @hlahkar
- Fix FSDP in transformer4.38 #812 @libinta
- Delay importing deepspeed comm to improve perf #810 @jiminha
- Fix llama rotary pos emb issue for transformers 4.38 #813 @libinta
- Fix torch.full issue when running DeepSpeed ZeRO-3 for Llama #820 @libinta
- Fix profile issue with 1st step #837 @libinta
- Fix mistral after syn1.15 update #858 @ssarkar2
Others
- Small test_text_generation_example.py refacto #725 @regisss
- Update README, add PPO support #721 @sywangyi
- Update the Mistral model naming #726 @yafshar
- Changing backend name #708 @vivekgoe
- Update ppo_trainer.py #718 @skaulintel
- Add seed in SFT example, make SFT result reproducible #735 @sywangyi
- Adding a flag whether to save checkpoint or not in run_lora_clm.py #736 @yeonsily
- Refactor and update CI for encoder-decoders #742 @regisss
- Expose Llama Fused OPs control from run_lora_clm.py #751 @hlahkar
- Fixing tests by making static_shapes False #778 @bhargaveede
- Fix ControlNet README #785 @regisss
- Workaround for RoPE computed in bf16 for GPT-NeoX #746 @regisss
- Add Whisper and SpeechT5 to model table #790 @regisss
- Update summarization example README #791 @srajabos
- Block torchscript pytest because of seg fault issue #793 @yeonsily
- Fix test_encoder_decoder.py for opus-mt-zh-en #798 @regisss
- Replacing obsolete API for mediapipe #796 @MohitIntel
- Add --distribution_strategy fast_ddp in contrastive-image-text README and BridgeTower test #799 @regisss
- Fix redundant bucket internal and hpu graph setting #797 @puneeshkhanna
- Add Llama test for fsdp #761 @hlahkar
- Enable dynamic shapes for esmfold #803 @hsubramony
- Add Llama/Llama2 support in Question-Answering #745 @kplau1128
- Update MLM example #830 @regisss
- Revert Wav2Vec2 TDNNLayer forward function same as transformer v4.37.2 #827 @yeonsily
- Save CI test output image #835 @MohitIntel
- Update ckpt loading #773 @schoi-habana
- Skip SDXL test in CI #840 @regisss
- Fix FSDP test on Gaudi1 #841 @regisss
- Remove installation from source for Diffusers in CI #846 @regisss
- Fix fp8 ci #852 @regisss
- Fix PR #848 #853 @regisss
- Disable safe loading tests in CI #854 @regisss
- Add warmup for eval #855 @libinta
Known issue
- A crash may occur with unify_measurements.py
v1.10.4: Patch release
Fix Llama memory issue with DeepSpeed ZeRO-3
- Fix Llama initialization #712
Full Changelog: v1.10.2...v1.10.4