Releases: huggingface/optimum-neuron
v0.0.6: Patch release
v0.0.5: NeuronModel classes and generation methods during training
NeuronModel classes
NeuronModel classes allow you to run inference on Inf1 and Inf2 instances while preserving the Python interface you are used to from Transformers' auto model classes.
Example:
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained(
    "optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx"
)
model = NeuronModelForSequenceClassification.from_pretrained(
    "optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx"
)
inputs = tokenizer("Hamilton is considered to be the best musical of human history.", return_tensors="pt")
outputs = model(**inputs)
Supported tasks are:
- Feature extraction
- Masked language modeling
- Text classification
- Token classification
- Question answering
- Multiple choice
Relevant PR: #45
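Usage follows the same pattern for the other tasks. As an illustration, here is a hedged sketch for question answering; the checkpoint name below is a placeholder, not an existing repository, and should be replaced by any Neuron-compiled question answering model:

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForQuestionAnswering

# Placeholder repository name; substitute a real Neuron-compiled QA checkpoint.
model_id = "my-org/distilbert-base-uncased-squad-neuronx"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = NeuronModelForQuestionAnswering.from_pretrained(model_id)

question = "Which instances do Neuron models run on?"
context = "Neuron models run on AWS Inf1 and Inf2 instances."
inputs = tokenizer(question, context, return_tensors="pt")
outputs = model(**inputs)

# Pick the most likely answer span from the start/end logits.
start = outputs.start_logits.argmax(-1).item()
end = outputs.end_logits.argmax(-1).item()
answer = tokenizer.decode(inputs["input_ids"][0, start : end + 1])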
Generation methods
Two generation methods are now supported, allowing you to perform evaluation with generation while training decoder and seq2seq models.
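As a hedged illustration, the sketch below shows what generation looks like for a seq2seq model, using greedy decoding and beam search as representative strategies; the choice of strategies, the model, and the generation parameters are illustrative assumptions, not taken from the release notes:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Model and generation parameters are examples only.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: The cache saves compilation time.", return_tensors="pt")

# Greedy decoding (num_beams=1) and beam search (num_beams > 1).
greedy_ids = model.generate(**inputs, max_new_tokens=32)
beam_ids = model.generate(**inputs, max_new_tokens=32, num_beams=4)

print(tokenizer.batch_decode(greedy_ids, skip_special_tokens=True))
print(tokenizer.batch_decode(beam_ids, skip_special_tokens=True))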
Misc
The Optimum CLI now provides two new commands to help manage the cache.
v0.0.4: Patch release for Neuron installation
optimum-cli neuron cache command line
The optimum-cli now provides two commands to work with the Trainium cache:
- Cache creation: optimum-cli neuron cache create
- Cache setting: optimum-cli neuron cache set
Documentation
- New Trainium model cache documentation page
v0.0.3: Patch release for the `huggingface_hub` library version
Pins the version of the huggingface_hub library to be greater than or equal to 0.14.0. This should fix errors related to #41.
v0.0.2: Compilation caching system and inference with Inferentia
Compilation caching system
Since compiling models before being able to train them can be a real bottleneck (for example, on small datasets compilation time can exceed training time), we introduce a caching system directly connected to the Hugging Face Hub.
Before starting compilation, the TrainiumTrainer checks whether the needed compilation files are on the Hub and, if so, fetches them, saving users from having to do it themselves.
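From the user's perspective nothing changes; training code stays the same and the cache lookup happens when compilation is triggered. A minimal sketch, assuming the TrainiumTrainer accepts the same arguments as Transformers' Trainer (model, dataset, and hyperparameters below are placeholders):

from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments
from optimum.neuron import TrainiumTrainer

# Placeholders: any supported model and dataset work the same way.
model_id = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Fixed padding keeps input shapes static, which limits the number of compiled graphs.
dataset = load_dataset("glue", "sst2", split="train[:1%]")
dataset = dataset.map(
    lambda ex: tokenizer(ex["sentence"], padding="max_length", max_length=128, truncation=True),
    batched=True,
)

args = TrainingArguments(output_dir="out", per_device_train_batch_size=8, num_train_epochs=1)

# If matching compilation files already exist on the Hub cache repo,
# they are fetched before compilation starts.
trainer = TrainiumTrainer(model=model, args=args, train_dataset=dataset)
trainer.train()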
Custom cache repo
Since each user might want their own cache repo, for example to push artifacts and/or keep things private, we offer the possibility to do so via the CUSTOM_CACHE_REPO environment variable:
CUSTOM_CACHE_REPO=michaelbenayoun/cache_test python train.py
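The same can also be done from Python, for example at the top of train.py, as long as the variable is set before the cache is first consulted (the repository name below is a placeholder):

import os

# Placeholder repository name; set this before any compilation or caching happens.
os.environ["CUSTOM_CACHE_REPO"] = "my-username/my-neuron-cache"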
Neuron export
Supports exporting PyTorch models to serialized TorchScript modules compiled by the Neuron compiler (neuron-cc or neuronx-cc) that can be used on AWS Inf2 or Inf1.
Example: Export the BERT model with static shapes (run optimum-cli export neuron --help to see all options):
optimum-cli export neuron --model bert-base-uncased --sequence_length 128 --batch_size 16 bert_neuron/
By default, on Inf2, matmul operations are cast from fp32 to bf16, and on Inf1, all operations are cast to bf16. Use --auto_cast to configure which operations to auto-cast and --auto_cast_type to define the data type for auto-casting.
Example: Auto-cast all operations to the fp16 data type (this option can potentially lower precision/accuracy):
optimum-cli export neuron --model bert-base-uncased --auto_cast all --auto_cast_type fp16 bert_neuron/
v0.0.1: Training on AWS Trainium
The following architectures can be trained on AWS Trainium instances (trn1.2xlarge and trn1.32xlarge):
- ALBERT
- BERT
- DistilBERT
- RoBERTa
- XLM-RoBERTa
- CamemBERT
- Electra
- GPT-2
- GPT-Neo
- MarianMT
- T5
- BART
- ViT
Training examples for many tasks are provided here.