Release v0.0.5: NeuronModel classes and generation methods during training · huggingface/optimum-neuron

NeuronModel classes

NeuronModel classes allow you to run inference on Inf1 and Inf2 instances while preserving the Python interface you are used to from Transformers' auto model classes.

Example:

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained(
    "optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx"
)
model = NeuronModelForSequenceClassification.from_pretrained(
    "optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx"
)

inputs = tokenizer("Hamilton is considered to be the best musical of human history.", return_tensors="pt")

outputs = model(**inputs)
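
The example above stops at the raw model output. As a rough sketch of the usual next step, the snippet below turns one example's logits into a label and a probability with a softmax. It assumes the output exposes `logits` and the model carries an `id2label` mapping in its config, as Transformers classification models do; the helper name `logits_to_prediction`, the sample logits, and the label mapping are all illustrative and not taken from the release.

```python
import math


def logits_to_prediction(logits, id2label):
    """Return (label, probability) for a single example's raw logits."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the index with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Illustrative values only, not real model output.
example_logits = [-2.0, 2.0]
# Assumed label mapping for an SST-2 style sentiment classifier.
example_labels = {0: "NEGATIVE", 1: "POSITIVE"}

label, prob = logits_to_prediction(example_logits, example_labels)
print(label, round(prob, 3))  # POSITIVE 0.982
```

With the real model, something like `logits_to_prediction(outputs.logits[0].tolist(), model.config.id2label)` would likely play the same role, assuming those attributes are available on the Neuron model and its output.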

Supported tasks are:

Feature extraction
Masked language modeling
Text classification
Token classification
Question answering
Multiple choice

Relevant PR: #45

Generation methods

Two generation methods are now supported:

Greedy decoding (#70)
Beam search (#93)

This lets you run generation-based evaluation while training decoder and seq2seq models.
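
To illustrate how the two strategies differ, here is a minimal, library-free sketch. It is not the optimum-neuron implementation: the "model" is a hypothetical lookup table (`NEXT_TOKEN_PROBS`) that conditions only on the previous token, and the beam search omits length normalization and other refinements that real implementations typically apply.

```python
import math

# Hypothetical toy "model": next-token probabilities given the previous token.
# A real decoder would condition on the whole prefix.
NEXT_TOKEN_PROBS = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.4, "y": 0.35, "</s>": 0.25},
    "b": {"z": 0.9, "</s>": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
}


def greedy_decode(max_len=10):
    """Always append the single most probable next token."""
    seq = ["<s>"]
    while seq[-1] != "</s>" and len(seq) < max_len:
        probs = NEXT_TOKEN_PROBS[seq[-1]]
        seq.append(max(probs, key=probs.get))
    return seq


def beam_search(num_beams=2, max_len=10):
    """Keep the num_beams best partial sequences by cumulative log-probability."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-prob, tokens)
    for _ in range(max_len - 1):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "</s>":
                # Finished hypotheses carry over unchanged.
                candidates.append((score, seq))
                continue
            for tok, p in NEXT_TOKEN_PROBS[seq[-1]].items():
                candidates.append((score + math.log(p), seq + [tok]))
        # Keep only the highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:num_beams]
        if all(seq[-1] == "</s>" for _, seq in beams):
            break
    return beams[0][1]


print(greedy_decode())           # ['<s>', 'a', 'x', '</s>']
print(beam_search(num_beams=2))  # ['<s>', 'b', 'z', '</s>']
```

In this toy setup greedy decoding commits to "a" because it is the best first step, ending with total probability 0.24, while a beam of width 2 keeps "b" alive and finds the sequence with probability 0.36. With `num_beams=1` the beam search reduces to greedy decoding.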

Misc

The Optimum CLI now provides two new commands to help manage the cache:

optimum-cli neuron cache list: To list a remote cache repo on the Hugging Face Hub (#85)
optimum-cli neuron cache add: To add compilation files related to a model to a remote cache repo on the Hugging Face Hub (#51)
